100 classic ML interview
Terms in this set
1
What is supervised learning?
Learning from labeled data.
2
Which algorithm is typically used for regression?
Linear regression
3
What does overfitting mean?
The model fits noise in training data.
4
Which method is used to prevent overfitting?
Regularization
5
Which metric is most appropriate for regression evaluation?
MSE
6
Which technique is a dimensionality reduction method?
PCA
7
Which ML model assumes independence between features?
Naive Bayes
8
What is the purpose of cross-validation?
Evaluate generalization
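One way to picture cross-validation is as a set of index splits where every sample is held out exactly once. The sketch below is a minimal, unshuffled k-fold splitter; the name `k_fold_indices` is hypothetical and not taken from any library.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, validation_indices) pairs over range(n_samples)."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds


for train_idx, val_idx in k_fold_indices(6, 3):
    print("train:", train_idx, "validation:", val_idx)
```

In practice the data is usually shuffled first; that step is left out here to keep the split easy to read.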
9
Which learning paradigm uses trial-and-error interaction?
Reinforcement
10
Which objective does clustering solve?
Grouping similar samples
11
What does bias refer to in ML?
Error from simplifying assumptions in the model
12
Which best describes gradient descent?
Iteratively minimize a loss by stepping along the negative gradient
13
What is the cost function for linear regression?
Mean squared error
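The two cards above (gradient descent and the MSE cost) fit together in a small example. This is a rough sketch for one-feature linear regression; the helper names `mse` and `fit_line`, the learning rate, and the toy data are all assumptions chosen for illustration.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, lr=0.05, steps=2000):
    """Plain batch gradient descent on the MSE of a one-feature linear model."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(w, b, mse(w, b, xs, ys))
```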
14
What does regularization do?
Encourages smaller weights
15
What is L1 regularization also called?
Lasso
16
Which algorithm lazily learns at prediction time?
KNN
17
What is an SVM margin?
Distance between the separating hyperplane and the nearest training points (the support vectors)
18
Which kernel allows non-linear separation in SVM?
Any non-linear kernel, e.g. RBF (Gaussian), polynomial, or sigmoid
19
What is entropy in decision trees?
Measure of impurity
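As a concrete illustration of impurity, Shannon entropy over the class labels at a node can be computed as below. This is a sketch using base-2 logarithms; the function name `entropy` is just a convenient label.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels at a tree node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


print(entropy(["yes", "yes", "no", "no"]))    # evenly mixed node
print(entropy(["yes", "yes", "yes", "yes"]))  # pure node
```

A pure node has zero entropy, and an evenly split binary node has the maximum of one bit.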
20
Which technique improves prediction by combining many weak models?
Ensembling
21
What is bagging?
Training models on bootstrap samples (drawn with replacement) and aggregating their predictions
22
Random forests reduce...
Variance
23
Which gradient boosting algorithm is widely used?
XGBoost
24
Which clustering algorithm requires number of clusters k?
K-means
25
What is the elbow method used for?
Choosing k in clustering
26
What is underfitting?
Model performs poorly on training and test data
27
Which sampling technique balances imbalanced datasets?
SMOTE
28
Which metric is best for imbalanced binary classification?
ROC-AUC
29
What does logistic regression output?
Probability of class
30
Softmax is used for...
Multi-class classification
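For reference, softmax turns a vector of raw scores into class probabilities that sum to one. A minimal sketch follows; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```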
31
Which ML method learns boundaries maximizing margin?
SVM
32
Which algorithm is sensitive to feature scaling?
KNN
33
What is ROC curve plotting?
TPR vs FPR
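Each point on a ROC curve is a (FPR, TPR) pair at one decision threshold. The sketch below computes one such point; `tpr_fpr` is a hypothetical helper, labels are assumed to be 0/1, and both classes are assumed to be present.

```python
def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR) when predicting positive for scores >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
for t in (0.0, 0.5, 1.1):
    print(t, tpr_fpr(y_true, scores, t))
```

Sweeping the threshold from high to low traces the curve from (0, 0) to (1, 1).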
34
Which technique reduces variance in prediction?
Bagging
35
Which optimization modifies weights per feature ranking?
Feature selection
36
Which learning method creates synthetic minority samples?
SMOTE
37
Which distance measure is used in KNN?
Euclidean distance
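To make the role of the distance measure concrete, here is a small KNN classifier sketch that uses Euclidean distance via `math.dist` (Python 3.8+). The function name `knn_predict` and the toy points are illustrative assumptions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


points = [(0, 0), (0, 1), (5, 5), (6, 5)]
labels = ["a", "a", "b", "b"]
print(knn_predict(points, labels, (0.2, 0.1)))
print(knn_predict(points, labels, (5.5, 5.0)))
```

Because all the work happens at query time, this also shows why KNN is called a lazy learner and why unscaled features can dominate the distance.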
38
What is cross entropy used for?
Classification loss
39
What is early stopping?
Stopping training once validation performance stops improving, to reduce overfitting
40
What does AUC measure?
Area under ROC curve
41
Which algorithm uses impurity reduction?
Decision tree
42
What is grid search used for?
Hyperparameter tuning
43
Which boosting method reweights samples?
AdaBoost
44
What is the curse of dimensionality?
Distance metrics lose meaning in high dimension
45
Which reduces dimensionality while preserving variance?
PCA
46
What is a confusion matrix used for?
Classification evaluation
47
Which expresses Bayes theorem?
P(A|B)=P(B|A)P(A)/P(B)
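A quick numeric check of the formula can help it stick. In this sketch, A is "has the condition" and B is "test is positive"; the function name `posterior` and the rates used (1% prior, 90% true positive rate, 5% false positive rate) are made-up values for illustration only.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) from P(A), P(B|A), and P(B|not A), using Bayes' theorem."""
    # Total probability of B: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: P(A) = 0.01, P(B|A) = 0.9, P(B|not A) = 0.05
print(posterior(0.01, 0.9, 0.05))
```

With a small prior, the posterior stays well below the test's true positive rate, which is the usual point of this exercise.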
48
Which distribution models binary outcomes?
Bernoulli
49
Which technique detects outliers?
Isolation forest
50
What does model variance indicate?
Sensitivity to training sample fluctuations
51
Which sampling is used in bagging?
Sampling with replacement
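Sampling with replacement (a bootstrap sample) can be sketched in a few lines. `bootstrap_sample` is a hypothetical helper; the seed is fixed only to make the run repeatable.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return rng.choices(data, k=len(data))


rng = random.Random(0)  # seeded for repeatability
data = ["a", "b", "c", "d", "e"]
sample = bootstrap_sample(data, rng)
out_of_bag = [x for x in data if x not in sample]
print(sample, out_of_bag)
```

Some items typically appear more than once while others are left out; in bagging, each model is trained on a different sample of this kind.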
52
What technique merges multiple weak models sequentially?
Boosting
53
Which activation function outputs probability?
Softmax
54
Which regularization shrinks weights to zero?
L1
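One way to see why L1 produces exact zeros is the soft-thresholding step used by proximal and coordinate-descent Lasso solvers. The sketch below contrasts it with an L2-style multiplicative shrink; the function names and the value of `lam` are illustrative assumptions.

```python
import math


def soft_threshold(w, lam):
    """L1 proximal step: move w toward zero by lam, clamping at exactly zero."""
    return math.copysign(max(abs(w) - lam, 0.0), w)


def l2_shrink(w, lam):
    """L2-style shrink: scales w down but never reaches zero for nonzero w."""
    return w / (1.0 + lam)


for w in (2.0, 0.3, -0.3, -2.0):
    print(w, soft_threshold(w, 0.5), l2_shrink(w, 0.5))
```

Small weights are set to zero by the L1 step, which is what makes Lasso useful for feature selection.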
55
Which type of error is false positive?
Type I
56
Which clustering method identifies density-based clusters?
DBSCAN
57
Which metric measures similarity in clustering?
Silhouette score
58
Which technique reduces overfitting in random forests?
Feature bagging
59
Which sampling is used in Monte Carlo simulation?
Random sampling
60
Which ML task predicts continuous values?
Regression
61
Which approach automates feature search?
AutoML
62
What is variance inflation factor used for?
Detect multicollinearity
63
Which term refers to prior + likelihood combination?
Posterior
64
What is bagging strongest at reducing?
Variance
65
Which technique automatically prunes trees?
Cost-complexity pruning
66
Which learning algorithm uses nearest neighbors for decision?
KNN
67
ROC curve compares classifier performance based on…
TPR and FPR
68
Which is an unsupervised method?
K-means clustering
69
Which reduces high-dimensional noise?
PCA
70
Which model creates axis-aligned splits?
Decision tree
71
What does boosting reduce?
Bias
72
Which method avoids exhaustive search?
Random search
73
Which model family supports interpretability easily?
Decision trees
74
Which metric handles probabilistic outputs?
Log loss
75
Which is a lazy learner?
KNN
76
Which model finds hyperplanes using margin optimization?
SVM
77
Which step is required in SVM to handle non-linearity?
Use kernels
78
Which model learns class conditional density?
Naive Bayes
79
Which approach uses gradient boosting with trees?
XGBoost
80
What is the decision boundary in logistic regression?
Linear hyperplane
81
Which model samples feature subsets per tree?
Random forest
82
Which ML evaluation metric is threshold independent?
ROC-AUC
83
Which algorithm maximizes posterior probability?
Bayesian classifier
84
Which ML concept balances bias and variance?
Regularization
85
Which technique visualizes classification performance?
Both the confusion matrix and the ROC curve
86
Which optimization adjusts model hyperparameters iteratively?
Bayesian optimization
87
Which improves weak classifiers sequentially using residuals?
Boosting
88
Which describes multicollinearity?
Features strongly correlated
89
Which method measures class imbalance impact?
All of the above
90
Which prevents exploding gradients?
Gradient clipping
91
Which approach learns probabilistic boundaries?
Logistic regression
92
Which model learns hierarchical splits?
Decision trees
93
Which step makes ML models sensitive to data distribution?
Normalization
94
Which reduces dimensionality via eigen decomposition?
PCA
95
Which ML model uses maximum likelihood estimation?
Logistic regression
96
Which improves model generalization?
All of the above
97
Which method prevents data leakage?
Scaling after splitting (fit the scaler on the training split only)
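The leakage-safe order is: split first, compute scaling statistics on the training split, then apply those same statistics to the test split. A minimal sketch, with hypothetical helper names `fit_scaler` and `transform` and assuming the training values are not all identical:

```python
import statistics


def fit_scaler(train_values):
    """Compute mean and standard deviation from the training split only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def transform(values, mean, std):
    """Standardize values using statistics learned from the training split."""
    return [(v - mean) / std for v in values]


train, test = [1.0, 2.0, 3.0], [4.0]  # split happens before any scaling
mean, std = fit_scaler(train)
print(transform(train, mean, std), transform(test, mean, std))
```

Computing the mean and standard deviation on the full dataset before splitting would let test-set information influence training.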
98
Which metric balances precision and recall?
F1 score
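F1 is the harmonic mean of precision and recall. The sketch below computes it for binary 0/1 labels; returning 0.0 when there are no true positives is a convention assumed here, not the only possible choice.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # assumed convention when precision or recall is undefined or zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```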
99
What does cross-entropy measure?
Dissimilarity between the predicted and true probability distributions (not a true distance, since it is asymmetric)
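As a small worked example, cross-entropy between a true distribution p and a predicted distribution q is H(p, q) = -Σ p_i log q_i. The sketch below uses natural logarithms and assumes q_i > 0 wherever p_i > 0; the function name is illustrative.

```python
import math


def cross_entropy(p, q):
    """H(p, q) = -sum(p_i * log(q_i)), skipping terms where p_i is zero."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


true_dist = [1.0, 0.0, 0.0]  # one-hot label for class 0
print(cross_entropy(true_dist, [0.7, 0.2, 0.1]))  # fairly confident, correct
print(cross_entropy(true_dist, [0.1, 0.2, 0.7]))  # confident, wrong
```

With a one-hot label this reduces to the negative log of the probability assigned to the correct class, which is why it is used as a classification loss.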
100
Which technique samples train data with replacement?
Bagging