K-nearest neighbors (classification)
The course’s running nonparametric classifier and the prof’s go-to demo for the bias-variance-tradeoff. Take a new point x₀, find its K nearest training points by Euclidean distance, take a majority vote (or class-proportion estimate) over their labels. K is the flexibility knob: K = 1 gives wiggly islands (overfit, high variance), large K over-smooths (underfit, high bias). Killed by the curse-of-dimensionality; the prof’s “statisticians don’t really love KNN” classifier.
Definition (prof’s framing)
Introduced in module 2 as the canonical nonparametric example, then formalized in module 4:
“The general idea is that there’s no parameters … a non-parametric model tries to have a very loose form” - L03-statlearn-2
“To classify a new point, look at its K nearest training points (Euclidean distance) and take a majority vote. The class probability is just the fraction of the K neighbours of each class.” - L08-classif-2
Formally, for a test point x₀ and a positive integer K, let N₀ be the index set of the K training points closest to x₀ in Euclidean distance. Then:

$$\Pr(Y = j \mid X = x_0) = \frac{1}{K} \sum_{i \in N_0} I(y_i = j)$$

Predict the class j with the largest estimated probability.
Notation & setup
- Training data (x_i, y_i), i = 1, …, n, with x_i ∈ ℝ^p and y_i a class label (note: K in KNN is the number of neighbors, not the number of classes, flagged in L07-classif-1).
- Distance: Euclidean by default, $d(x, x') = \sqrt{\sum_{j=1}^{p} (x_j - x'_j)^2}$. Other metrics exist; not on this exam.
- Use odd K for binary classification to avoid ties - L03-statlearn-2.
- Probability output: the fraction of neighbors of each class. This is the score you’d threshold for ROC / AUC (Exercise 4.6j).
Formula(s) to know cold
Hand-computation skill the exercises drill: for a small training set, compute Euclidean distances to every training point, sort, take the top K, vote. Exercise 4.1 is the canonical pen-and-paper drill.
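A minimal R sketch of the drill, with toy numbers (not the exercise’s actual data):

```r
# Toy 7-point training set (assumed data, not Exercise 4.1's):
# compute distances to a test point x0, sort, take the top K, vote.
train <- data.frame(X1 = c(0, 2, 0, -1, 1, 3, 2),
                    X2 = c(3, 0, 1,  0, 1, 2, 2),
                    Y  = c("Red", "Red", "Red", "Green", "Green", "Red", "Green"))
x0 <- c(0, 0)                                            # test point
d  <- sqrt((train$X1 - x0[1])^2 + (train$X2 - x0[2])^2)  # Euclidean distances
K  <- 3
nbrs <- order(d)[1:K]                       # indices of the K nearest points
table(train$Y[nbrs]) / K                    # class-proportion estimates
names(which.max(table(train$Y[nbrs])))      # majority-vote prediction
```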
Insights & mental models
K is exactly a flexibility knob (L03-statlearn-2 and L07-classif-1):
- K = 1: each training point is its own little decision island. The boundary jumps at every point. High variance, low bias. “Little islands of red, which probably doesn’t make any sense.”
- K large (e.g. K = 150 out of 200): the boundary becomes nearly straight. High bias, low variance. In the limit K = n, KNN returns the majority class everywhere (see the boundary sketch after this list).
- Optimal K is intermediate, found by cross-validation (Exercise 4.6h: sweep K and pick the elbow / minimum on a CV / test-error curve).
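A minimal R sketch of the K = 1 vs K = 50 boundary contrast, on simulated data (assumed setup, not the course’s figure):

```r
library(class)
set.seed(2)
# Simulated two-class data with a roughly circular Bayes boundary.
n <- 200
X <- matrix(rnorm(2 * n), ncol = 2)
Y <- factor(ifelse(X[, 1]^2 + X[, 2]^2 + rnorm(n, sd = 0.5) > 1, "Red", "Green"))
grid <- expand.grid(X1 = seq(-3, 3, length.out = 100),
                    X2 = seq(-3, 3, length.out = 100))
par(mfrow = c(1, 2))
for (K in c(1, 50)) {
  pred <- knn(X, grid, Y, k = K)   # classify a fine grid of points
  plot(grid$X1, grid$X2, col = pred, pch = ".",
       main = paste("K =", K), xlab = "X1", ylab = "X2")
}
# K = 1: ragged, disconnected islands. K = 50: smooth, nearly circular regions.
```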
The boundary-shape interpretation:
“If the Bayes decision boundary is highly non-linear, would we expect the best value of K to be large or small?” Small. (Exercise 4.1c)
Small K tracks a wiggly Bayes boundary; large K smooths it away.
KNN is the diagnostic-paradigm classifier: it estimates Pr(Y = j | X = x₀) directly, by an empirical count over neighbors. Same paradigm as logistic regression. Contrast with the sampling / generative paradigm (diagnostic-vs-sampling-paradigm) of LDA / QDA / Naive Bayes, which model class-conditional densities and flip via Bayes’ rule. (L07-classif-1, L09-classif-3.)
Decision regions can be wildly complicated, even with a trivial model:
“KNN can produce extremely complicated decision regions, including disconnected ‘islands’, even though the model itself is trivial.” - L08-classif-2
That’s the price of nonparametric flexibility, and the reason it sometimes beats LDA / QDA on weird boundaries. It’s also why interpretability is essentially zero: there’s no per-feature coefficient to read off.
The killer pitfall is the curse-of-dimensionality (L08-classif-2):
“The dimensionality is a curse, and in this case, it’s a curse for K-nearest neighbours … makes it useless.”
In high p, all pairwise distances become nearly equal, so “nearest” stops meaning anything. Simulate it: sample uniform points in p dimensions and plot the distribution of pairwise distances as p grows; the relative spread collapses. The fix is dimensionality reduction (e.g. PCA), but it’s “a bit hacky still.”
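A minimal R sketch of that simulation (assumed setup: 100 uniform points per dimension):

```r
set.seed(1)
n <- 100
for (p in c(1, 10, 100, 1000)) {
  X <- matrix(runif(n * p), nrow = n)   # n points uniform on [0, 1]^p
  d <- as.vector(dist(X))               # all pairwise Euclidean distances
  # Relative spread of distances shrinks as p grows: "nearest" loses meaning.
  cat(sprintf("p = %4d: mean = %6.2f, sd/mean = %.3f\n",
              p, mean(d), sd(d) / mean(d)))
}
```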
Why statisticians don’t love KNN (L08-classif-2): it has no interpretable parameters, no closed-form theory, no per-feature inference, no extrapolation. It’s a “lookup-table” classifier. Useful as a baseline and as a teaching device for bias-variance, but rarely the model anyone would deploy.
Practical recipe for picking K
The exercise pattern (Exercise 4.6h):
- For each K in a grid of candidate values, fit KNN on the training data.
- Predict on validation / test set.
- Compute the misclassification rate at each K.
- Plot error vs K, pick the K at the minimum (or the one-SE rule, see one-standard-error-rule).
In the 2026 exam form, you wouldn’t write the R code; you’d interpret the resulting curve.
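For concreteness, a minimal R sketch of the sweep, assuming predictor matrices train.X / test.X (already standardized) and label vectors train.Y / test.Y:

```r
library(class)
set.seed(1)                                   # knn() breaks ties at random
Ks  <- 1:100                                  # candidate grid (assumed range)
err <- sapply(Ks, function(K) {
  pred <- knn(train.X, test.X, train.Y, k = K)
  mean(pred != test.Y)                        # test misclassification rate
})
plot(Ks, err, type = "l", xlab = "K", ylab = "Test error")
best.K <- Ks[which.min(err)]                  # K at the minimum of the curve
table(Predicted = knn(train.X, test.X, train.Y, k = best.K),
      Actual    = test.Y)                     # confusion matrix at best K
```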
Standardization
Because KNN uses Euclidean distance, predictors on different scales bias the metric: a feature in millimeters dominates a feature in years purely because of its units. Standardize predictors (z-score) before fitting KNN. The prof’s standardization atom flags KNN as a method that is not scale-invariant; this is mandatory preprocessing in practice. (Not heavily emphasized in M2, but it’s a real exam-trap-shaped fact.)
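A minimal sketch of the step (variable names assumed): z-score with training means and SDs, then apply the same transformation to the test set.

```r
# Standardize using *training* statistics only, to avoid test-set leakage.
train.X <- scale(train.X)
test.X  <- scale(test.X,
                 center = attr(train.X, "scaled:center"),
                 scale  = attr(train.X, "scaled:scale"))
```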
Pitfalls
- Tied votes: use odd K for binary classification; for multi-class, the standard library breaks ties at random (warned in the Exercise 4.6g R-hint).
- Curse of dimensionality: KNN degrades fast as p grows; not the right tool when p is big.
- Small K = high flexibility (the inverted-knob trap): large K → smoother → less flexible → more bias. Easy T/F mistake.
- No standardization: distances are dominated by large-scale features; results are garbage. Standardize first.
- No extrapolation: at an x₀ far from any training point, KNN’s “nearest neighbors” are still its nearest neighbors, but they’re far away. Predictions are essentially the majority class of distant points; not meaningful.
- K = 7 with 7 training points (Exercise 4.1b): you’re using every training point, so the prediction is just the global majority class regardless of x₀. Pure underfit. The exam-trap framing of “why is K = 7 bad?” in that exercise.
Scope vs ISLP
- In scope: the algorithm, K as flexibility knob, hand-computation of distances and votes, the contrast with the parametric / generative paradigms, curse of dimensionality, picking K by CV.
- Look up in ISLP: §2.2.3 (Bayes / KNN setup, Figs 2.14–2.17 for the geometry and the U-shape in test error as K varies); §4.7.6 (KNN as classifier alongside logistic / LDA / QDA in the comparison setting).
- Skip in ISLP: weighted KNN (out of scope: name-checked only), distance metrics other than Euclidean (mentioned in M10 distance-metrics but not exam-tested for KNN).
Exercise instances
- Exercise 4.1a: compute Euclidean distances from a 7-point training set to a test point x₀ by hand. Pure pen-and-paper drill.
- Exercise 4.1b: predict the class at x₀ from those distances. Explain why K = 7 is bad (uses all training points → just the majority class everywhere).
- Exercise 4.1c: relate the optimal K to whether the Bayes decision boundary is highly nonlinear (small K for a wiggly boundary).
- Exercise 4.6g: fit KNN with K = 1 on the Weekly stock data via `knn()` from the `class` library; produce a confusion matrix on the test set.
- Exercise 4.6h: sweep K, plot test misclassification, pick the best K, report the confusion matrix at that K.
- Exercise 4.6j: extract KNN probabilities (via `attributes(model)$prob`, sketched below) and use them in ROC / AUC alongside glm, LDA, QDA. The prob extraction is library-specific R hassle, not exam material, but the idea (KNN gives a probability estimate, not just a class label) is.
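A minimal sketch of that hassle, assuming the Weekly response has levels “Up” / “Down” and the variables from the sweep above:

```r
pred  <- knn(train.X, test.X, train.Y, k = best.K, prob = TRUE)
p.win <- attributes(pred)$prob                   # vote share of the *winning* class
p.up  <- ifelse(pred == "Up", p.win, 1 - p.win)  # convert to P("Up") for ROC / AUC
```

The quirk to remember: `class::knn` reports the winning class’s vote fraction, not a fixed class’s probability, hence the flip.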
How it might appear on the exam
- Hand calculation (the 2026 procedural format): given a small training set and a test point, compute distances, identify the K nearest, vote. The 2024 / 2025 exams have variants of Exercise 4.1.
- Direction-of-effect T/F. “Increasing K increases the flexibility of KNN” → false. “Small K has low bias” → true. “Large K has low variance” → true.
- Boundary-and-K matching. Given two KNN decision-boundary plots (one wiggly, one smooth), match them to “K = 1” and “K = 50.”
- Method comparison. “When would you prefer KNN to LDA?” → highly nonlinear Bayes boundary, low p, lots of data, no need for interpretation. (Or vice versa: prefer LDA when its assumptions hold and you want coefficients to interpret.)
- Concept recall. “What is the curse of dimensionality and how does it affect KNN?” Pairwise distances become nearly equal in high p, neighbors stop being meaningfully near, and KNN’s mechanism breaks.
Related
- knn-regression: the same algorithm with an average of the neighbors’ responses instead of a vote
- parametric-vs-nonparametric: the methodological frame KNN sits inside
- flexibility-overfitting-underfitting: K is the flexibility knob; KNN provides the U-shape demo for classification
- bias-variance-tradeoff: small K → high variance / low bias; large K the reverse
- classification-setup: KNN is one estimator of Pr(Y = j | X = x₀); the Bayes classifier is the gold standard it tries to approximate
- curse-of-dimensionality: KNN’s headline failure mode in high dimensions
- diagnostic-vs-sampling-paradigm: KNN sits in the diagnostic camp alongside logistic regression
- logistic-regression, linear-discriminant-analysis, quadratic-discriminant-analysis: the classifiers KNN is benchmarked against in M4
- cross-validation: how you’d pick K in practice
- standardization: required preprocessing for KNN