Collinearity / multicollinearity
When two or more predictors are correlated, $X^\top X$ becomes near-singular, $(X^\top X)^{-1}$ blows up, and the variance of $\hat\beta$ explodes. Coefficients trade off against each other; SEs go up; significance disappears even when the joint relationship is strong. The cleanest fix is to drop a variable, or use ridge-regression / PCR to escape the singularity.
Definition (prof’s framing)
“Some of the predictors are themselves correlated. … We could trade between $\beta_1$ and $\beta_2$, e.g., make $\beta_1$ bigger and $\beta_2$ smaller, while fit is similar.” - L08-classif-2
Perfect collinearity → infinitely many least-squares solutions ($X^\top X$ exactly singular). Mild collinearity → a highly sensitive solution that swings around with tiny data perturbations.
“Going to go wah, wah.” - L08-classif-2
“Predictions blow up out of sample because you end up with ‘a million minus a million.’” - paraphrase from L08-classif-2
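A minimal numpy sketch of that trade-off (simulated data, not from the lectures): a near-duplicate column makes least squares pick huge offsetting coefficients, and a tiny perturbation of $y$ swings them wildly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)      # near-duplicate of x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=n)   # true signal uses only x1

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # large offsetting pair, roughly (+c, -c): "a million minus a million"

# A tiny perturbation of y swings the individual coefficients wildly,
# while the fitted combination beta1*x1 + beta2*x2 stays nearly the same.
beta2, *_ = np.linalg.lstsq(X, y + 1e-3 * rng.normal(size=n), rcond=None)
print(beta2)
```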
Notation & setup
- Collinearity is a property of $X$, independent of $y$.
- Source: $\operatorname{Var}(\hat\beta) = \sigma^2 (X^\top X)^{-1}$. As columns of $X$ become correlated, eigenvalues of $X^\top X$ approach zero; the corresponding eigenvalues of the inverse blow up toward $\infty$ (see the sketch after this list).
- The variance inflation factor (VIF) measures the explosion per coefficient $\hat\beta_j$; flagged as self-study by the prof, not on the exam.
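A quick illustration of both points, assuming simulated data (the seeds and correlation levels are mine; `variance_inflation_factor` is statsmodels' stock helper, applied here to the raw two-column design as a rough sketch):

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 1000
for rho in [0.0, 0.9, 0.99, 0.999]:
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    x2 = rho * z1 + np.sqrt(1 - rho**2) * z2    # corr(z1, x2) is approx rho
    X = np.column_stack([z1, x2])
    eig = np.linalg.eigvalsh(X.T @ X)           # smallest eigenvalue -> 0 as rho -> 1
    vif = [variance_inflation_factor(X, j) for j in range(2)]
    print(f"rho={rho:5.3f}  min eig={eig[0]:9.2f}  VIFs={np.round(vif, 1)}")
```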
Insights & mental models
Why this is the inevitable problem
The prof keeps coming back to it because it’s where the algebra of OLS first cracks:
“I think this is really why statisticians love these distributions, because you can read out what’s going to happen when you look at them. You can be like, ah, that X transpose X is going to screw us later.” - L06-linreg-2
He flagged it in L06-linreg-2 before ever getting to a worked example, then returned to it in L08-classif-2 (one slide), then again in L14-modelsel-3 / L15-modelsel-4 when introducing PCR; collinearity is the reason PCR exists.
Symptoms in the regression output
- Estimated coefficients have huge SEs even when the joint contribution of the variables is highly significant (large F, individually insignificant t’s).
- Coefficient signs and magnitudes are unstable across resamples: refit on a different subset of the data and they swing wildly (see the sketch after this list).
- Adding or removing one variable dramatically changes the others’ coefficients.
- Predictions on new data are wildly off because the “a million minus a million” trade-off evaporates with any small shift.
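A hedged statsmodels sketch of the instability symptom (simulated data, my own setup): the individual coefficients swing across bootstrap refits while their sum, which is what the data actually pins down, barely moves.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.99 * x1 + np.sqrt(1 - 0.99**2) * rng.normal(size=n)  # corr approx 0.99
y = x1 + x2 + rng.normal(size=n)                            # only the sum matters
X = sm.add_constant(np.column_stack([x1, x2]))

for _ in range(3):
    idx = rng.integers(0, n, size=n)          # bootstrap resample
    fit = sm.OLS(y[idx], X[idx]).fit()
    b = fit.params                            # [intercept, b1, b2]
    print(f"b1={b[1]:6.2f}  b2={b[2]:6.2f}  b1+b2={b[1] + b[2]:5.2f}  "
          f"SEs={fit.bse[1:].round(2)}")
# b1 and b2 wobble from refit to refit; b1 + b2 stays near 2.
```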
Connection to the t-test
This is why the F-test exists:
“The variables can actually be correlated, and then none of them actually look significant, but overall the test is very significant.” - L06-linreg-2
So skipping Q1 (F-test on all coefficients) and going straight to per-coefficient t-tests can hide a real signal that’s spread across correlated predictors. See f-test and t-test-and-significance.
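A small simulated check of that warning (assumed data; `f_pvalue` and `pvalues` are statsmodels' stock result attributes):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 60
x1 = rng.normal(size=n)
x2 = 0.995 * x1 + np.sqrt(1 - 0.995**2) * rng.normal(size=n)
y = 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print("joint F p-value:", fit.f_pvalue)         # tiny: together they clearly matter
print("per-coef t p-values:", fit.pvalues[1:])  # large: neither looks significant alone
```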
Pathological in $p > n$
When $p > n$, $X^\top X$ is always singular regardless of correlation; collinearity becomes total. This is the regime that motivates module 6 (regularization / dimensionality reduction). See high-dimensional-regression.
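A two-line demonstration of the total-collinearity regime (simulated dimensions, my choice of $n$ and $p$):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20, 50                             # more predictors than observations
X = rng.normal(size=(n, p))
G = X.T @ X                               # p x p Gram matrix
print(np.linalg.matrix_rank(G), "<", p)   # rank <= n = 20, so G is singular
# OLS has infinitely many exact-fit solutions; np.linalg.lstsq returns the
# minimum-norm one, which is why regularization is needed to pick sensibly.
```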
Fixes
The prof’s menu:
- Drop a variable. Cheapest, often the right answer.
- Combine the collinear ones (e.g., average them, take a difference). Domain-knowledge driven.
- PCA / PCR. The “compress your variables into fewer variables with some loss” route: replaces the correlated columns with orthogonal principal components, killing the collinearity directly.
“PCA is a way of compressing your variables into fewer variables with some loss. … in this case [the two correlated columns] would turn into… one would be this trend, basically, and then the other one would be the one that’s moving around, the shit around it. … because then they all become orthogonal.” - L08-classif-2
- ridge-regression (L2). Adds $\lambda I$ to $X^\top X$ before inverting → guaranteed invertible, finite-variance coefficients. The standard fix (see the sketch after this list).
- LDA as dimensionality reduction. Brought up in passing as the second answer to the collinearity question (“another route”); see L08-classif-2.
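A sketch of two menu items with sklearn (the simulated data and hyperparameters like `alpha=1.0` are illustrative assumptions, not the course's choices):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)           # nearly collinear pair
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.5, size=n)

print(LinearRegression().fit(X, y).coef_)     # huge offsetting OLS coefficients
print(Ridge(alpha=1.0).fit(X, y).coef_)       # (X^T X + lambda I) invertible: tame

# PCR: regress on orthogonal principal-component scores instead of raw columns.
pcr = make_pipeline(PCA(n_components=1), LinearRegression()).fit(X, y)
print(pcr.named_steps["linearregression"].coef_)
```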
Exam signals
“This factor X transpose X comes into play in particular when two variables are basically the same , because then they can trade off each other and then this variance explodes. That’s a thing we discuss at the very end of today. It’s called collinearity.” - L06-linreg-2
“Only checking individual p-values is dangerous. … The variables can actually be correlated, and then none of them actually look significant, but overall the test is very significant.” - L06-linreg-2
“The collinearity problem we talked about a minute ago, that can happen here [logistic regression], and then this thing’s no longer having a single maximum and it gets weird.” - L08-classif-2
The 2025 Q4 polynomial trap: model B with collinear quadratic terms: “if you try to fit this model, your optimizer goes, ‘hey, no, this sucks.’ Adding L2 makes $\beta_1 = \beta_2$.” - L27-summary (see the sketch below)
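A hedged toy version of that trap (model B's actual terms aren't reproduced here; the duplicated $x^2$ column is my stand-in): only $\beta_1 + \beta_2$ is identified, and the L2 penalty breaks the tie symmetrically.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
x = rng.normal(size=200)
X = np.column_stack([x**2, x**2])     # perfectly collinear quadratic terms
y = 3 * x**2 + rng.normal(size=200)

fit = Ridge(alpha=1e-3).fit(X, y)     # plain OLS has an infinite valley of optima here
print(fit.coef_)                      # approx [1.5, 1.5]: L2 forces beta1 = beta2
```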
Pitfalls
- High individual p-values, low joint p-value. Classic collinearity signature. Always run the F-test before drilling into individual t-tests.
- Stable predictions, unstable coefficients. A collinear regression often predicts well, because the sum is well-determined even when each coefficient isn’t. So if you only care about prediction, you may not feel the symptoms. If you care about interpreting which predictor matters, you’re in trouble.
- K dummies for a K-level factor. Perfect collinearity: identifiability fails, $X^\top X$ singular. Always use $K-1$ dummies plus a reference category (see categorical-encoding-and-interactions, and the sketch after this list).
- High correlation ≠ collinearity. Two predictors can be highly correlated without breaking the regression irreparably. The threshold for “trouble” depends on $n$ and $\sigma^2$; rule of thumb VIF > 5 or 10 (but VIF is self-study per the prof).
- Standardize before diagnosing. Numerical near-singularity can come from scale differences across columns. Standardize first when investigating.
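A minimal pandas sketch of the dummy trap (toy factor of my own; `get_dummies` is the stock pandas helper):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "red", "green"]})

trap = pd.get_dummies(df["color"])                   # K = 3 indicator columns
print(trap.sum(axis=1).tolist())                     # all 1s: columns sum to the intercept
safe = pd.get_dummies(df["color"], drop_first=True)  # K - 1 dummies + reference category
print(safe.columns.tolist())                         # ['green', 'red']; 'blue' is the reference
```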
Scope vs ISLP
- In scope: the qualitative story: what collinearity is, how it shows up in the SEs, why it makes coefficients unstable, the connection to the $(X^\top X)^{-1}$ inverse, and the menu of fixes (drop, combine, PCR, ridge).
- Look up in ISLP: §3.3.3 (pp. 99–102, Collinearity): the Credit-card example with `limit` and `rating`, the VIF formula. The book’s level of detail matches the prof’s coverage.
- Skip in ISLP (book-only / prof excluded): the VIF formula and computation (L08-classif-2: explicit “read it as self-study”; not exam material); condition number and eigen-decomposition diagnostics (L04-statlearn-3: deferred); Bayesian interpretation of ridge-as-prior (L14-modelsel-3: explicit “I really don’t think I’d put this on the test”).
Exercise instances
None directly tagged for collinearity in module 3. The concept reappears in module 6: Exercise 6.5 (ridge on Credit) and Exercise 6.6 (lasso on Credit), but those are owned by the ridge-regression / lasso atoms.
How it might appear on the exam
- Identify symptoms in regression output. Given a table with two correlated predictors (e.g., `limit` and `rating` in Credit), spot the high SEs and low individual significance, then explain why they happen.
- F-test vs t-test under collinearity. Explain why the F-test can be highly significant while individual t’s are not.
- What’s the fix? Multiple-choice or short-answer: drop a variable / use ridge / use PCR. State the principle (orthogonalize, regularize, or eliminate).
- The 2025 Q4 polynomial trap. Two collinear terms; what does L2 do? (Forces $\beta_1 = \beta_2$; the optimizer otherwise has an infinite valley.)
- $p > n$. Conceptual: why does OLS break? Because $X^\top X$ is necessarily singular. Need regularization to make the problem well-posed.
Related
- linear-regression: the model where collinearity bites
- design-matrix-and-hat-matrix: $(X^\top X)^{-1}$ is what blows up
- sampling-distribution-of-beta: variance/covariance of $\hat\beta$ inflated by collinearity
- t-test-and-significance: individual t’s lose power
- ridge-regression: the canonical regularization fix
- principal-component-regression: orthogonalize the predictors directly
- high-dimensional-regression: the regime where collinearity is total