R² and adjusted R²
The classical “fraction of variance explained” measure. The prof distrusts it: it never decreases when you add a predictor (so it always favors the bigger model on training data), and “what is good is very different depending on the context.” Adjusted $R^2$ patches the parameter-count temptation; the prof would still prefer test-set error.
Definition (prof’s framing)
$$R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$
where $\mathrm{TSS} = \sum_i (y_i - \bar y)^2$ (total variability in $y$) and $\mathrm{RSS} = \sum_i (y_i - \hat y_i)^2$ (leftover after the fit).
“How much does your shit vary in general versus how much can you actually explain of that.” - L05-linreg-1
Prof’s framing: $R^2$ is the first and crudest of many model-accuracy measures. “I would always use the test error”, foreshadowing modules 5–6.
Notation & setup
- $\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar y)^2$ = total sum of squares.
- $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat y_i)^2$ = residual sum of squares.
- $0 \le R^2 \le 1$ (in OLS with intercept). Higher = more variance explained on training data.
- In simple linear regression, $R^2 = \mathrm{Cor}(x, y)^2$, the squared sample correlation.
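The quantities above are plain arithmetic. A minimal Python sketch (toy observations and fitted values, both made up for illustration) computes them by hand:

```python
# Hand-compute TSS, RSS, and R^2 from observations and fitted values.
# All numbers are made up for illustration.
y     = [2.0, 3.0, 5.0, 4.0, 6.0]   # observed responses
y_hat = [2.2, 3.1, 4.6, 4.3, 5.8]   # fitted values from some model

y_bar = sum(y) / len(y)
tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
r2 = 1 - rss / tss

print(tss, rss, round(r2, 3))
```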
Formula(s) to know cold
$R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$. Adjusted $R^2$:
$$\bar R^2 = 1 - \frac{\mathrm{RSS}/(n - p - 1)}{\mathrm{TSS}/(n - 1)} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$
Note: adjusted $R^2$ can decrease when you add a useless variable; plain $R^2$ cannot.
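A quick numerical check of that note, as a Python sketch with made-up data: a predictor unrelated to $y$ leaves plain $R^2$ barely above zero, while adjusted $R^2$ goes negative (i.e. below the null model’s zero):

```python
# Adjusted R^2 can drop below zero when a predictor is useless.
# Toy data (made up): x is unrelated to y.
x = [1, 2, 1, 2, 1, 2]
y = [3, 1, 4, 1, 5, 9]
n, p = len(y), 1

xm, ym = sum(x) / n, sum(y) / n
sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
sxx = sum((xi - xm) ** 2 for xi in x)
syy = sum((yi - ym) ** 2 for yi in y)

r2 = sxy**2 / (sxx * syy)                   # R^2 = squared correlation in simple LR
adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # adjusted R^2

print(round(r2, 4), round(adj, 4))  # tiny positive R^2, negative adjusted R^2
```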
Insights & mental models
Why $R^2$ alone fails
Adding a predictor cannot make the training fit worse. The new model contains the old as a special case (set the new coefficient to zero), so the optimizer can always do at least as well. Hence $R^2$ rises monotonically with the number of predictors $p$: on training data, $R^2_{p+1} \ge R^2_p$.
“It seems easy, you’re just like, ‘just tell me the error.’ Yes, but… should you look at the data that you fit the model on, should you look at held-out data, should you penalize it in some way?” - L06-linreg-2
Worked example from the slides: BMI alone gives some baseline $R^2$; add age → 0.58; add age + neck + hip + abdomen → 0.72. Is the bigger model “better”? Without held-out evaluation, you can’t tell.
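The nesting argument can be verified numerically. A Python sketch with made-up data fits an intercept-only model and an intercept-plus-$x$ model, then checks the RSS ordering:

```python
# Training RSS cannot increase when a predictor is added: the bigger model
# nests the smaller one. Toy data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.5, 3.0, 5.0, 4.5]

n = len(y)
xm, ym = sum(x) / n, sum(y) / n
rss_null = sum((yi - ym) ** 2 for yi in y)  # intercept-only: fit is the mean

# closed-form simple OLS for y = alpha + beta * x
beta = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / \
       sum((xi - xm) ** 2 for xi in x)
alpha = ym - beta * xm
rss_x = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

assert rss_x <= rss_null  # always holds on training data
print(round(rss_null, 3), round(rss_x, 3))
```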
Why the prof distrusts $R^2$
“What is good is very different depending on the context of the data. … In my field, that’s like unheard of. That’s fine.” - L06-linreg-2
Plus:
“I have reported this [adjusted $R^2$] in articles, but I would never really, if it was up to me, we wouldn’t include it. Especially in the analysis I do, typically there’s so much stuff that we’ve left out that this number doesn’t mean anything. But it’s still interesting.” - L06-linreg-2
His preferred metric: test-set error; module 5’s cross-validation is what he reaches for instead.
Adjusted $R^2$, the patch
The factor $\frac{n-1}{n-p-1}$ penalizes parameter count. With this:
“Adding more parameters can actually give you a smaller [adjusted] $R^2$ if they don’t do anything.” - L06-linreg-2
So adjusted $R^2$ behaves more like a model-comparison metric, but it still uses training data and is still distrusted.
$R^2$ in simple LR
Equals the squared sample correlation: $R^2 = \mathrm{Cor}(x, y)^2$.
Verifiable in R: summary(lm(...))$r.squared should equal cor(x, y)^2.
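The same identity can be cross-checked in pure Python (toy data, made up): fit simple OLS via the closed-form slope, then compare $1 - \mathrm{RSS}/\mathrm{TSS}$ against the squared correlation:

```python
# Check R^2 == Cor(x, y)^2 in simple linear regression, by hand.
# Toy data, made up for illustration.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(x)
xm, ym = sum(x) / n, sum(y) / n
sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
sxx = sum((a - xm) ** 2 for a in x)
syy = sum((b - ym) ** 2 for b in y)

beta = sxy / sxx                   # closed-form OLS slope
alpha = ym - beta * xm             # OLS intercept
rss = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))

r2_from_fit = 1 - rss / syy                       # 1 - RSS/TSS
r2_from_cor = (sxy / math.sqrt(sxx * syy)) ** 2   # squared sample correlation

assert math.isclose(r2_from_fit, r2_from_cor)
print(round(r2_from_fit, 4))
```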
Exam signals
“I would always use the test error.” - L05-linreg-1
“Kind of just the first one that they came up with. A lot of people don’t like it. There’s an adjusted version. There’s other versions.” - L05-linreg-1
“The bigger point: any model-comparison metric must account for the parameter-count temptation.” (paraphrase of the L06-linreg-2 adjusted $R^2$ slide)
The prof in L27-summary explicitly links the “test vs train” pattern to analogues for multiple model types. Whether an exam question says “training” or “testing” changes the answer.
Pitfalls
- $R^2$ as model quality. A high $R^2$ doesn’t mean the model is correct, useful, or generalizes. It just means it fits the training data.
- $R^2$ comparisons across different sample sizes. TSS scales with $n$, so direct comparisons across datasets of different size are meaningless.
- $R^2$ on a model without intercept. The decomposition $\mathrm{TSS} = \text{explained SS} + \mathrm{RSS}$ only holds when the model has an intercept. Without one, $R^2$ can be negative or > 1.
- $R^2$ outside OLS. For ridge / lasso / GAM / GLM, you can compute analogous quantities, but they don’t carry the same meaning. Use the test-set error instead.
- Adjusted $R^2$ as a panacea. Better than plain $R^2$ but still a training metric and still distrusted by the prof. CV is the principled answer.
- Interpretation slip. “$R^2 = 0.7$” means 70% of the variance in $y$ is explained, not “the model is 70% accurate.”
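The no-intercept pitfall is easy to reproduce. A Python sketch (made-up data with a large offset) fits a through-origin least-squares line and gets a strongly negative $1 - \mathrm{RSS}/\mathrm{TSS}$:

```python
# Without an intercept, 1 - RSS/TSS can go negative: the through-origin fit
# can be worse than just predicting the mean. Toy data, made up.
x = [1.0, 2.0, 3.0]
y = [9.0, 8.0, 7.0]   # large offset, slight downward trend

# least-squares slope for y = b*x (no intercept term)
b = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
rss = sum((c - b * a) ** 2 for a, c in zip(x, y))

ym = sum(y) / len(y)
tss = sum((c - ym) ** 2 for c in y)  # TSS still centered on the mean

r2 = 1 - rss / tss
assert r2 < 0
print(round(b, 3), round(r2, 2))
```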
Scope vs ISLP
- In scope: the $R^2$ formula, what it means, why it monotonically increases with $p$, the adjusted form and its penalty, the prof’s distrust, that test error is preferred.
- Look up in ISLP: §3.1.3 (pp. 70–71, $R^2$ in simple regression); §3.2.2 (pp. 79–81, adjusted $R^2$ and the four important questions).
- Skip in ISLP: the derivation of adjusted $R^2$ from Mallow’s $C_p$; module 6 covers this conceptually only, and the full algebra is out of scope per L12-modelsel-1 / L13-modelsel-2.
Exercise instances
None directly tagged in the manifest; $R^2$ shows up as a side-output in essentially every regression-fitting exercise (e.g. Exercise3.1c interprets summary(lm), where $R^2$ is one of the lines).
How it might appear on the exam
- Compute $R^2$ from a small table. Given TSS and RSS (or $y$ and the fitted values), compute $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$.
- True/false on monotonicity. “Adding a predictor can never decrease $R^2$ on training data”: TRUE for plain $R^2$, FALSE for adjusted.
- Train vs test trap. “If model B has more predictors than model A, then $R^2_B \ge R^2_A$”: true on training, not necessarily on test (could overfit). The 2025 Q4 polynomial trap is the template, see L27-summary.
- Compare models by $R^2$ vs adjusted $R^2$. Identify when they disagree; explain why.
- What’s a good $R^2$? Open-ended: “it depends on the field.” Engineering wants 0.95+; biology might be thrilled with 0.30.
Related
- linear-regression: the model whose fit is being assessed
- sampling-distribution-of-beta: adjusted $R^2$ uses the same df accounting
- t-test-and-significance: alternative per-coefficient measure
- cross-validation: the prof’s preferred replacement for $R^2$
- bias-variance-tradeoff: why training $R^2$ is misleading on flexible models