R² and adjusted R²

The classical “fraction of variance explained” measure. The prof distrusts it: it never decreases when you add a predictor (so it always favors the bigger model on training data), and “what is a good R² is very different depending on the context.” Adjusted R² patches the parameter-count temptation; the prof would still prefer test-set error.

Definition (prof’s framing)

$$R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$

where $\mathrm{TSS} = \sum_i (y_i - \bar{y})^2$ (total variability in $y$) and $\mathrm{RSS} = \sum_i (y_i - \hat{y}_i)^2$ (leftover after the fit).

“How much does your shit vary in general versus how much can you actually explain of that.” - L05-linreg-1

Prof’s framing: R² is the first and crudest of many model-accuracy measures. “I would always use the test error”, foreshadowing modules 5–6.

Notation & setup

  • $\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ = total sum of squares.
  • $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ = residual sum of squares.
  • $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS} \in [0, 1]$ (in OLS with intercept). Higher = more variance explained on training data.
  • In simple linear regression, $R^2 = \mathrm{cor}(x, y)^2$, the squared sample correlation.
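A minimal R sketch checking the definition against what lm reports (simulated data; all names are mine):

```r
# Compute TSS, RSS, and R^2 by hand, then check against summary(lm).
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)

fit <- lm(y ~ x)
TSS <- sum((y - mean(y))^2)     # total variability in y
RSS <- sum(residuals(fit)^2)    # leftover after the fit
1 - RSS / TSS                   # manual R^2
summary(fit)$r.squared          # should match
```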

Formula(s) to know cold

Adjusted R²:

$$R^2_{\mathrm{adj}} = 1 - \frac{\mathrm{RSS}/(n - p - 1)}{\mathrm{TSS}/(n - 1)} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$

with $n$ observations and $p$ predictors.

Note: adjusted R² can decrease when you add a useless variable; plain R² cannot.
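A self-contained check of the formula (same kind of simulated data as above):

```r
# Adjusted R^2 by hand versus what summary(lm) reports.
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
fit <- lm(y ~ x)
n <- length(y); p <- 1                   # n observations, p predictors
R2 <- summary(fit)$r.squared
1 - (1 - R2) * (n - 1) / (n - p - 1)     # manual adjusted R^2
summary(fit)$adj.r.squared               # should match
```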

Insights & mental models

Why R² alone fails

Adding a predictor cannot make the training fit worse. The new model contains the old as a special case (set the new coefficient to zero); the optimizer can always do at least as well. So R² rises monotonically with the number of predictors $p$:

$$R^2(p + 1) \ge R^2(p) \quad \text{on training data.}$$

“It seems easy, you’re just like, ‘just tell me the error.’ Yes, but… should you look at the data that you fit the model on, should you look at held-out data, should you penalize it in some way?” - L06-linreg-2

Worked example from the slides: BMI alone gives some baseline R²; add age → 0.58; add age + neck + hip + abdomen → 0.72. Is the bigger model “better”? Without held-out evaluation, you can’t tell.
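A hedged sketch of that trap on simulated data (all names and numbers are mine): training R² can only go up as noise predictors are added, while held-out error does not improve.

```r
# Junk predictors inflate training R^2 but not held-out performance.
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
junk <- matrix(rnorm(n * 10), n, 10)   # 10 pure-noise predictors
d <- data.frame(y = y, x = x, junk)
train <- 1:100; test <- 101:200

small <- lm(y ~ x, data = d[train, ])
big   <- lm(y ~ ., data = d[train, ])  # x plus all the junk

c(summary(small)$r.squared, summary(big)$r.squared)  # big is never lower
mean((d$y[test] - predict(small, d[test, ]))^2)      # test MSE, small model
mean((d$y[test] - predict(big,   d[test, ]))^2)      # typically no better, often worse
```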

Why the prof distrusts R²

“What is a good R² is very different depending on the context of the data. … In my field, that’s like unheard of. That’s fine.” - L06-linreg-2

Plus:

“I have reported this [adjusted R²] in articles, but I would never really, if it was up to me, we wouldn’t include it. Especially in the analysis I do, typically there’s so much stuff that we’ve left out that this number doesn’t mean anything. But it’s still interesting.” - L06-linreg-2

His preferred metric: test-set error; module 5’s cross-validation is what he reaches for instead.

Adjusted R², the patch

The factor $(n - 1)/(n - p - 1)$ multiplying $(1 - R^2)$ penalizes parameter count. With this:

“Adding more parameters can actually give you a smaller [adjusted R²] if they don’t do anything.” - L06-linreg-2

So adjusted R² behaves more like a model-comparison metric, but it still uses training data, so it is still distrusted.

R² in simple LR

Equals the squared sample correlation:

$$R^2 = \mathrm{cor}(x, y)^2$$

Verifiable in R: summary(lm(...))$r.squared should equal cor(x, y)^2.
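A one-line confirmation on simulated data (names are mine):

```r
# Simple linear regression: R^2 equals the squared sample correlation.
set.seed(7)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
all.equal(summary(lm(y ~ x))$r.squared, cor(x, y)^2)  # TRUE
```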

Exam signals

“I would always use the test error.” - L05-linreg-1

“Kind of just the first one that they came up with. A lot of people don’t like it. There’s an adjusted version. There’s other versions.” - L05-linreg-1

“The bigger point: any model-comparison metric must account for the parameter-count temptation.” (paraphrase of the L06-linreg-2 adjusted-R² slide)

The prof in L27-summary explicitly links the “test vs train” pattern to analogues for multiple model types. The keyword “training” vs “testing” in an exam question changes the answer.

Pitfalls

  • R² as model quality. A high R² doesn’t mean the model is correct, useful, or generalizes. It just means it fits the training data.
  • Raw RSS/TSS comparisons across sample sizes. Both sums scale with $n$, so directly comparing them across datasets of different size is meaningless.
  • R² on a model without intercept. The decomposition $\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$ only holds when the model has an intercept. Without one, R² can be negative or > 1 (see the sketch after this list).
  • R² outside OLS. For ridge / lasso / GAM / GLM, you can compute analogous quantities, but they don’t carry the same meaning. Use the test-set error instead.
  • Adjusted R² as a panacea. Better than plain R² but still a training metric and still distrusted by the prof. CV is the principled answer.
  • Interpretation slip. R² = 0.70 means 70% of the variance in $y$ is explained, not “the model is 70% accurate.”
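The no-intercept pitfall, sketched on simulated data (names are mine). Note that for no-intercept fits, R itself reports a different, uncentered-TSS quantity:

```r
# Without an intercept, the centered-TSS R^2 can go strongly negative.
set.seed(3)
x <- runif(50)
y <- 10 + rnorm(50)         # mean far from 0, unrelated to x
fit0 <- lm(y ~ x - 1)       # no intercept: line forced through the origin
RSS <- sum(residuals(fit0)^2)
TSS <- sum((y - mean(y))^2)
1 - RSS / TSS               # strongly negative
summary(fit0)$r.squared     # R reports an uncentered-TSS version instead
```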

Scope vs ISLP

  • In scope: the R² formula, what it means, why it monotonically increases with $p$, the adjusted form and its penalty, the prof’s distrust, that test error is preferred.
  • Look up in ISLP: §3.1.3 (pp. 70–71, R² in simple regression); §3.2.2 (pp. 79–81, adjusted R² and the four important questions).
  • Skip in ISLP: the derivation of adjusted R² from Mallow’s $C_p$; module 6 covers this conceptually only; full algebra is out of scope per L12-modelsel-1 / L13-modelsel-2.

Exercise instances

None directly tagged in the manifest, but R² shows up as a side output in essentially every regression-fitting exercise (e.g. Exercise3.1c interprets summary(lm), where “Multiple R-squared” is one of the reported lines).

How it might appear on the exam

  • Compute R² from a small table. Given TSS and RSS (or $y$ and the fitted values), compute $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$; see the worked line after this list.
  • True/false on monotonicity. “Adding a predictor can never decrease R² on training data”: TRUE for plain R², FALSE for adjusted R².
  • Train vs test trap. “If model B has more predictors than model A, then $R^2_B \ge R^2_A$”: true on training data, not necessarily on test (could overfit). The 2025 Q4 polynomial trap is the template, see L27-summary.
  • Compare models by R² vs adjusted R². Identify when they disagree; explain why.
  • What’s a good R²? Open-ended: “it depends on the field.” Engineering wants 0.95+; biology might be thrilled with 0.30.
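A worked instance of the first bullet, with made-up numbers: if $\mathrm{TSS} = 100$ and $\mathrm{RSS} = 28$ with $n = 20$ and $p = 3$, then

$$R^2 = 1 - \frac{28}{100} = 0.72, \qquad R^2_{\mathrm{adj}} = 1 - \frac{28/(20 - 3 - 1)}{100/19} = 1 - \frac{1.75}{5.26} \approx 0.67.$$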