Module 03 — Linear Regression

29 questions · 100 points · ~45 min

Question 1 3 points

A model is fit as $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$ by ordinary least squares. Which statement best describes this model?

Correct answer: D

"Linear" refers to the parameters, not the predictor. Treat $(1, x, x^2)$ as columns of $\mathbf{X}$ and run ordinary OLS — closed-form $\hat{\boldsymbol\beta} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$ still applies.

A confuses curvature in $x$ with non-linearity in $\boldsymbol\beta$ (the canonical L06 trap). B is wrong because GAMs use *smoothed* univariate functions, not fixed polynomial bases. C is wrong: OLS has a one-shot closed form whenever $\mathbf{X}^\top\mathbf{X}$ is invertible.

Atoms: polynomial-regression, linear-regression. Lecture: L06-linreg-2.

Question 2 5 points ISLP §3 Q3

Starting salary (in $\$1{,}000$) is modelled as $$\hat y = 50 + 20\,\text{GPA} + 0.07\,\text{IQ} + 35\,\text{Level} + 0.01\,(\text{GPA}\cdot\text{IQ}) - 10\,(\text{GPA}\cdot\text{Level})$$ with $\text{Level} = 1$ for college and $0$ for high school. Which statement is correct, holding $\text{IQ}$ and $\text{GPA}$ fixed?

Correct answer: D

The college-vs-high-school gap is $35 - 10\cdot\text{GPA}$. It is positive only when $\text{GPA} < 3.5$ and flips negative for $\text{GPA} > 3.5$. So at sufficiently high GPA, high-school graduates earn more.

A reads only the main effect $35$ and ignores the interaction (the canonical "main effect under interaction" trap). B ignores the +35 main effect entirely. C flips the sign of the interaction-driven crossover.
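The crossover is quick to verify numerically. A minimal Python sketch of the gap, using the fitted coefficients from the question (the helper name `salary_gap` is ours, not from the course):

```python
# The Q2 interaction trap: the college-minus-high-school salary gap
# as a function of GPA, holding IQ fixed.
def salary_gap(gpa):
    """Predicted college minus high-school salary (in $1,000)."""
    return 35 - 10 * gpa  # Level main effect + (GPA x Level) interaction

print(salary_gap(3.0))  # 5.0  -> college graduates predicted higher
print(salary_gap(3.5))  # 0.0  -> the crossover point
print(salary_gap(4.0))  # -5.0 -> high-school graduates predicted higher
```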

Atoms: categorical-encoding-and-interactions, linear-regression. Lecture: L27-summary.

Question 3 4 points Exam 2025 P2a

Adult weight (kg) is modelled on age (years) and sex (male = 1, female = 0) with an $\text{age}\times\text{sex}$ interaction. The fitted equation is $$\hat{\text{weight}} = 60.2 + 0.11\,\text{age} + 5.03\,\text{sex} + 0.68\,(\text{age}\cdot\text{sex}).$$ Mark each statement as true or false.

  1. False — for females ($\text{sex}=0$) the age slope is $0.11$, not $0.79$. The value $0.79 = 0.11 + 0.68$ is the *male* age slope; this is the canonical "read the slope through the interaction" trap.
  2. False — the $5.03$ coefficient is the male-vs-female gap *only when age $= 0$*. Since the interaction is $0.68$, the gap at age $a$ is $5.03 + 0.68\,a$, very different from $5.03$ across the data range. This is the prof's flagged 2025 Q2 interaction trap.
  3. True — male age slope = $0.11 + 0.68 = 0.79$ kg/year.
  4. True — the $\text{age}\cdot\text{sex}$ term is precisely what allows different age slopes per sex.
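All four checks reduce to plugging values into the fitted equation; a small Python sketch (the helper name `pred_weight` is ours):

```python
# Checking the Q3 statements against the fitted equation
#   weight-hat = 60.2 + 0.11*age + 5.03*sex + 0.68*(age*sex)
def pred_weight(age, sex):
    return 60.2 + 0.11 * age + 5.03 * sex + 0.68 * age * sex

female_slope = pred_weight(31, 0) - pred_weight(30, 0)  # ~0.11 kg/year
male_slope = pred_weight(31, 1) - pred_weight(30, 1)    # ~0.79 kg/year
gap_at_40 = pred_weight(40, 1) - pred_weight(40, 0)     # 5.03 + 0.68*40 = 32.23
```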

Atoms: categorical-encoding-and-interactions. Lecture: L27-summary.

Question 4 4 points ISLP §3 Q4

With $n=100$ observations of a single predictor $X$ and quantitative response $Y$, you fit Model A: $\hat y = \hat\beta_0 + \hat\beta_1 x$, and Model B: $\hat y = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2 + \hat\beta_3 x^3$. Suppose the *true* relationship is linear. What can you say about the **training** RSS of the two models?

Correct answer: B

Model A is nested inside Model B (set $\beta_2 = \beta_3 = 0$). On the training data, OLS picks the coefficients that *minimise* RSS over the larger parameter space, so Model B's training RSS $\le$ Model A's. The prof's flagged keyword is *training* — for test RSS it would flip.

A misses that adding parameters lets the optimiser fit noise — the truth being linear matters for population-level bias, not for finite-sample training fit. C confuses irreducible noise with what the model *fits to*; cubic terms don't add noise, they absorb residuals. D conflates training and test — training RSS is computed only on the training split and is fully determined by the fit.

Atoms: polynomial-regression, r-squared, bias-variance-tradeoff. Lecture: L27-summary.

Question 5 3 points ISLP §3 Q4

Same setup as Q4 (true relationship is linear, $n=100$). What do you expect for the **test** RSS of Model A versus Model B?

Correct answer: A

The cubic model has the same bias as the linear model (both contain the truth) but extra variance from estimating two unnecessary coefficients, so on average test RSS is lower for the simpler Model A.

B inverts the bias-variance message — when the truth is in the simpler model, the simpler model wins. C ignores the variance term — Model B's noisier coefficient estimates inflate prediction MSE. D is wrong: with finite samples $\hat\beta_2, \hat\beta_3$ are nonzero point estimates, so Model B's predictions differ from Model A's.

Atoms: polynomial-regression, bias-variance-tradeoff.

Question 6 5 points Exam 2025 P4a

A Boston-housing-style regression of median home value (medv, in $\$1000$) on nine predictors plus an $\text{rm}^2$ quadratic term gives the following partial output for the $\text{chas}$ indicator (1 if tract bounds the Charles River, 0 otherwise):

        Estimate  Std. Error  t value  Pr(>|t|)
chas        3.36        0.86     3.91    0.0001

Using the working approximation $t_{0.975, n-p-1}\approx 2$, the approximate 95% confidence interval for the chas effect is closest to:

Correct answer: B

$\hat\beta \pm t_{0.975}\cdot\mathrm{SE} \approx 3.36 \pm 2\cdot 0.86 = 3.36 \pm 1.72 \to [1.64, 5.08]$. Houses bounding the river fetch on average $\$1{,}640$–$\$5{,}080$ more, holding other predictors constant.

A drops the $t$ multiplier, using one SE rather than two. C uses the t-value $3.91$ instead of the t-quantile (a confusion of column names). D treats the p-value as a confidence bound — p-values and CIs are different objects.
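The interval arithmetic as a two-line Python sketch, using the working $t \approx 2$ approximation from the question:

```python
# Q6 interval: estimate +/- 2 * SE (working t-quantile approximation).
est, se = 3.36, 0.86
lo, hi = est - 2 * se, est + 2 * se   # -> roughly (1.64, 5.08)
```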

Atoms: confidence-and-prediction-intervals, sampling-distribution-of-beta, t-test-and-significance. Lecture: L27-summary.

Question 7 3 points Exam 2025 P4a

A linear model is fit with an intercept, 7 continuous predictors, a 9-level factor (entered as dummies with one reference level), and one quadratic term I(rm^2). How many regression parameters does the fitted model consume (counting the intercept)?

Correct answer: B

1 intercept + 7 continuous slopes + $K-1 = 8$ dummies for the 9-level factor + 1 for $\text{rm}^2$ = 17.

A forgets the intercept entirely. C uses $K=9$ dummies instead of $K-1$ — the reference level is absorbed into the intercept (with all $9$ dummies, the intercept column equals their sum, so $\mathbf{X}^\top\mathbf{X}$ is singular). D double-counts the quadratic term as if both $\text{rm}$ and $\text{rm}^2$ were *new* (they each consume one slot, and $\text{rm}$ is already in the 7 continuous predictors).

Atoms: categorical-encoding-and-interactions, design-matrix-and-hat-matrix.

Question 8 4 points

A multiple regression is fit with $n = 200$ observations and $p = 9$ slope parameters. For one predictor, the output gives $\hat\beta = 0.40$ and $\mathrm{SE}(\hat\beta) = 0.20$. Using the working cutoff $t_{0.975, 190}\approx 1.97$, is this coefficient significant at the 5% level, and what is its t-statistic?

Correct answer: A

$t = \hat\beta / \mathrm{SE}(\hat\beta) = 0.40 / 0.20 = 2.0 > 1.97$, so reject $H_0:\beta=0$ at $\alpha=0.05$.

B inverts numerator and denominator (computes $\mathrm{SE}/\hat\beta$). C drops the SE step and reads $\hat\beta\cdot\mathrm{SE}$. D applies the wrong cutoff (1% two-sided instead of 5%).
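The same calculation in Python, using the cutoff given in the question:

```python
# Q8 test statistic: t = beta-hat / SE, compared to the 5% cutoff ~1.97.
beta_hat, se = 0.40, 0.20
t = beta_hat / se            # 2.0
reject = abs(t) > 1.97       # True: significant at the 5% level
```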

Atoms: t-test-and-significance, sampling-distribution-of-beta.

Question 9 4 points CE1 P2g

For a t-test with null $H_0:\beta_j = 0$ and observed two-sided p-value $p$, mark each statement as true or false.

  1. False — the p-value is computed *conditional on* $H_0$; it is not a posterior probability of $H_0$.
  2. False — failing to reject $H_0$ is "insufficient evidence", not "$H_1$ is false".
  3. True — this is the canonical correct definition.
  4. False — under $H_0$ "everything is random chance" by assumption; the p-value is a tail probability under that assumption, not a probability of "results being chance".

Atoms: t-test-and-significance.

Question 10 3 points

A regression on $n = 50{,}000$ observations gives $\hat\beta = 0.003$ kg/year for a body-fat predictor, with $p < 10^{-6}$. The investigator concludes that the effect is large because the p-value is tiny. The best critique is:

Correct answer: D

The prof's "significance is just sample size" sermon: with huge $n$ the SE shrinks like $1/\sqrt n$, so any non-zero effect becomes statistically significant. Practical (effect-size) and statistical (p-value) significance are different — you want both, ideally.

A confuses significance with model adequacy — diagnostics, not p-values, check specification. B is the canonical conflation the prof warns against. C invents a non-existent reporting rule.

Atoms: t-test-and-significance, sampling-distribution-of-beta. Lecture: L05-linreg-1.

Question 11 3 points

Recall the simple-LR formula $\mathrm{SE}(\hat\beta_1)^2 = \sigma^2 / \sum_i (x_i - \bar x)^2$. Mark each statement about $\mathrm{SE}(\hat\beta_1)$ as true or false.

  1. False — variance scales like $1/n$, so SE scales like $1/\sqrt n$. Doubling $n$ divides SE by $\sqrt 2$, not by $2$ — this is the canonical "$\sqrt n$ rate" trap.
  2. True — the denominator $\sum(x_i - \bar x)^2$ grows, shrinking SE; this is the prof's "design wider experiments" lever.
  3. False — collinearity *inflates* SE: $\mathbf{X}^\top\mathbf{X}$ becomes near-singular, $(\mathbf{X}^\top\mathbf{X})^{-1}$ blows up.
  4. True — SE is proportional to $\sigma$.
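The $\sqrt n$ rate in item 1 can be demonstrated by replicating a toy design: doubling $n$ doubles $\sum(x_i - \bar x)^2$, so the SE shrinks by $\sqrt 2$. A sketch (design values made up for illustration, $\sigma$ fixed at 1):

```python
import math

# SE(beta1-hat) = sigma / sqrt(sum((x - xbar)^2)), from the Q11 formula.
def se_slope(xs, sigma=1.0):
    xbar = sum(xs) / len(xs)
    return sigma / math.sqrt(sum((x - xbar) ** 2 for x in xs))

xs = [1.0, 2.0, 3.0, 4.0]
ratio = se_slope(xs) / se_slope(xs * 2)  # xs * 2 replicates the design (n doubles)
# ratio == sqrt(2), not 2
```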

Atoms: sampling-distribution-of-beta, collinearity. Lecture: L05-linreg-1.

Question 12 3 points

At a fixed test point $\mathbf{x}_0$, you can construct (a) a 95% confidence interval for $\mathbf{x}_0^\top\boldsymbol\beta$ and (b) a 95% prediction interval for a future $Y$ at $\mathbf{x}_0$. Which of the following correctly contrasts them?

Correct answer: A

The CI for the *mean response* uses $\hat\sigma\sqrt{\mathbf{x}_0^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_0}$; the PI for a *future observation* uses $\hat\sigma\sqrt{1 + \mathbf{x}_0^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_0}$ — that "+1" is the $\sigma^2$ contribution from $\varepsilon_{\text{new}}$.

B swaps the targets: the CI is for the mean, the PI is for an individual. C ignores the structurally different objects each interval covers. D inverts the inequality — PI is *always* wider, never narrower.

Atoms: confidence-and-prediction-intervals, sampling-distribution-of-beta.

Question 13 4 points

Mark each statement about $R^2$ and $R^2_{\text{adj}}$ in OLS as true or false.

  1. True — the larger model contains the smaller as a special case ($\beta_{\text{new}} = 0$), so OLS minimisation cannot raise RSS.
  2. False — that's the property of ordinary $R^2$. Adjusted $R^2$ has the $(n-1)/(n-p-1)$ penalty, so it *can* fall when added parameters do not pull their weight; that's exactly what makes it a model-selection signal rather than just a goodness-of-fit number.
  3. True — a standard simple-LR identity.
  4. False — $R^2$ is the *fraction of variance* in $Y$ explained, not a classification accuracy.

Atoms: r-squared, linear-regression.

Question 14 3 points

A linear model is fit to $n = 8$ observations with total sum of squares $\mathrm{TSS} = 800$. The fitted residuals satisfy $\mathrm{RSS} = 200$. What is $R^2$?

Correct answer: C

$R^2 = 1 - \mathrm{RSS}/\mathrm{TSS} = 1 - 200/800 = 0.75$.

A reports $\mathrm{RSS}/\mathrm{TSS} = 0.25$ as $0.20$ — arithmetic slip. B reports $\mathrm{RSS}/\mathrm{TSS}$ instead of $1 - \mathrm{RSS}/\mathrm{TSS}$ (forgets the $1-$). D inverts the ratio to $\mathrm{TSS}/\mathrm{RSS}$.
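A quick check of the arithmetic:

```python
# Q14: R^2 = 1 - RSS/TSS.
tss, rss = 800, 200
r2 = 1 - rss / tss   # 0.75
```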

Atoms: r-squared.

Question 15 3 points

After fitting an OLS regression you produce a QQ plot of standardised residuals against theoretical normal quantiles. The middle of the plot lies on the reference line, but both tails curl strongly upward at the high end and downward at the low end (heavy "S-shape"). Which assumption is most directly called into question?

Correct answer: B

An S-curve with extreme tails pulling away from the reference line is the canonical "heavy tails / non-Gaussian errors" pattern on a QQ plot. The bulk being aligned says the centre of the distribution looks roughly normal; the deviations are at the extremes.

A would show as structure in the residuals-vs-fitted or order plots, not in the QQ plot. C shows up as curvature in residuals-vs-fitted. D shows up as a fan in residuals-vs-fitted or scale-location plots, not as an S in the QQ plot.

Atoms: residual-diagnostics, gaussian-error-assumptions. Lecture: L06-linreg-2.

Question 16 4 points

In simple linear regression with $n = 5$ predictor values $\mathbf{x} = (1, 2, 3, 4, 10)$, what is the leverage $h_{55}$ of the fifth observation ($x_5 = 10$)? Recall $h_{ii} = \tfrac{1}{n} + (x_i - \bar x)^2 / \sum_j (x_j - \bar x)^2$.

Correct answer: C

$\bar x = (1+2+3+4+10)/5 = 4$. Deviations $(x_i - \bar x) = (-3, -2, -1, 0, 6)$, so $\sum(x_j-\bar x)^2 = 9 + 4 + 1 + 0 + 36 = 50$. Then $h_{55} = 1/5 + 36/50 = 0.20 + 0.72 = 0.92$. The fifth observation has very high leverage — it sits at the far end of $x$.

A reports only the baseline $1/n$ term and forgets the $(x_i - \bar x)^2$ contribution. B reports just the second term $36/50 = 0.72$ and forgets the $1/n$ baseline. D adds a full $1$ instead of $1/n$ — confusing the prediction-interval "+1" with the leverage formula.
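The leverage arithmetic, verified straight from the formula in plain Python:

```python
# Q16 leverage: h_ii = 1/n + (x_i - xbar)^2 / S_xx.
xs = [1, 2, 3, 4, 10]
n = len(xs)
xbar = sum(xs) / n                          # 4.0
sxx = sum((x - xbar) ** 2 for x in xs)      # 50.0
lev = [1 / n + (x - xbar) ** 2 / sxx for x in xs]
# lev[-1] ~ 0.92; note sum(lev) = 2 = p + 1, the hat-matrix trace (cf. Q28)
```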

Atoms: residual-diagnostics, design-matrix-and-hat-matrix.

Question 17 3 points

On a residuals-vs-leverage plot, four points are highlighted. Which one is *most* dangerous for the OLS fit (i.e. exerts the largest influence on $\hat{\boldsymbol\beta}$)?

Correct answer: C

Cook's distance combines both: $D_i \propto \tilde r_i^2 \cdot h_{ii}/(1-h_{ii})$. The dangerous corner is high leverage *and* large residual — the "fat kid far from centre on the seesaw" image.

A has no lever arm — the point can have a large residual but limited pull on the line. B is "harmless high leverage" — the point sits on the fitted line, so it doesn't bend it. D has neither lever nor residual; the safest case.

Atoms: residual-diagnostics. Lecture: L08-classif-2.

Question 18 4 points

Two predictors $x_1$ and $x_2$ are highly correlated (sample correlation $\approx 0.99$). Mark each statement about the OLS fit as true or false.

  1. True — collinearity makes $\mathbf{X}^\top\mathbf{X}$ near-singular, so $(\mathbf{X}^\top\mathbf{X})^{-1}$ has large diagonal entries → SE explodes.
  2. False — under the classical assumptions, OLS is unbiased even under collinearity; only the variance is inflated.
  3. True — the canonical reason to do the joint F-test before drilling into per-coefficient t's.
  4. False — predictions are typically *stable*, not unreliable: even when individual coefficients have huge SEs, the *sum* $\beta_1 x_1 + \beta_2 x_2$ is well-determined, so $\hat y$ on data similar to the training set is fine. Collinearity hurts inference about individual $\beta_j$, not in-sample prediction.

Atoms: collinearity, sampling-distribution-of-beta, multivariate-normal (the joint $N_{p+1}(\boldsymbol\beta, \sigma^2(\mathbf{X}^\top\mathbf{X})^{-1})$ distribution of $\hat{\boldsymbol\beta}$ is the foundation for inflated SEs under collinearity).

Question 19 3 points

You want to test whether *any* of the predictors in a multiple regression with $p = 5$ slopes contributes to predicting $Y$. Which is the appropriate procedure?

Correct answer: C

The F-test on the joint null is exactly the question being asked. It is also robust to collinearity: individual t's can both fail while the joint F is highly significant. Note: you are not expected to compute the F-statistic on this exam — only to know what null it tests and why you'd reach for it.

A inflates the family-wise false-positive rate (the multiple-testing trap), and ignores collinearity-masked signals. B is meaningless — RSE estimates $\sigma$ but doesn't test "any predictor matters". D invents a non-existent rejection rule.

Atoms: f-test, t-test-and-significance, collinearity. Lecture: L06-linreg-2.

Question 20 4 points CE1 P2c

Earthworm weight $Y$ is modelled on stomach circumference $X$ (continuous) and genus $G \in \{L, N, Oc\}$ (factor; L as reference) via $\hat Y = \hat\beta_0 + \hat\beta_1 X + \hat\beta_2 \mathbf{1}_{G=N} + \hat\beta_3 \mathbf{1}_{G=Oc}$. The fitted equation for genus Oc is:

Correct answer: B

For genus Oc, $\mathbf{1}_{G=N} = 0$ and $\mathbf{1}_{G=Oc} = 1$, so the dummy contribution is $\hat\beta_3$ added to the intercept. Slope $\hat\beta_1$ is the same across genera (no interaction).

A is the reference-genus equation (L). C is genus N. D adds both dummies as if they could be active simultaneously — but each observation belongs to exactly one genus.

Atoms: categorical-encoding-and-interactions.

Question 21 3 points

A categorical predictor takes $K = 4$ levels. Why does R encode it as $3$ dummy columns rather than $4$?

Correct answer: A

With $K$ dummies plus an intercept, the column of ones equals the sum of the four dummies — perfect collinearity, so $(\mathbf{X}^\top\mathbf{X})^{-1}$ doesn't exist. Drop one dummy to absorb the reference level into the intercept.

B confuses the K-dummy issue with the *0/1/2* numeric-coding mistake — a different trap: numeric coding imposes an artificial ordering and equal spacing on the levels. C invents a non-existent design. D inverts cause and effect: degrees of freedom are a *consequence* of identifiability, not the reason for it.
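The perfect-collinearity claim can be seen directly: with a full set of $K$ dummies, every row contains exactly one 1, so the dummy columns sum to the all-ones intercept column. A sketch on made-up factor data:

```python
# Why K dummies plus an intercept is singular: each observation belongs
# to exactly one level, so the K dummy columns sum to the intercept
# column of ones. (Toy factor data, made up for illustration.)
levels = ["a", "b", "c", "d"]
obs = ["a", "b", "c", "a", "d", "b"]
dummies = [[1 if g == lvl else 0 for lvl in levels] for g in obs]
row_sums = [sum(row) for row in dummies]   # every entry is 1
```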

Atoms: categorical-encoding-and-interactions, design-matrix-and-hat-matrix, collinearity.

Question 22 3 points

Mark each statement about interaction terms in linear regression as true or false.

  1. True — the prof's flagged "non-negotiable" main-effects rule. Including only $X\cdot Z$ is almost always wrong.
  2. True — the canonical interaction trap. Don't read it as a global average.
  3. False — interactions are what allow different slopes. The "different intercepts only" picture is the no-interaction model.
  4. False — the interaction t-test is on the *interaction coefficient*, which measures the slope *difference* between groups, not the main effect of $X$. A small p-value on $X\cdot Z$ tells you the slopes differ, not that the average slope is non-zero.

Atoms: categorical-encoding-and-interactions. Lecture: L06-linreg-2.

Question 23 3 points

Hourly temperature is regressed on time of day. Sampling resolution is then refined from hourly to every $5$ minutes (so $n$ grows by $\sim 12\times$). Reported t-statistics for the time slope shoot up; the p-value drops to $<10^{-9}$. The most accurate critique is:

Correct answer: D

Independence (assumption 5) is violated by serial correlation. The OLS algebra still computes $\hat\beta$, but the reported uncertainty is wrong — the prof's "horseshit standard errors" critique. Effective $n$ is much less than apparent $n$.

A confuses bias of $\hat\beta$ with bias of inference; under correlated errors point estimates are still unbiased. B invokes a different problem (overfitting), unrelated to dependence. C is a stretch — the Gaussian assumption is robust to mildly bounded data, and in any case boundedness does not explain the inflated t-statistics.

Atoms: gaussian-error-assumptions, sampling-distribution-of-beta. Lecture: L05-linreg-1.

Question 24 3 points

Mark each statement about the residuals-vs-fitted plot as true or false.

  1. True — fanning means $\mathrm{Var}(\varepsilon_i)$ scales with the fitted value, breaking common-variance.
  2. True — systematic shape in residuals against fitted values says the conditional mean is mis-specified; add a transformation or polynomial.
  3. False — a flat band is *consistent with* the assumptions, not proof of them. Residuals-vs-fitted only checks linearity and constant variance; it can't see independence (look at order plots) and can't see normality (look at the QQ plot). "Consistent with" ≠ "proves".

Atoms: residual-diagnostics, gaussian-error-assumptions.

Question 25 3 points

Which statement best captures the difference between an *error* $\varepsilon_i$ and a *residual* $e_i = y_i - \hat y_i$ in the linear-regression model?

Correct answer: D

The prof's distinction: $\varepsilon_i$ is the *random* deviation in the population model, never directly observed. $e_i$ is the *observed* prediction of that error after fitting. Raw residuals additionally have $\mathrm{Cov}(\mathbf{e}) = \sigma^2(\mathbf{I}-\mathbf{H})$, which is why we standardise them for diagnostics.

A inverts the two roles. B is wrong even with an intercept — residuals are predictions of unobserved errors, not equal to them. C mixes up properties of an estimator (bias, variance) with the error and residual variables themselves.

Atoms: gaussian-error-assumptions, residual-diagnostics.

Question 26 4 points Exam 2024 P3

Under the linear model $y_i = \mathbf{x}_i^\top\boldsymbol\beta + \varepsilon_i$ with $\varepsilon_i \overset{\text{iid}}\sim N(0, \sigma^2)$, what is the *correct* short justification that the least-squares estimator equals the maximum-likelihood estimator?

Correct answer: B

The prof's flagged mathy template (verbatim from L27 / 2024 exam P3): write the log-likelihood, drop terms not depending on $\boldsymbol\beta$, identify the SSE, conclude. About 6–10 lines on the exam.

A confuses unbiasedness with the MLE property — they're different. C confuses computational convenience with the optimisation objective — closed form is a property of the algebra, not the principle. D is nonsense — symmetry is not a sufficient condition for being an MLE.

Atoms: least-squares-and-mle, gaussian-error-assumptions. Lecture: L27-summary.

Question 27 3 points

Differentiating the matrix-form residual sum of squares $\mathrm{RSS}(\boldsymbol\beta) = (\mathbf{y} - \mathbf{X}\boldsymbol\beta)^\top(\mathbf{y} - \mathbf{X}\boldsymbol\beta)$ with respect to $\boldsymbol\beta$ and setting the result to zero gives:

Correct answer: A

$\partial\mathrm{RSS}/\partial\boldsymbol\beta = -2\mathbf{X}^\top\mathbf{y} + 2\mathbf{X}^\top\mathbf{X}\boldsymbol\beta = \mathbf{0}$ gives the normal equations. Solving (when $\mathbf{X}^\top\mathbf{X}$ is invertible) yields $\hat{\boldsymbol\beta} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$.

B is the residual-equals-zero condition, which only holds for $n \le p+1$ (degenerate). C reorders the matrix product and uses scalar division for matrices — algebraically illegal. D mis-orders the inverse and projection: $(\mathbf{X}^\top\mathbf{X})^{-1}$ is followed by $\mathbf{X}^\top$, not premultiplied by $\mathbf{X}$.
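For simple linear regression the normal equations are a $2\times 2$ system that can be solved by hand. A sketch on made-up, noise-free toy data (chosen to lie exactly on $y = 2 + 3x$, so the solution is exact):

```python
# Solving X^T X beta = X^T y directly for intercept-plus-slope OLS.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]          # exactly y = 2 + 3x
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
# Normal equations:  [n   sx ] [b0]   [sy ]
#                    [sx  sxx] [b1] = [sxy]
det = n * sxx - sx * sx
b0 = (sxx * sy - sx * sxy) / det   # 2.0
b1 = (n * sxy - sx * sy) / det     # 3.0
```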

Atoms: least-squares-and-mle, design-matrix-and-hat-matrix.

Question 28 3 points

Let $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$ be the hat matrix in OLS. Mark each statement as true or false.

  1. True — useful: leverage can be computed from the design alone, before $\mathbf{y}$ is observed.
  2. False — $\sum h_{ii} = \mathrm{tr}(\mathbf{H}) = p + 1$, the number of fitted parameters (counting the intercept), *not* $n$. The trace of a projection matrix equals the rank of its column space, and that's $p+1$ here.
  3. True — $\mathbf{H}$ is the orthogonal projection onto the column space of $\mathbf{X}$.
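Both the trace claim and the projection property can be checked numerically on a small design — here the Q16 $x$-values with an intercept, using an explicit $2\times 2$ inverse in plain Python:

```python
# Checking tr(H) = p + 1 and idempotence H^2 = H for the hat matrix
# H = X (X^T X)^{-1} X^T on an intercept-plus-one-predictor design.
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
X = [[1.0, x] for x in xs]                  # n x 2 design matrix
# X^T X and its explicit 2x2 inverse
a = sum(r[0] * r[0] for r in X)
b = sum(r[0] * r[1] for r in X)
d = sum(r[1] * r[1] for r in X)
det = a * d - b * b
inv = [[d / det, -b / det], [-b / det, a / det]]

def h(i, j):
    """Entry (i, j) of H = X (X^T X)^{-1} X^T."""
    return sum(X[i][k] * inv[k][l] * X[j][l] for k in range(2) for l in range(2))

trace = sum(h(i, i) for i in range(len(xs)))            # p + 1 = 2
hh_01 = sum(h(0, k) * h(k, 1) for k in range(len(xs)))  # (H^2)_{01} = H_{01}
```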

Atoms: design-matrix-and-hat-matrix, residual-diagnostics.

Question 29 3 points

A regression of life expectancy on years of education estimates $\hat\beta = 0.6$ years of life per year of schooling, with a small p-value. Which of the following claims is best supported by this OLS fit alone?

Correct answer: C

Regression measures conditional association, not causation — the prof's "fancy correlations" line. The slope is the average change in $\hat Y$ associated with a one-unit change in the predictor, conditional on the other predictors in the model. Lurking variables, reverse causation, and selection effects can all generate a non-zero slope.

A reads the slope as causal — the canonical exam-flagged mistake. B overstates: a significant non-zero slope is consistent with many populations, including ones where education is a minor driver. D contradicts the stated significant fit.

Atoms: linear-regression, t-test-and-significance. Lecture: L05-linreg-1.