Sampling distribution of β̂

Under the classical Gaussian linear model, β̂ is exactly multivariate normal, centered on the true β (unbiased), with covariance σ²(XᵀX)⁻¹. This is the source of all subsequent inference: t-tests, F-tests, CIs, PIs.
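A quick way to see the claim is to simulate it. A minimal numpy sketch (not from the lecture; the true β, σ, and design below are made up): fix X, redraw the noise many times, refit, and compare the empirical mean and covariance of β̂ to β and σ²(XᵀX)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 2.0
beta = np.array([1.0, 3.0])                               # true (intercept, slope); assumed
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # fixed design

fits = []
for _ in range(20_000):
    y = X @ beta + rng.normal(0, sigma, n)        # fresh Gaussian noise each replicate
    fits.append(np.linalg.lstsq(X, y, rcond=None)[0])
fits = np.array(fits)

print("empirical mean:", fits.mean(axis=0))       # ≈ beta (unbiased)
print("empirical cov:\n", np.cov(fits.T))         # ≈ sigma^2 (X^T X)^{-1}
print("theory:\n", sigma**2 * np.linalg.inv(X.T @ X))
```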

Definition (prof’s framing)

“If you have a billion parameters, what’s the uncertainty of them, and they’re all working against each other? It becomes very confusing. But in this case, you can do it very well.” - L05-linreg-1

Multiple regression result (proved in Exercise 3.2a, derived in L06-linreg-2): β̂ ~ N(β, σ²(XᵀX)⁻¹).

Simple regression special case: each component β̂ⱼ is univariate Gaussian, centered on the true value βⱼ, with variance σ²[(XᵀX)⁻¹]ⱼⱼ read off the diagonal of the covariance matrix.

“That’s what we want. If it was biased then we’d be upset because then our model is not going to give us the right shit.” - L06-linreg-2

Notation & setup

  • X: the n×(p+1) design matrix; see design-matrix-and-hat-matrix.
  • True β unknown; we estimate β̂ = (XᵀX)⁻¹Xᵀy.
  • σ² unknown; estimated by σ̂² = RSS/(n−p−1).
  • SE(β̂ⱼ) = σ̂ √[(XᵀX)⁻¹]ⱼⱼ, where [(XᵀX)⁻¹]ⱼⱼ is the j-th diagonal of (XᵀX)⁻¹ (single-fit recipe sketched after this list).
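The quantities above can all be computed directly from the matrix formulas in one fit. A minimal sketch on made-up data (nothing here is from the course):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0, 5, n)
y = 2.0 + 0.7 * x + rng.normal(0, 1.0, n)   # hypothetical data

X = np.column_stack([np.ones(n), x])        # design matrix with intercept column
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                # (X^T X)^{-1} X^T y
resid = y - X @ beta_hat
p = X.shape[1] - 1                          # number of slopes (excluding intercept)
sigma2_hat = resid @ resid / (n - p - 1)    # RSS / (n - p - 1)
se = np.sqrt(sigma2_hat * np.diag(XtX_inv)) # SE(beta_hat_j) from the diagonal
print(beta_hat, np.sqrt(sigma2_hat), se)
```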

Formula(s) to know cold

Multiple regression: β̂ ~ N(β, σ²(XᵀX)⁻¹)

Per-coefficient variance: Var(β̂ⱼ) = σ²[(XᵀX)⁻¹]ⱼⱼ, so SE(β̂ⱼ) = σ√[(XᵀX)⁻¹]ⱼⱼ.

Simple regression closed forms (the only ones easy to write without matrix inversion):

SE(β̂₀)² = σ²[1/n + x̄²/Σᵢ(xᵢ − x̄)²],   SE(β̂₁)² = σ²/Σᵢ(xᵢ − x̄)²

Residual standard error (estimator of σ): RSE = σ̂ = √(RSS/df), where RSS = Σᵢ(yᵢ − ŷᵢ)².

For simple regression, df = n − 2 in the denominator (two df eaten by β̂₀ and β̂₁). With multiple regression, df = n − p − 1.
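The closed forms are just the diagonal of σ²(XᵀX)⁻¹ worked out by hand. A quick numeric sanity check (my own sketch, arbitrary data):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 30, 1.5
x = rng.normal(3, 2, n)
X = np.column_stack([np.ones(n), x])

cov = sigma**2 * np.linalg.inv(X.T @ X)           # matrix route
Sxx = np.sum((x - x.mean())**2)
var_b1 = sigma**2 / Sxx                           # slope closed form
var_b0 = sigma**2 * (1/n + x.mean()**2 / Sxx)     # intercept closed form
print(np.allclose(cov[1, 1], var_b1), np.allclose(cov[0, 0], var_b0))  # True True
```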

Insights & mental models

Derivation in three lines

The proof (Exercise 3.2a, sketched in L06-linreg-2):

Write y = Xβ + ε with ε ~ N(0, σ²Iₙ). Use β̂ = (XᵀX)⁻¹Xᵀy. Then by the linear-transformation property of the multivariate normal:

  • E[β̂] = (XᵀX)⁻¹Xᵀ E[y] = (XᵀX)⁻¹XᵀXβ = β.
  • Cov(β̂) = (XᵀX)⁻¹Xᵀ (σ²Iₙ) X (XᵀX)⁻¹ = σ²(XᵀX)⁻¹.
  • Linear function of a multivariate normal → multivariate normal.

Conclusion: β̂ ~ N(β, σ²(XᵀX)⁻¹). ∎
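The covariance line is the step people fumble; it is just Cov(Ay) = A Cov(y) Aᵀ with A = (XᵀX)⁻¹Xᵀ. A numeric check of that identity (a sketch, random X assumed):

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2 = 15, 2.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])

A = np.linalg.inv(X.T @ X) @ X.T                 # beta_hat = A y
lhs = A @ (sigma2 * np.eye(n)) @ A.T             # A Cov(y) A^T
rhs = sigma2 * np.linalg.inv(X.T @ X)            # claimed covariance
print(np.allclose(lhs, rhs))                     # True: the middle line of the proof
```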

Experiment design from the variance formula

The simple-regression form

Var(β̂₁) = σ²/Σᵢ(xᵢ − x̄)²

tells you how to design experiments. We can’t shrink σ² (it’s a property of the noise), but we can:

  • Increase n: more samples.
  • Spread the xᵢ wider: sample further apart in x. (Both effects are quantified in the sketch below.)
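Both knobs act through Σᵢ(xᵢ − x̄)² in the denominator. A minimal sketch (toy designs, not course data):

```python
import numpy as np

sigma = 1.0
def se_slope(x):
    """SE(beta1_hat) = sigma / sqrt(sum of squared deviations of x)."""
    return sigma / np.sqrt(np.sum((x - x.mean())**2))

print(se_slope(np.linspace(0, 1, 20)))   # baseline
print(se_slope(np.linspace(0, 1, 40)))   # more samples -> smaller SE
print(se_slope(np.linspace(0, 2, 20)))   # wider spread -> smaller still
```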

“It is kind of weird to think that you can look at these equations and then from that gain an intuition of how you can do your experiment better. But you do.” - L05-linreg-1

Significance is just sample size

“If n is infinity… your standard [error] is going to be small as shit, which means it’s going to look significant even if it isn’t.” - L05-linreg-1

The variance shrinks like 1/n, so any non-zero effect eventually becomes statistically significant for big enough n. See t-test-and-significance.
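To see this concretely: with a fixed tiny true slope, the t-statistic grows roughly like √n. A sketch under assumed toy values (β₁ = 0.05, unit noise):

```python
import numpy as np

rng = np.random.default_rng(3)
beta1 = 0.05                              # small but non-zero effect (assumed)
for n in [100, 10_000, 1_000_000]:
    x = rng.normal(0, 1, n)
    y = beta1 * x + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    se = np.sqrt(resid @ resid / (n - 2) * np.linalg.inv(X.T @ X)[1, 1])
    print(n, b[1] / se)                   # t-statistic grows ~ sqrt(n)
```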

Why (XᵀX)⁻¹ matters: collinearity

The variance has the constant σ² baked in, plus the data-dependent factor (XᵀX)⁻¹. When two predictors are nearly the same, XᵀX is near-singular, its inverse blows up, the diagonal entries grow without bound, and the variances explode. See collinearity.

“This factor X transpose X comes into play in particular when two variables are basically the same, because then they can trade off each other and then this variance explodes.” - L06-linreg-2
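The blow-up is easy to provoke. A sketch (my own toy example): make x2 an increasingly exact copy of x1 and watch the slope entries on the diagonal of (XᵀX)⁻¹ explode.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(0, 1, n)
for eps in [1.0, 0.1, 0.01, 0.001]:
    x2 = x1 + eps * rng.normal(0, 1, n)          # nearly the same predictor
    X = np.column_stack([np.ones(n), x1, x2])
    d = np.diag(np.linalg.inv(X.T @ X))
    print(eps, d[1], d[2])                       # slope variances blow up as eps -> 0
```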

Estimated SE vs true SE

Strictly, the SE you can compute uses σ̂ in place of the unknown σ. So tests use the t distribution (a heavier-tailed relative of the Gaussian) with n − p − 1 df, not the standard normal. For large n the two are indistinguishable (quantiles compared in the sketch below).
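The t vs normal gap is easy to eyeball from the 97.5% quantiles (sketch; assumes scipy is available):

```python
from scipy import stats

for df in [3, 10, 30, 100]:
    print(df, stats.t.ppf(0.975, df))   # heavier tails -> wider than 1.96 at low df
print("z:", stats.norm.ppf(0.975))      # 1.959..., the n -> infinity limit
```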

Exam signals

“A lot of the reasons we ask those questions is so we can make tests on them.” - L06-linreg-2

“Result (proving this is problem 2 of the recommended exercises)” - L06-linreg-2

“I think this is really why statisticians love these distributions, because you can read out what’s going to happen when you look at them.” - L06-linreg-2

Pitfalls

  • Wrong df. Simple LR uses n − 2; multiple LR uses n − p − 1 (where p is the number of slopes, not counting the intercept). Conventions differ across books; say which you’re using.
  • Estimated vs known σ. With unknown σ, use t with n − p − 1 df, not the standard normal. The hat in σ̂ is implicit in how R reports SE.
  • The multivariate covariance has off-diagonal entries. β̂₀ and β̂₁ are not generally independent in simple LR. Their covariance becomes zero only if x̄ = 0 (demonstrated in the sketch after this list).
  • Bias is a function of the model, not the estimator. β̂ is unbiased for the true β in the assumed model. If the true model is non-linear, the LS slope is unbiased for the best linear approximation, not for the curve.
  • Inflation under collinearity. A coefficient estimate may be near zero with a huge SE and look “insignificant”, yet the joint F-test over the correlated set may still be highly significant. See t-test-and-significance and f-test.
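For the off-diagonal pitfall above, a minimal sketch: the (0,1) entry of (XᵀX)⁻¹ is nonzero for raw x with x̄ ≠ 0 and vanishes once x is centered.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(2, 6, 50)                 # xbar well away from 0
for xs in [x, x - x.mean()]:              # raw vs centered predictor
    X = np.column_stack([np.ones_like(xs), xs])
    print(np.linalg.inv(X.T @ X)[0, 1])   # ∝ cov(b0_hat, b1_hat); ~0 when centered
```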

Scope vs ISLP

  • In scope: the multivariate normal sampling distribution, derivation of mean and covariance, the simple-regression SE formulas, residual standard error.
  • Look up in ISLP: §3.1.2 (pp. 63–66, simple LR SE), §3.2.1 (matrix-form result, lighter derivation).
  • Skip in ISLP: specifics of the t- and F-distributions are referenced but not derived; ISLP is light here. Walpole is the prof’s recommended classical reference for the distribution of β̂.

Exercise instances

  • Exercise 3.2a: full derivation: show β̂ ~ N(β, σ²(XᵀX)⁻¹); what assumptions are needed; what does this imply for each β̂ⱼ; how to compute SE(β̂ⱼ).

How it might appear on the exam

  • Write the distribution of β̂ (and the assumptions under which it holds): could be a true/false or short-derivation question.
  • Derive the per-coefficient variance in simple LR. Standard “show your work” question; hand-derive Var(β̂₁) = σ²/Σᵢ(xᵢ − x̄)².
  • What happens to the SE when…? Add more data (down, like 1/√n); spread x wider (down); two predictors become highly correlated (up; collinearity).
  • Read SE from regression output. 2025 Q6a-style: given the table, what is the estimate, what is the 95% CI? Use β̂ⱼ ± t·SE(β̂ⱼ) with the 97.5% t-quantile, ≈ β̂ⱼ ± 2·SE(β̂ⱼ) (computed in the sketch below).
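The CI computation in numbers (table values below are hypothetical, not from the 2025 exam):

```python
from scipy import stats

est, se, df = 0.42, 0.10, 97            # hypothetical regression-table values
t_crit = stats.t.ppf(0.975, df)         # ≈ 1.98, hence the "± 2 SE" rule of thumb
print(est - t_crit * se, est + t_crit * se)
```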