Confidence and prediction intervals

Two intervals around a regression prediction. CI = where the true mean response lies; PI = where a future observation lands. The PI is always wider because it adds the irreducible noise $\varepsilon$. Both are narrowest where the data is densest and fan out at the extremes.

Definition (prof’s framing)

For a fixed test point $x_0$:

  • Confidence interval = uncertainty in $\hat{y}_0 = x_0^\top\hat\beta$ as an estimator of the expected response $E[Y \mid x_0] = x_0^\top\beta$.
  • Prediction interval = uncertainty in $\hat{y}_0$ as a prediction of a future observation $Y_0$ at $x_0$, including the irreducible noise.

“Plotting the confidence and prediction intervals around all predicted values one obtains the confidence range or confidence band for the expected values of $Y$. … The prediction range is much broader than the confidence range.” (module 3 slides)

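CI for an individual coefficient (L05-linreg-1): $\hat\beta_j \pm t_{\alpha/2,\,n-p-1}\,\mathrm{SE}(\hat\beta_j)$. For 95% with reasonable $n$: $\hat\beta_j \pm 2\,\mathrm{SE}(\hat\beta_j)$.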

Notation & setup

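  • $x_0$ = test point (fixed covariates).
  • $\hat{y}_0 = x_0^\top\hat\beta$ = point prediction.
  • Use the t-distribution with $n - p - 1$ df, where $p$ counts the predictors excluding the intercept (since $\sigma^2$ is estimated). For large $n$, t ≈ N.
  • $1 - \alpha$ = nominal confidence level (typically 0.95).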

Formula(s) to know cold

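Confidence interval for a single coefficient $\beta_j$:

$$\hat\beta_j \pm t_{\alpha/2,\,n-p-1}\,\hat\sigma\sqrt{\left[(X^\top X)^{-1}\right]_{jj}}$$

Confidence interval for the mean response $x_0^\top\beta$ at $x_0$:

$$x_0^\top\hat\beta \pm t_{\alpha/2,\,n-p-1}\,\hat\sigma\sqrt{x_0^\top (X^\top X)^{-1} x_0}$$

Prediction interval for a future observation $Y_0$ at $x_0$:

$$x_0^\top\hat\beta \pm t_{\alpha/2,\,n-p-1}\,\hat\sigma\sqrt{1 + x_0^\top (X^\top X)^{-1} x_0}$$

Here $\hat\sigma^2 = \mathrm{RSS}/(n-p-1)$, with $p$ predictors plus an intercept.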

The PI carries an extra +1 under the square root: that is the irreducible $\sigma^2$ contribution. It is why the PI is always wider than the CI.
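A minimal sketch of the mean-response CI and the PI for simple linear regression (one predictor plus intercept), where the quadratic form $x_0^\top (X^\top X)^{-1} x_0$ reduces to $1/n + (x_0 - \bar{x})^2 / S_{xx}$. The function name `mean_ci_and_pi`, the toy data, and the hard-coded critical value $t_{0.975,\,8} \approx 2.306$ are assumptions for illustration, not taken from the slides.

```python
import math

def mean_ci_and_pi(x, y, x0, t_crit):
    """Return (fit, ci_half, pi_half) at x0 for simple linear regression.

    t_crit is the t quantile with n - 2 degrees of freedom, supplied by the caller.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))           # residual standard error
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx     # x0' (X'X)^{-1} x0
    fit = intercept + slope * x0
    ci_half = t_crit * sigma_hat * math.sqrt(leverage)      # mean response
    pi_half = t_crit * sigma_hat * math.sqrt(1 + leverage)  # new observation
    return fit, ci_half, pi_half

# Hypothetical toy data, n = 10, so df = 8 and t_{0.975, 8} is about 2.306
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
fit, ci_half, pi_half = mean_ci_and_pi(x, y, x0=5.5, t_crit=2.306)
print(f"fit={fit:.3f}  CI: +/-{ci_half:.3f}  PI: +/-{pi_half:.3f}")
```

The only difference between the two half-widths is the `1 +` inside the square root, so the PI half-width exceeds the CI half-width for any data set.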

Insights & mental models

Two sources of uncertainty in PI, one in CI

CI accounts for: uncertainty in $\hat\beta$ (only).

PI accounts for: uncertainty in $\hat\beta$ + irreducible noise $\varepsilon$.

“To answer this question [PI], we have to sum uncertainty over two components: (1) the uncertainty in the predicted value (due to uncertainty in $\hat\beta$); (2) the irreducible error $\varepsilon$.” (module 3 slides)
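The two components can be written out as a short derivation. This is a sketch under the standard assumptions (independent Gaussian errors with constant variance $\sigma^2$, and a new error $\varepsilon_0$ that is independent of the training data); the symbols $Y_0$ and $\varepsilon_0$ are notation introduced here for the future observation and its noise term.

$$
\begin{aligned}
Y_0 - \hat{y}_0 &= (x_0^\top\beta + \varepsilon_0) - x_0^\top\hat\beta
  = \varepsilon_0 - x_0^\top(\hat\beta - \beta), \\
\mathrm{Var}(Y_0 - \hat{y}_0) &= \mathrm{Var}(\varepsilon_0) + \mathrm{Var}(x_0^\top\hat\beta)
  = \sigma^2 + \sigma^2\, x_0^\top (X^\top X)^{-1} x_0 \\
 &= \sigma^2\left(1 + x_0^\top (X^\top X)^{-1} x_0\right).
\end{aligned}
$$

The variances add because $\varepsilon_0$ is independent of $\hat\beta$. Dropping the $\mathrm{Var}(\varepsilon_0)$ term leaves exactly the variance used in the CI for the mean response.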

The frequentist CI interpretation

“There is a 95% probability that the interval [from the random procedure] will contain the true value of $\beta_j$.” (module 3 slides)

Crucially: the interval is random, the parameter is fixed. Repeat the experiment many times → ~95% of the constructed intervals cover the true $\beta_j$. CE1 problem 2g (true/false on p-values) hammers the related “$p$ is the probability that $H_0$ is true” trap; CIs carry the same misinterpretation risk.
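The repeated-sampling reading can be checked by simulation. The sketch below is illustrative only: the true coefficients, noise level, design, seed, and function names are all made-up assumptions, and the critical value $t_{0.975,\,18} \approx 2.101$ is hard-coded to match $n = 20$.

```python
import math
import random

def slope_ci(x, y, t_crit):
    """t-based CI for the slope in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope - t_crit * se, slope + t_crit * se

def coverage(n_sims=2000, beta0=1.0, beta1=2.0, sigma=1.5, seed=0):
    """Fraction of simulated data sets whose CI covers the true slope."""
    rng = random.Random(seed)
    n = 20
    t_crit = 2.101  # t_{0.975, 18}; only valid for n = 20
    x = [10 * i / (n - 1) for i in range(n)]  # fixed design, reused every run
    hits = 0
    for _ in range(n_sims):
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
        lo, hi = slope_ci(x, y, t_crit)
        hits += lo <= beta1 <= hi  # the interval moves, beta1 does not
    return hits / n_sims

rate = coverage()
print(f"empirical coverage: {rate:.3f}")
```

The true slope never changes inside the loop; only the interval endpoints do, which is the sense in which the 95% refers to the procedure rather than to any single computed interval.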

Interval shape

Both CI and PI are narrowest near the centroid of the data and fan out at the extremes: the $x_0^\top (X^\top X)^{-1} x_0$ term grows with the distance of $x_0$ from the mean of the covariates. Visually: the CI band hugs the line; the PI band is a wide envelope.
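To see the fanning numerically, the sketch below evaluates the quadratic form $x_0^\top (X^\top X)^{-1} x_0$ at several test points. It assumes a single predictor with intercept, where the form equals $1/n + (x_0 - \bar{x})^2 / S_{xx}$; the design values and the name `leverage` are illustrative assumptions.

```python
def leverage(x, x0):
    """x0' (X'X)^{-1} x0 for simple linear regression with intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return 1 / n + (x0 - x_bar) ** 2 / sxx

x = list(range(1, 11))  # hypothetical design with mean 5.5
for x0 in (5.5, 8.0, 10.0, 15.0):
    h = leverage(x, x0)
    # Both half-widths are t * sigma_hat times these factors
    print(f"x0={x0:5.1f}  h={h:.3f}  CI factor={h ** 0.5:.3f}  PI factor={(1 + h) ** 0.5:.3f}")
```

The CI factor starts from $\sqrt{1/n}$ at the centroid, while the PI factor starts from just above 1, so the PI band widens by a much smaller relative amount over the same range.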

Why CI for x₀ᵀβ ≠ PI for Y at x₀

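The CI is for the mean $x_0^\top\beta$, the expected response. The PI is for an individual observation $Y_0$, a single random draw from $N(x_0^\top\beta, \sigma^2)$. Even with infinite data ($n \to \infty$), the CI shrinks to a point but the PI stays wide because $\mathrm{Var}(\varepsilon) = \sigma^2 > 0$.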
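The limiting behaviour can be tabulated directly. This sketch makes simplifying assumptions that are not in the notes: $\sigma$ is treated as known, the normal critical value 1.96 stands in for the t quantile, and the test point sits at $\bar{x}$ in a simple linear regression, where the leverage term is $1/n$. The function name `half_widths` is hypothetical.

```python
import math

def half_widths(n, sigma=1.0, crit=1.96):
    """(CI, PI) half-widths at x0 = x_bar, where the leverage term is 1/n."""
    h = 1 / n
    return crit * sigma * math.sqrt(h), crit * sigma * math.sqrt(1 + h)

for n in (10, 100, 10_000, 1_000_000):
    ci_half, pi_half = half_widths(n)
    print(f"n={n:>9}  CI half-width={ci_half:.4f}  PI half-width={pi_half:.4f}")
```

As $n$ grows, the CI half-width tends to 0 while the PI half-width tends to `crit * sigma`, the part that no amount of data can remove.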

Exam signals

“We will discuss confidence and prediction ranges in the (more general) multiple linear regression setup.” (module 3 slides)

“Confidence intervals (CIs) are a much more informative way to report results than $p$-values!” (module 3 slides)

Both intervals are derived in problem 2 of the recommended exercises (L06-linreg-2).

Pitfalls

  • CI vs PI confusion. If asked “what’s the uncertainty around an individual prediction at $x_0$?” → PI. If asked “what’s the uncertainty in the average response at $x_0$?” → CI. Mixing them up is the canonical exam slip.
  • CI for $\beta_j$ vs CI for $x_0^\top\beta$ vs PI for $Y_0$ at $x_0$. Three distinct objects, three different formulas; don’t conflate. Exercise 3.2d makes you walk through all three.
  • Misinterpreting “95% probability.” It’s the procedure’s coverage rate, not “this specific interval has 95% probability of containing the true parameter.” Once you compute the interval, the parameter is either in it or not.
  • PI fails if assumptions fail. Both rely on Gaussian errors; PI especially relies on the residual variance estimate being valid. Heteroscedasticity → PI is wrong.
  • PI is always wider. The PI strictly contains the CI at every $x_0$. If you draw a band where the CI is wider than the PI, you’ve swapped them.

Scope vs ISLP

  • In scope: difference between CI and PI; their derivation in matrix form; the t-distribution-based formulas; the band shape; the frequentist interpretation.
  • Look up in ISLP: §3.2.2 (pp. 81–82, Predictions): concise treatment with the +1 in the PI; figure 3.6 shows the band shape.
  • Skip in ISLP: Bayesian credible intervals (out of scope). Bonferroni / multiple-testing corrections to the CI (never covered).

Exercise instances

  • Exercise 3.2b: simulate repeated datasets; check empirically that the 95% CIs cover the true coefficients ~95% of the time.
  • Exercise 3.2c: same simulation philosophy, but for the PI at a fixed $x_0$.
  • Exercise 3.2d: construct the CI for $x_0^\top\beta$; explain the connection between the CI for $\beta_j$, the CI for $x_0^\top\beta$, and the PI for $Y_0$ at $x_0$.
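A rough sketch in the spirit of the PI simulation, not the exercise's own setup or solution: the true coefficients, noise level, design, seed, and the function name `interval_hits` are assumptions. Each run fits a simple linear regression, draws one fresh observation at $x_0$, and records whether the PI and the mean-response CI contain it.

```python
import math
import random

def interval_hits(rng, x, x0, beta0, beta1, sigma, t_crit):
    """Return (pi_hit, ci_hit): does each interval contain a new Y at x0?"""
    n = len(x)
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))
    h = 1 / n + (x0 - x_bar) ** 2 / sxx
    fit = intercept + slope * x0
    pi_half = t_crit * sigma_hat * math.sqrt(1 + h)
    ci_half = t_crit * sigma_hat * math.sqrt(h)
    y_new = beta0 + beta1 * x0 + rng.gauss(0, sigma)  # fresh observation at x0
    return abs(y_new - fit) <= pi_half, abs(y_new - fit) <= ci_half

rng = random.Random(1)
x = [0.5 * i for i in range(20)]  # fixed design, n = 20, so df = 18
n_sims = 2000
results = [interval_hits(rng, x, 4.0, 1.0, 2.0, 1.5, t_crit=2.101)
           for _ in range(n_sims)]
pi_rate = sum(p for p, _ in results) / n_sims
ci_rate = sum(c for _, c in results) / n_sims
print(f"new Y inside PI: {pi_rate:.3f}   new Y inside CI: {ci_rate:.3f}")
```

The PI should capture the new observation close to 95% of the time, while the CI captures it far less often, because the CI targets the mean $x_0^\top\beta$ rather than an individual draw.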

How it might appear on the exam

  • Distinguish CI and PI. Definition or T/F question on which is wider, what each represents, and which contains the irreducible noise $\varepsilon$.
  • Read intervals from a band plot. Given a regression with the usual two-band plot, identify which is the CI and which is the PI; predict at a new $x_0$ and quote the appropriate interval.
  • Frequentist interpretation T/F. “If we computed 100 95% CIs from 100 random samples, ~95 would cover the true parameter value.” Correct interpretation.
  • CI from regression output. Given $\hat\beta_j$ and $\mathrm{SE}(\hat\beta_j)$ from a table, compute the 95% CI as $\hat\beta_j \pm 2\,\mathrm{SE}(\hat\beta_j)$ (1.96 ≈ 2 trick).
  • Derive PI from CI by adding $\sigma^2$ to the variance. Conceptual question: how does the formula change?
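The “CI from regression output” item is one line of arithmetic. A minimal sketch, with a hypothetical estimate and standard error standing in for the values you would read off a table (the helper name `coef_ci` is also an assumption):

```python
def coef_ci(beta_hat, se, crit=2.0):
    """Approximate 95% CI: estimate +/- crit * standard error."""
    return beta_hat - crit * se, beta_hat + crit * se

# Hypothetical regression-table entries
lo, hi = coef_ci(beta_hat=0.52, se=0.11)
print(f"approx 95% CI: [{lo:.2f}, {hi:.2f}]")  # prints: approx 95% CI: [0.30, 0.74]
```

Passing `crit=1.96`, or the exact t quantile for the residual degrees of freedom, gives the more precise interval; for reasonable $n$ the endpoints barely move.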