Confidence and prediction intervals

Two intervals around a regression prediction. CI = where the true mean response lies; PI = where a future observation lands. PI is always wider because it adds the irreducible noise $σ^{2}$. Both are narrowest near the centre of the data (the mean of the covariates) and fan out at the extremes.

Definition (prof’s framing)

For a fixed test point $x_{0}$ :

Confidence interval = uncertainty in $\hat{y}_{0} = x_{0}^{⊤} \hat{β}$ as an estimator of the expected response $x_{0}^{⊤} β$.
Prediction interval = uncertainty in $\hat{y}_{0}$ as a prediction of a future observation $y_{new} = x_{0}^{⊤} β + ε_{new}$ at $x_{0}$, including the irreducible noise.

“Plotting the confidence and prediction intervals around all predicted values $\hat{Y}_{0}$ one obtains the confidence range or confidence band for the expected values of $Y$. … The prediction range is much broader than the confidence range.” (module 3 slides)

CI for an individual coefficient $β_{j}$ (L05-linreg-1): $\hat{β}_{j} \pm t_{1 - α /2, n - p - 1} \cdot SE (\hat{β}_{j})$ . For 95% with reasonable $n$ : $t \approx 1.96 \approx 2$ .

Notation & setup

$x_{0}$ = test point (fixed covariates).
$\hat{y}_{0} = x_{0}^{⊤} \hat{β}$ = point prediction.
Use the t-distribution with $n - p - 1$ df (since $σ$ is estimated). For large $n$ , t ≈ N.
$1 - α$ = nominal confidence level (typically 0.95).

Formula(s) to know cold

Confidence interval for a single coefficient:

$\hat{β}_{j} \pm t_{1 - α /2, n - p - 1} \cdot SE (\hat{β}_{j}) .$

Confidence interval for the mean response at $x_{0}$ :

$x_{0}^{⊤} \hat{β} \pm t_{1 - α /2, n - p - 1} \cdot \hat{σ} \sqrt{x_{0}^{⊤} (X^{⊤} X)^{- 1} x_{0}} .$

Prediction interval for a future observation at $x_{0}$ :

$x_{0}^{⊤} \hat{β} \pm t_{1 - α /2, n - p - 1} \cdot \hat{σ} \sqrt{1 + x_{0}^{⊤} (X^{⊤} X)^{- 1} x_{0}} .$

The PI carries an extra +1 under the square root: that’s the irreducible $σ^{2}$ contribution. It’s why PI > CI always.
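
A minimal sketch of the mean-response CI and the PI for the one-predictor case, using only the Python standard library. Assumptions that are not from the notes: the toy data are made up, the function names `fit_simple_ols` and `intervals_at` are hypothetical, and the critical value defaults to a normal quantile as a large-$n$ stand-in for $t_{1 - α /2, n - p - 1}$ (pass the exact t quantile via `crit` when $n$ is small).

```python
import math
from statistics import NormalDist


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x.

    Returns (b0, b1, sigma_hat, x_bar, sxx).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))  # n - p - 1 df with p = 1
    return b0, b1, sigma_hat, x_bar, sxx


def intervals_at(x0, x, y, level=0.95, crit=None):
    """Return (y0_hat, conf_int, pred_int) at x0; intervals are (lower, upper)."""
    n = len(x)
    b0, b1, sigma_hat, x_bar, sxx = fit_simple_ols(x, y)
    y0_hat = b0 + b1 * x0
    # With one predictor, x0^T (X^T X)^{-1} x0 reduces to this expression.
    h0 = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    if crit is None:
        # ASSUMPTION: normal quantile used in place of the t quantile.
        crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci_half = crit * sigma_hat * math.sqrt(h0)        # CI: no +1
    pi_half = crit * sigma_hat * math.sqrt(1.0 + h0)  # PI: extra +1
    conf_int = (y0_hat - ci_half, y0_hat + ci_half)
    pred_int = (y0_hat - pi_half, y0_hat + pi_half)
    return y0_hat, conf_int, pred_int


# Hypothetical toy data, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 3.9, 7.1, 9.8, 13.2, 15.9]
y0_hat, conf_int, pred_int = intervals_at(2.5, x, y)
print(f"prediction {y0_hat:.2f}")
print(f"CI ({conf_int[0]:.2f}, {conf_int[1]:.2f})")
print(f"PI ({pred_int[0]:.2f}, {pred_int[1]:.2f})")
```

The only difference between the two half-widths is the `1.0 +` inside the square root, so the PI contains the CI at every `x0`.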

Insights & mental models

Two sources of uncertainty in PI, one in CI

CI accounts for: uncertainty in $\hat{β}$ (only).

PI accounts for: uncertainty in $\hat{β}$ + irreducible noise $ε_{0}$ .

“To answer this question [PI], we have to sum uncertainty over two components: (1) the uncertainty in the predicted value $\hat{y}_{0}$ (due to uncertainty in $\hat{β}$); (2) the irreducible error $ε_{0} \sim N (0, σ^{2})$.” (module 3 slides)
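
The two components add as variances, not as standard deviations, because the new error is independent of the training data and hence of $\hat{β}$. A short sketch of that step, assuming a fixed design and independent Gaussian errors with common variance $σ^{2}$, so that $\mathrm{Var} (\hat{β}) = σ^{2} (X^{⊤} X)^{- 1}$:

$\mathrm{Var} (y_{new} - x_{0}^{⊤} \hat{β}) = \mathrm{Var} (ε_{new}) + \mathrm{Var} (x_{0}^{⊤} \hat{β}) = σ^{2} + σ^{2} x_{0}^{⊤} (X^{⊤} X)^{- 1} x_{0} = σ^{2} (1 + x_{0}^{⊤} (X^{⊤} X)^{- 1} x_{0}) .$

Dropping the $\mathrm{Var} (ε_{new})$ term leaves the variance behind the CI; keeping it gives the variance behind the PI. In both cases $σ$ is then replaced by $\hat{σ}$, which is why the t-distribution appears.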

The frequentist CI interpretation

“There is a 95% probability that the interval [from the random procedure] will contain the true value of $β_{j}$.” (module 3 slides)

Crucially: the interval is random, the parameter is fixed. Repeat the experiment many times → ~95% of the constructed intervals cover the true $β_{j}$. CE1 problem 2g (true/false on p-values) hammers the related “$1 - p$ is the probability $H_{0}$ is true” trap; CIs have the same misinterpretation risk.
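
The "repeat the experiment" reading can be checked by simulation. The sketch below is an illustration under stated assumptions, not the course's solution: it draws many datasets from $Y = 1 + 3 X + ε$, builds a 95% CI for the slope each time, and counts how often the interval covers the true value. The design ($X$ uniform on $[0, 1]$, held fixed), $σ = 1$, $n = 100$ and the normal quantile standing in for the t quantile are all choices made here; `slope_ci` and `coverage` are hypothetical names.

```python
import math
import random
from statistics import NormalDist


def slope_ci(x, y, level=0.95):
    """CI for the slope of a one-predictor least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))
    se_b1 = sigma_hat / math.sqrt(sxx)
    # ASSUMPTION: normal quantile as a large-n approximation to the t quantile.
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return b1 - crit * se_b1, b1 + crit * se_b1


def coverage(n_datasets=1000, n=100, beta0=1.0, beta1=3.0, sigma=1.0, seed=0):
    """Fraction of simulated datasets whose slope CI covers the true beta1."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]  # design held fixed across datasets
    hits = 0
    for _ in range(n_datasets):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        lower, upper = slope_ci(x, y)
        hits += lower <= beta1 <= upper  # the interval moves, beta1 does not
    return hits / n_datasets


print(f"empirical coverage: {coverage():.3f}")
```

The fixed quantity in the loop is `beta1`; what varies from dataset to dataset is `(lower, upper)`. The printed fraction should land near 0.95.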

Interval shape

Both CI and PI are narrowest near the centroid of the data and fan out at the extremes, because the $x_{0}^{⊤} (X^{⊤} X)^{- 1} x_{0}$ term grows with the distance of $x_{0}$ from the mean of the covariates. Visually: the CI band hugs the line; the PI band is a wide envelope.
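
For simple linear regression with an intercept, the quadratic form has a closed form that makes the fanning explicit. Writing the test vector as $(1, x_{0})^{⊤}$ for a scalar predictor value $x_{0}$, and $\bar{x}$ for the sample mean of the predictor (a standard result, stated here without derivation):

$x_{0}^{⊤} (X^{⊤} X)^{- 1} x_{0} = \frac{1}{n} + \frac{(x_{0} - \bar{x})^{2}}{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}} .$

The first term is the floor reached at $x_{0} = \bar{x}$; the second grows quadratically with the distance from $\bar{x}$, which is what bends both bands outward.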

Why CI for x₀ᵀβ ≠ PI for Y at x₀

The CI is for the mean, i.e. the expected response. The PI is for an individual observation, i.e. a single random draw from $N (x_{0}^{⊤} β, σ^{2})$. Even with infinite data ($\hat{β} \to β$), the CI shrinks to a point but the PI stays wide because $σ^{2} > 0$: at the 95% level its half-width tends to about $1.96 σ$.
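
A small numeric sketch of that limit. To keep it short it evaluates the half-widths at the centroid of the data, where $x_{0}^{⊤} (X^{⊤} X)^{- 1} x_{0} = 1 / n$ for a model with an intercept, and it treats $σ$ as known and uses a normal quantile; both simplifications are assumptions made for illustration, and `half_widths` is a hypothetical name.

```python
import math
from statistics import NormalDist


def half_widths(n, sigma=1.0, level=0.95):
    """(CI half-width, PI half-width) at the centroid, where the leverage is 1/n."""
    # ASSUMPTION: sigma known and normal quantile, instead of sigma_hat and t.
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    h0 = 1.0 / n
    return crit * sigma * math.sqrt(h0), crit * sigma * math.sqrt(1.0 + h0)


for n in (10, 100, 10_000, 1_000_000):
    ci_half, pi_half = half_widths(n)
    print(f"n={n:>9}: CI half-width {ci_half:.4f}   PI half-width {pi_half:.4f}")
```

As `n` grows the first column goes to zero while the second levels off at `crit * sigma`: more data sharpens the estimate of the mean but cannot remove the noise in a single new observation.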

Exam signals

“We will discuss confidence and prediction ranges in the (more general) multiple linear regression setup.” (module 3 slides)

“Confidence intervals (CIs) are a much more informative way to report results than $p$-values!” (module 3 slides)

Both intervals are derived in problem 2 of the recommended exercises. (L06-linreg-2)

Pitfalls

CI vs PI confusion. If asked “what’s the uncertainty around an individual prediction at $x_{0} = 50$ ?” → PI. If asked “what’s the uncertainty in the average response at $x_{0} = 50$ ?” → CI. Mixing them up is the canonical exam slip.
CI for $β_{j}$ vs CI for $x_{0}^{⊤} β$ vs PI for $y$ at $x_{0}$ . Three distinct objects, three different formulas; don’t conflate. Exercise 3.2d makes you walk through all three.
Misinterpreting “95% probability.” It’s the procedure’s coverage rate, not “this specific interval has 95% probability of containing $β$ .” Once you compute the interval, $β$ is either in it or not.
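PI fails if assumptions fail. Both intervals rely on Gaussian errors with constant variance; the PI is the more fragile of the two because it uses $\hat{σ}$ directly as the spread of a new observation. Under heteroscedasticity a single $\hat{σ}$ misstates the noise at $x_{0}$, so the PI is too wide where the noise is small and too narrow where it is large.
PI is always wider. The PI strictly contains the CI at every $x_{0}$. If you draw a band where the CI is wider than the PI, you’ve swapped them.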

Scope vs ISLP

In scope: difference between CI and PI; their derivation in matrix form; the t-distribution-based formulas; the band shape; the frequentist interpretation.
Look up in ISLP: §3.2.2 (pp. 81–82, Predictions) for a concise treatment with the +1 in the PI; figure 3.6 shows the band shape.
Skip in ISLP: Bayesian credible intervals (out of scope); Bonferroni / multiple-testing corrections to the CI (never covered).

Exercise instances

Exercise 3.2b: simulate $Y = 1 + 3 X + ε$ for $\sim 1000$ datasets; check empirically that the 95% CI covers $β_{0}$ and $β_{1}$ ~95% of the time
Exercise 3.2c: same simulation philosophy, but for the PI at a fixed $x_{0} = 0.4$
Exercise 3.2d: construct CI for $x_{0}^{⊤} β$; explain the connection between CI for $β_{j}$, CI for $x_{0}^{⊤} β$, and PI for $Y$ at $x_{0}$
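
A rough sketch of the simulation idea for the PI at $x_{0} = 0.4$, offered as an illustration rather than the exercise solution. Each replicate draws a training set, builds the PI, then draws one fresh observation at $x_{0}$ and checks whether it falls inside. Not specified in the notes and assumed here: $X$ uniform on $[0, 1]$, $σ = 1$, $n = 100$, and a normal quantile in place of the t quantile. `pi_at` and `pi_coverage` are hypothetical names.

```python
import math
import random
from statistics import NormalDist


def pi_at(x0, x, y, level=0.95):
    """Prediction interval at x0 from a one-predictor least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))
    h0 = 1.0 / n + (x0 - x_bar) ** 2 / sxx  # x0^T (X^T X)^{-1} x0
    # ASSUMPTION: normal quantile as a large-n approximation to the t quantile.
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = crit * sigma_hat * math.sqrt(1.0 + h0)
    y0_hat = b0 + b1 * x0
    return y0_hat - half, y0_hat + half


def pi_coverage(x0=0.4, n_datasets=1000, n=100, sigma=1.0, seed=1):
    """Fraction of replicates in which a fresh y at x0 lands inside the PI."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_datasets):
        x = [rng.random() for _ in range(n)]
        y = [1.0 + 3.0 * xi + rng.gauss(0.0, sigma) for xi in x]
        lower, upper = pi_at(x0, x, y)
        y_new = 1.0 + 3.0 * x0 + rng.gauss(0.0, sigma)  # future observation
        hits += lower <= y_new <= upper
    return hits / n_datasets


print(f"empirical PI coverage at x0 = 0.4: {pi_coverage():.3f}")
```

Unlike the CI check, the target here (`y_new`) is itself random, so both the interval and the thing it is meant to capture change from replicate to replicate.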

How it might appear on the exam

Distinguish CI and PI. Definition or T/F question on which is wider, what each represents, which contains $σ^{2}$ .
Read intervals from a band plot. Given a regression with the usual two-band plot, identify which band is the CI and which is the PI; predict at a new $x$ value and quote the appropriate interval.
Frequentist interpretation T/F. “If we computed 100 95% CIs from 100 random samples, ~95 would cover the true $β$ .” Correct interpretation.
CI from regression output. Given $\hat{β}_{j}$ and $SE (\hat{β}_{j})$ from a table, compute the 95% CI as $\hat{β}_{j} \pm 2 \cdot SE$ (1.96 ≈ 2 trick).
Derive PI from CI by adding $σ^{2}$ . Conceptual question: how does the formula change?
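
The "CI from regression output" item comes down to one line of arithmetic. A tiny sketch with made-up numbers (the estimate 3.1 and standard error 0.4 are hypothetical, as is the name `coef_ci`), using the rounded critical value 2 from the notes:

```python
def coef_ci(beta_hat, se, crit=2.0):
    """Approximate 95% CI for a coefficient: beta_hat +/- crit * SE."""
    return beta_hat - crit * se, beta_hat + crit * se


# Hypothetical table row: estimate 3.1 with standard error 0.4.
lower, upper = coef_ci(3.1, 0.4)
print(f"[{lower:.2f}, {upper:.2f}]")  # prints [2.30, 3.90]
```

Passing `crit=1.96`, or the exact t quantile for the model's degrees of freedom, tightens or widens the interval slightly; with reasonable $n$ the difference from 2 is small.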

Related

linear-regression: the underlying model
sampling-distribution-of-beta: both intervals derive from this
t-test-and-significance: same machinery, different question
gaussian-error-assumptions: both intervals require these

statistical.dog
