Confidence and prediction intervals
Two intervals around a regression prediction. CI = where the true mean response lies; PI = where a future observation lands. The PI is always wider because it adds the irreducible noise variance $\sigma^2$. Both are narrowest at the centre of the covariates ($x_0$ near $\bar{x}$) and fan out toward the extremes.
Definition (prof’s framing)
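For a fixed test point $x_0$:
- Confidence interval = uncertainty in $\hat{y}_0 = x_0^T\hat\beta$ as an estimator of the expected response $\mathrm{E}(Y_0) = x_0^T\beta$.
- Prediction interval = uncertainty in $\hat{y}_0$ as a prediction of a future observation $Y_0 = x_0^T\beta + \varepsilon_0$ at $x_0$, including the irreducible noise $\varepsilon_0 \sim N(0, \sigma^2)$.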
“Plotting the confidence and prediction intervals around all predicted values one obtains the confidence range or confidence band for the expected values of $Y$. … The prediction range is much broader than the confidence range.” (module 3 slides)
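CI for an individual coefficient (L05-linreg-1): $\hat\beta_j \pm t_{\alpha/2,\,n-p-1}\,\widehat{\mathrm{SE}}(\hat\beta_j)$. For 95% with reasonably large $n$: $\hat\beta_j \pm 2\,\widehat{\mathrm{SE}}(\hat\beta_j)$.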
Notation & setup
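- $x_0 = (1, x_{01}, \dots, x_{0p})^T$ = test point (fixed covariates, with a leading 1 for the intercept).
- $X$ = the $n \times (p+1)$ design matrix (first column all ones).
- $\hat{y}_0 = x_0^T\hat\beta$ = point prediction.
- Use the t-distribution with $n - p - 1$ df, since $\sigma^2$ is estimated by $\hat\sigma^2 = \mathrm{RSS}/(n - p - 1)$. For large $n$, $t \approx N(0, 1)$.
- $1 - \alpha$ = nominal confidence level (typically 0.95); $t_{\alpha/2,\,n-p-1}$ denotes the upper $\alpha/2$ quantile of the $t_{n-p-1}$ distribution.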
Formula(s) to know cold
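Confidence interval for a single coefficient:
$$\hat\beta_j \pm t_{\alpha/2,\,n-p-1}\,\hat\sigma\sqrt{[(X^TX)^{-1}]_{jj}}$$
Confidence interval for the mean response at $x_0$:
$$x_0^T\hat\beta \pm t_{\alpha/2,\,n-p-1}\,\hat\sigma\sqrt{x_0^T(X^TX)^{-1}x_0}$$
Prediction interval for a future observation at $x_0$:
$$x_0^T\hat\beta \pm t_{\alpha/2,\,n-p-1}\,\hat\sigma\sqrt{1 + x_0^T(X^TX)^{-1}x_0}$$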
The PI carries an extra $+1$ under the square root: that is the irreducible-noise contribution $\sigma^2$ (factored out as $\hat\sigma^2 \cdot 1$). It is why the PI is always wider than the CI at the same $x_0$.
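The two formulas can be checked numerically. Below is a minimal from-scratch sketch, assuming NumPy and SciPy are available; `intervals_at` is a hypothetical helper name, and the simulated data (true $\beta = (1, 2)$, $\sigma = 1.5$) are invented for illustration. The only difference between the two half-widths is the `1 +` inside the square root.

```python
import numpy as np
from scipy import stats


def intervals_at(X, y, x0, level=0.95):
    """Hypothetical helper: return (y0_hat, CI, PI) at test point x0 for an OLS fit.

    X  : (n, p+1) design matrix whose first column is all ones.
    y  : (n,) responses.
    x0 : (p+1,) test point, also with a leading 1 for the intercept.
    """
    n, k = X.shape                                   # k = p + 1 parameters
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                     # OLS estimate
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (n - k))     # sqrt(RSS / (n - p - 1))
    t = stats.t.ppf(1 - (1 - level) / 2, df=n - k)   # upper alpha/2 quantile
    y0_hat = x0 @ beta_hat
    h0 = x0 @ XtX_inv @ x0                           # x0^T (X^T X)^{-1} x0
    ci_half = t * sigma_hat * np.sqrt(h0)
    pi_half = t * sigma_hat * np.sqrt(1 + h0)        # the extra +1 for the noise
    ci = (y0_hat - ci_half, y0_hat + ci_half)
    pi = (y0_hat - pi_half, y0_hat + pi_half)
    return y0_hat, ci, pi


# Simulated data (invented): y = 1 + 2x + N(0, 1.5^2)
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)

y0_hat, ci, pi = intervals_at(X, y, np.array([1.0, 5.0]))
print(f"y0_hat = {y0_hat:.2f}")
print(f"CI = ({ci[0]:.2f}, {ci[1]:.2f})  PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

Both intervals share the centre $\hat{y}_0$; only the half-width differs.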
Insights & mental models
Two sources of uncertainty in PI, one in CI
CI accounts for: uncertainty in $\hat\beta$ (only).
PI accounts for: uncertainty in $\hat\beta$ + irreducible noise $\varepsilon_0$.
“To answer this question [PI], we have to sum uncertainty over two components: (1) the uncertainty in the predicted value (due to uncertainty in $\hat\beta$); (2) the irreducible error $\varepsilon$.” (module 3 slides)
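The two-component sum can be written out as a short variance calculation. A minimal derivation, assuming the new error $\varepsilon_0$ is independent of the training errors (and hence of $\hat\beta$), and using $\mathrm{Cov}(\hat\beta) = \sigma^2 (X^TX)^{-1}$ under the standard OLS assumptions:

```latex
\begin{aligned}
\mathrm{Var}\big(x_0^T\hat\beta\big)
  &= x_0^T\,\mathrm{Cov}(\hat\beta)\,x_0
   = \sigma^2\, x_0^T (X^TX)^{-1} x_0 \\
\mathrm{Var}\big(Y_0 - x_0^T\hat\beta\big)
  &= \mathrm{Var}(\varepsilon_0) + \mathrm{Var}\big(x_0^T\hat\beta\big)
   = \sigma^2\big(1 + x_0^T (X^TX)^{-1} x_0\big)
\end{aligned}
```

The constant $x_0^T\beta$ drops out of the second variance. Replacing $\sigma$ by $\hat\sigma$ makes the standardized quantities $t_{n-p-1}$-distributed, which yields the CI and PI half-widths without and with the $+1$.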
The frequentist CI interpretation
“There is a 95% probability that the interval [from the random procedure] will contain the true value of $\beta$.” (module 3 slides)
Crucially: the interval is random, the parameter is fixed. Repeat the experiment many times → ~95% of the constructed intervals cover the true $\beta$. CE1 problem 2g (true/false on p-values) hammers the related “$p$ is the probability that $H_0$ is true” trap; CIs carry the same misinterpretation risk.
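A repeated-sampling sketch makes the “interval is random, parameter is fixed” reading concrete. It assumes a hypothetical simple-regression setup (fixed design, invented true parameters) and counts how often the 95% CI for $\beta_1$ captures the fixed true value; the fraction should land near 0.95, not exactly on it.

```python
import numpy as np
from scipy import stats

# Invented setup: y = beta0 + beta1 * x + eps, eps ~ N(0, sigma^2), fixed design x.
rng = np.random.default_rng(1)
beta0, beta1, sigma = 1.0, 2.0, 1.5
n, n_rep, level = 30, 2000, 0.95

x = np.linspace(0, 10, n)                      # covariates held fixed across repetitions
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
t = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)

covered = 0
for _ in range(n_rep):
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)   # a fresh dataset each time
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (n - 2))
    se_b1 = sigma_hat * np.sqrt(XtX_inv[1, 1])
    lo, hi = beta_hat[1] - t * se_b1, beta_hat[1] + t * se_b1
    covered += int(lo <= beta1 <= hi)          # the interval moves; beta1 does not

coverage = covered / n_rep
print(f"empirical coverage of beta1: {coverage:.3f}")
```

Any single `(lo, hi)` from the loop either contains `beta1` or does not; the 95% describes the long-run fraction.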
Interval shape
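Both CI and PI are narrowest near the centroid of the data and fan out at the extremes: the term $x_0^T(X^TX)^{-1}x_0$ grows with the (Mahalanobis) distance of $x_0$ from the centroid $\bar{x}$. In simple linear regression it equals $\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}$, which is minimized at $x_0 = \bar{x}$. Visually: the CI band hugs the fitted line; the PI band is a wide envelope around it.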
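A quick check of the shape claim, assuming a simulated simple-regression design (invented for illustration). It computes $x_0^T(X^TX)^{-1}x_0$ over a grid of test points, confirms it matches the simple-regression closed form $\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_i (x_i-\bar{x})^2}$, and finds where it is smallest. Both half-widths increase with this term, so its minimizer is where both bands are narrowest.

```python
import numpy as np

# Invented fixed design for illustration.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 40)
n = len(x)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

grid = np.linspace(-5, 15, 201)                 # test points, some far outside the data
X0 = np.column_stack([np.ones_like(grid), grid])
# x0^T (X^T X)^{-1} x0 evaluated for every row x0 of X0
h_matrix = np.einsum("ij,jk,ik->i", X0, XtX_inv, X0)
# closed form valid for simple linear regression
h_closed = 1 / n + (grid - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

x_narrowest = grid[np.argmin(h_matrix)]
print(f"x-bar = {x.mean():.2f}, narrowest band at x0 = {x_narrowest:.2f}")
```

The grid resolution is 0.1, so the reported minimizer matches $\bar{x}$ only up to that spacing.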
Why CI for x₀ᵀβ ≠ PI for Y at x₀
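The CI is for the mean $x_0^T\beta = \mathrm{E}(Y_0)$, the expected response. The PI is for an individual observation $Y_0$, a single random draw from $N(x_0^T\beta, \sigma^2)$. Even with infinite data ($n \to \infty$), the CI shrinks to a point but the PI stays wide because $\mathrm{Var}(\varepsilon_0) = \sigma^2 > 0$: it tends to $x_0^T\beta \pm z_{\alpha/2}\,\sigma$.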
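To see the limit argument numerically, this sketch (hypothetical helper `half_widths`, simulated data with $\sigma = 1$, test point $x_0 = 5$, all invented) computes both half-widths for growing $n$. The CI half-width should head toward 0 while the PI half-width levels off near $z_{0.975}\,\sigma \approx 1.96$.

```python
import numpy as np
from scipy import stats


def half_widths(n, sigma=1.0, level=0.95, seed=0):
    """Hypothetical helper: CI and PI half-widths at x0 = 5 for one simulated dataset."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, n)
    X = np.column_stack([np.ones(n), x])
    y = 1.0 + 2.0 * x + rng.normal(0, sigma, n)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (n - 2))
    t = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
    x0 = np.array([1.0, 5.0])
    h0 = x0 @ XtX_inv @ x0                     # shrinks roughly like 1/n
    return t * sigma_hat * np.sqrt(h0), t * sigma_hat * np.sqrt(1 + h0)


for n in (10, 100, 10_000):
    ci_hw, pi_hw = half_widths(n)
    print(f"n={n:>6}: CI half-width {ci_hw:.3f}, PI half-width {pi_hw:.3f}")
```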
Exam signals
“We will discuss confidence and prediction ranges in the (more general) multiple linear regression setup.” (module 3 slides)
“Confidence intervals (CIs) are a much more informative way to report results than $p$-values!” (module 3 slides)
Both intervals are derived in problem 2 of the recommended exercises (L06-linreg-2).
Pitfalls
- CI vs PI confusion. If asked “what’s the uncertainty around an individual prediction at $x_0$?” → PI. If asked “what’s the uncertainty in the average response at $x_0$?” → CI. Mixing them up is the canonical exam slip.
- CI for $\beta_j$ vs CI for $x_0^T\beta$ vs PI for $Y_0$ at $x_0$. Three distinct objects, three different formulas; don’t conflate them. Exercise 3.2d makes you walk through all three.
- Misinterpreting “95% probability.” It’s the procedure’s coverage rate, not “this specific interval has 95% probability of containing $\beta$.” Once you compute the interval, $\beta$ is either in it or not.
- PI fails if assumptions fail. Both intervals rely on Gaussian, homoscedastic errors. The PI is more fragile: it uses a single $\hat\sigma$ for the noise at every $x_0$, and because it targets one draw $Y_0$ it depends directly on the error distribution, which a large $n$ does not fix (unlike the CI, where the CLT helps). Under heteroscedasticity the PI is too wide where the noise is small and too narrow where it is large.
- Always wider for PI. The PI strictly contains the CI: same centre $\hat{y}_0$, strictly larger half-width. If your band plot shows the CI outside the PI, you have swapped them.
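A sketch of the heteroscedasticity failure, under an invented model where the noise standard deviation is $0.2x$. The PI uses one pooled $\hat\sigma$, so its empirical coverage of a fresh $Y_0$ is checked separately at a low-noise and a high-noise test point; `pi_coverage` is a hypothetical helper.

```python
import numpy as np
from scipy import stats

# Invented heteroscedastic model: y = 1 + 2x + eps, sd(eps) = 0.2 * x.
rng = np.random.default_rng(3)
n, n_rep, level = 200, 1000, 0.95
x = np.linspace(1, 10, n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
t = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)


def pi_coverage(x0_val):
    """Hypothetical helper: fraction of repetitions whose PI at x0_val covers a fresh Y0."""
    x0 = np.array([1.0, x0_val])
    h0 = x0 @ XtX_inv @ x0
    hits = 0
    for _ in range(n_rep):
        y = 1.0 + 2.0 * x + rng.normal(0, 0.2 * x)           # noise grows with x
        beta_hat = XtX_inv @ X.T @ y
        resid = y - X @ beta_hat
        sigma_hat = np.sqrt(resid @ resid / (n - 2))          # one pooled sigma for all x
        y0 = 1.0 + 2.0 * x0_val + rng.normal(0, 0.2 * x0_val)  # fresh observation at x0
        half = t * sigma_hat * np.sqrt(1 + h0)
        hits += int(abs(y0 - x0 @ beta_hat) <= half)
    return hits / n_rep


cov_low, cov_high = pi_coverage(1.5), pi_coverage(9.5)
print(f"PI coverage at x0=1.5: {cov_low:.3f}; at x0=9.5: {cov_high:.3f}")
```

Expect near-total coverage at the low-noise end and clearly less than 95% at the high-noise end; neither matches the nominal level.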
Scope vs ISLP
- In scope: difference between CI and PI; their derivation in matrix form; the t-distribution-based formulas; the band shape; the frequentist interpretation.
- Look up in ISLP: §3.2.2 (pp. 81–82, Predictions) gives a concise treatment with the +1 in the PI; figure 3.6 shows the band shape.
- Skip in ISLP: Bayesian credible intervals (out of scope); Bonferroni / multiple-testing corrections to the CI (never covered).
Exercise instances
- Exercise 3.2b: simulate $\hat\beta$ over repeated datasets; check empirically that the 95% CIs cover $\beta_0$ and $\beta_1$ ~95% of the time.
- Exercise 3.2c: same simulation philosophy, but for the PI at a fixed $x_0$.
- Exercise 3.2d: construct the CI for $x_0^T\beta$; explain the connection between the CI for $\beta_j$, the CI for $x_0^T\beta$, and the PI for $Y_0$ at $x_0$.
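In the spirit of Exercise 3.2c, here is a sketch of the PI coverage check at a fixed $x_0$ (here $x_0 = 8$; all simulation settings are invented, not taken from the exercise). For contrast it also counts how often the CI half-width would cover the fresh observation, which illustrates the CI vs PI pitfall.

```python
import numpy as np
from scipy import stats

# Invented setup: fixed design, true model y = 1 + 2x + N(0, 1.5^2), test point x0 = 8.
rng = np.random.default_rng(4)
beta = np.array([1.0, 2.0])
sigma, n, n_rep, level = 1.5, 30, 2000, 0.95
x = np.linspace(0, 10, n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
x0 = np.array([1.0, 8.0])
h0 = x0 @ XtX_inv @ x0
t = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)

pi_hits = ci_hits = 0
for _ in range(n_rep):
    y = X @ beta + rng.normal(0, sigma, n)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s = np.sqrt(resid @ resid / (n - 2))
    y0 = x0 @ beta + rng.normal(0, sigma)          # fresh observation at x0
    err = abs(y0 - x0 @ beta_hat)
    pi_hits += int(err <= t * s * np.sqrt(1 + h0))
    ci_hits += int(err <= t * s * np.sqrt(h0))     # wrongly using the CI for Y0

pi_cov, ci_cov = pi_hits / n_rep, ci_hits / n_rep
print(f"PI covers Y0: {pi_cov:.3f};  CI (misused) covers Y0: {ci_cov:.3f}")
```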
How it might appear on the exam
- Distinguish CI and PI. Definition or T/F question on which is wider, what each represents, which contains .
- Read intervals from a band plot. Given a regression with the usual two-band plot, identify which band is the CI and which is the PI; predict at a new $x$ value and quote the appropriate interval.
- Frequentist interpretation T/F. “If we computed 100 95% CIs from 100 random samples, ~95 would cover the true $\beta$.” True: this is the correct interpretation.
- CI from regression output. Given $\hat\beta_j$ and $\widehat{\mathrm{SE}}(\hat\beta_j)$ from a table, compute the 95% CI as $\hat\beta_j \pm 2\,\widehat{\mathrm{SE}}(\hat\beta_j)$ (the $1.96 \approx 2$ trick).
- Derive PI from CI by adding the noise variance $\sigma^2$ to the variance under the square root. Conceptual question: how does the formula change?
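A worked version of the regression-output question, with invented numbers standing in for a table row ($\hat\beta_j = 0.47$, $\widehat{\mathrm{SE}} = 0.12$, $n = 100$, $p = 3$). It compares the exact $t$ quantile with the $\pm 2$ shortcut, then shows one way to turn a reported CI half-width for the mean response into a PI half-width, assuming the same fit (so the same $t$ quantile) and that $\hat\sigma$ is available from the output.

```python
from scipy import stats

# Invented regression-output row, for illustration only.
beta_hat_j, se_j = 0.47, 0.12
n, p = 100, 3                                     # df = n - p - 1 = 96

t_exact = stats.t.ppf(0.975, df=n - p - 1)
ci_exact = (beta_hat_j - t_exact * se_j, beta_hat_j + t_exact * se_j)
ci_rule = (beta_hat_j - 2 * se_j, beta_hat_j + 2 * se_j)   # the 1.96 ≈ 2 shortcut
print(f"t quantile {t_exact:.3f}")
print(f"exact CI ({ci_exact[0]:.3f}, {ci_exact[1]:.3f}); "
      f"rule of thumb ({ci_rule[0]:.3f}, {ci_rule[1]:.3f})")

# Converting a CI half-width for the mean response into a PI half-width (invented numbers).
fit, ci_half, sigma_hat = 12.0, 0.8, 2.5
se_fit = ci_half / t_exact                        # = sigma_hat * sqrt(x0^T (X^T X)^{-1} x0)
pi_half = t_exact * (se_fit**2 + sigma_hat**2) ** 0.5   # add sigma_hat^2 under the root
print(f"PI: {fit:.1f} ± {pi_half:.2f}  (CI: {fit:.1f} ± {ci_half:.2f})")
```

With 96 df the exact quantile is close to 2, so the shortcut changes the endpoints only in the third decimal here.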
Related
- linear-regression: the underlying model
- sampling-distribution-of-beta: both intervals derive from this
- t-test-and-significance: same machinery, different question
- gaussian-error-assumptions: both intervals require these