Contrasts (linear combinations)

A contrast is any linear combination of a random vector's components, e.g. N − S, E + W, or (E + W) − (N + S) on the cork-deposit data. Once you write the contrasts as $Z = CX$ for a constant matrix $C$, their expectations and covariances drop out of the random-vector machinery: $E[Z] = C\mu$ and $\mathrm{Cov}(Z) = C\Sigma C^\top$. The prof flagged the cork worked example as "a good exercise to do in the exercise session."

Definition (prof’s framing)

“[A contrast is] any linear combination of the variables you find interesting: e.g. N − S, E + W, (E + W) − (N + S). Once you’ve defined them as new variables, you can take their expectations and covariances using the same machinery.” - L04-statlearn-3

Formally, given a random vector $X$ and a constant $q \times p$ matrix $C$, the new random vector $Z = CX$ holds the contrasts.

Notation & setup

  • $X$: original $p$-dimensional random vector with mean $\mu$ and covariance $\Sigma$.
  • $C$: $q \times p$ contrast matrix. Each row is one linear combination's coefficients.
  • $Z = CX$: vector of contrasts.
  • “Contrast” in the strict statistical sense means the row-coefficients sum to zero (so the linear combination is invariant under shifts of the mean), but in this course “contrast” is used loosely for any linear combination of interest. The cork example “E + W” wouldn’t qualify under the strict definition; the prof’s usage is the looser one.

Formula(s) to know cold

The two formulas to know cold here are the same two from random-vector-and-covariance:

$$E[Z] = C\mu, \qquad \mathrm{Cov}(Z) = C\Sigma C^\top.$$

If $X$ is multivariate normal, then $Z = CX$ is also multivariate normal; that's one of the four useful properties of the multivariate normal from the slides ("linear combinations of components are multivariate normal"). So $Z \sim N_q(C\mu,\, C\Sigma C^\top)$.
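The course does these computations in R; as a sketch, here is the same machinery in numpy, with made-up numbers (the $\mu$, $\Sigma$, and contrasts below are purely illustrative), including a Monte Carlo sanity check of the multivariate-normal property:

```python
import numpy as np

# Hypothetical mean and covariance for a 3-dimensional X.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([
    [2.0, 0.5, 0.3],
    [0.5, 1.0, 0.2],
    [0.3, 0.2, 1.5],
])

# Rows of C: the contrasts X1 - X2 and X1 + X2 + X3.
C = np.array([
    [1.0, -1.0, 0.0],
    [1.0,  1.0, 1.0],
])

E_Z = C @ mu              # E[Z] = C mu
Cov_Z = C @ Sigma @ C.T   # Cov(Z) = C Sigma C^T

# If X is multivariate normal, Z = C X is multivariate normal with
# exactly these parameters; sample X, map through C, and compare.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
Z = X @ C.T   # each row is one draw of Z
```

The sample mean of `Z` should sit near `E_Z` and its sample covariance near `Cov_Z`, up to Monte Carlo error.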

Insights & mental models

Contrasts as feature engineering for the multivariate setting. You define new variables that capture the comparison you actually care about: e.g. on the cork data, "is the cork deposit denser on the south side than the north?" maps to $N - S$, and "is the east-west contrast different from the north-south contrast?" maps to $(E + W) - (N + S)$. Once you have $Z = CX$, the rest is matrix algebra.

The cork worked example (modules/2StatLearn/2StatLearn.2.md / L04-statlearn-3):

For three contrasts (N − S, E + W, (E + W) − (N + S)) on $X = (N, E, S, W)^\top$:

$$C = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 1 \\ -1 & 1 & -1 & 1 \end{pmatrix}$$

Then $E[Z] = C\mu$ and $\mathrm{Cov}(Z) = C\Sigma C^\top$, both computed by direct matrix multiplication. In R: `C %*% mu` and `C %*% Sigma %*% t(C)`. The exercise asks you to write down $C$, identify $\mathrm{Var}(Z_i)$ as $c_i^\top \Sigma c_i$, and find $\mathrm{Cov}(Z_1, Z_3) = c_1^\top \Sigma c_3$ ((row 1 of $C$) times $\Sigma$ times (row 3 of $C$) transposed).
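The slide example is in R; the numpy equivalent below uses the cork contrast matrix with placeholder values for $\mu$ and $\Sigma$ (NOT the actual cork numbers), and checks the entry-by-entry reading of $C\Sigma C^\top$:

```python
import numpy as np

# Cork contrast matrix for X = (N, E, S, W):
# rows are N - S, E + W, (E + W) - (N + S).
C = np.array([
    [ 1.0, 0.0, -1.0, 0.0],
    [ 0.0, 1.0,  0.0, 1.0],
    [-1.0, 1.0, -1.0, 1.0],
])

# Placeholder mean and covariance (hypothetical, not the real cork data).
mu = np.array([50.0, 46.0, 49.0, 45.0])
Sigma = np.array([
    [4.0, 2.0, 1.0, 1.0],
    [2.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 3.0, 1.0],
    [1.0, 2.0, 1.0, 4.0],
])

E_Z = C @ mu              # C %*% mu in R
Cov_Z = C @ Sigma @ C.T   # C %*% Sigma %*% t(C) in R

# Entry-by-entry reading of Cov(Z):
var_Z1 = C[0] @ Sigma @ C[0]      # Var(Z1)      = c1' Sigma c1
cov_Z1_Z3 = C[0] @ Sigma @ C[2]   # Cov(Z1, Z3)  = c1' Sigma c3
```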

Why contrasts matter beyond M2:

  • Hypothesis tests in regression (Q4 of the four "important questions" in M3): testing whether two coefficients are equal, or whether a sum of coefficients differs from a baseline, is a contrast on $\beta$. The covariance machinery is the foundation for the standard errors of those tests.
  • Categorical predictors with K levels (categorical-encoding-and-interactions): the K − 1 dummies define K − 1 contrasts against the reference level. R’s contrasts() function is named for exactly this. The prof uses “contrast” in this regression sense in M3.
  • PCA (principal-component-analysis): each principal component is a contrast, a linear combination of the standardized predictors. The loadings are the contrast coefficients, and the PC variances sit on the diagonal of the resulting (diagonal) covariance $C\Sigma C^\top$.
  • LDA / QDA: the discriminant functions are linear contrasts in $x$; the decision boundary is where two such contrasts are equal.

So while the M2 atom is light, the machinery (covariance of a linear transformation) is everywhere downstream.

Reading the result. $C\Sigma C^\top$ is a $q \times q$ matrix whose diagonal entries are the contrast variances $\mathrm{Var}(Z_i) = c_i^\top \Sigma c_i$ ((row $i$ of $C$) times $\Sigma$ times (row $i$ of $C$) transposed), and whose off-diagonal entries are the covariances between contrasts. From there you can pull correlations between contrasts the usual way.
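A minimal numpy sketch of "the usual way" (standardize by the square roots of the diagonal), with made-up $\Sigma$ and $C$:

```python
import numpy as np

# Hypothetical covariance and contrast matrix.
Sigma = np.array([
    [2.0, 0.5, 0.3],
    [0.5, 1.0, 0.2],
    [0.3, 0.2, 1.5],
])
C = np.array([
    [1.0, -1.0, 0.0],
    [1.0,  1.0, 1.0],
])

Cov_Z = C @ Sigma @ C.T
sd = np.sqrt(np.diag(Cov_Z))        # contrast standard deviations
Corr_Z = Cov_Z / np.outer(sd, sd)   # corr(Zi, Zj) = cov_ij / (sd_i * sd_j)
```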

Pitfalls

  • Order matters: it's $C\Sigma C^\top$, not $C^\top \Sigma C$. Easy transpose mistake.
  • The constant matrix has to be conformable: $X$ is $p \times 1$, $C$ is $q \times p$, so $Z = CX$ is $q \times 1$ and $C\Sigma C^\top$ is $q \times q$. Check dimensions first.
  • A "contrast" in this course is just any linear combination: don't get hung up on the stricter definition that requires the coefficients to sum to zero.
  • If $C$ has linearly dependent rows, $C\Sigma C^\top$ will be singular even if $\Sigma$ wasn't: the contrasts you defined aren't truly $q$-dimensional.
  • Mean-centering vs not: since covariance uses deviations $X - \mu$, a constant shift doesn't appear in $\mathrm{Cov}(Z)$, but $E[Z] = C\mu$ does carry the means.
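Two of these pitfalls can be checked mechanically. In this sketch (made-up matrices), the transpose mistake fails a conformability check because $C$ is non-square, and dependent rows of $C$ make $C\Sigma C^\top$ singular:

```python
import numpy as np

Sigma = np.eye(3)   # full-rank covariance, for illustration

# Rows are linearly dependent: row 2 = -2 * row 1.
C = np.array([
    [ 1.0, -1.0, 0.0],
    [-2.0,  2.0, 0.0],
])

Cov_Z = C @ Sigma @ C.T   # correct order: 2 x 2

# Transpose mistake C' Sigma C: shapes (3,2) @ (3,3) don't conform,
# so numpy raises an error; checking dimensions first catches this.
conforms = True
try:
    C.T @ Sigma @ C
except ValueError:
    conforms = False

# Dependent rows make Cov(Z) singular even though Sigma is full rank.
rank = np.linalg.matrix_rank(Cov_Z)   # 1, not 2
```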

Scope vs ISLP

  • In scope: writing down a contrast matrix $C$, computing $C\mu$ and $C\Sigma C^\top$ by hand, the cork example.
  • Look up in ISLP: ISLP doesn't have a dedicated "contrasts" section in chapter 2; the closest treatment is the categorical-encoding discussion in §3.3.1 (and the implicit contrast-matrix view of dummy coding). For the matrix-algebra theory, ISLP is light; Johnson & Wichern or any multivariate-stats text covers it formally.
  • Skip in ISLP: none; this is a matrix-algebra fact, not a textbook topic.

Exercise instances

No recommended-exercise problem is tagged "contrasts" specifically. The cork worked example in the slide deck (modules/2StatLearn/2StatLearn.2.md) and the prof's "good exercise to do in the exercise session" remark in L04-statlearn-3 are the de facto exercise instance: write down $C$ for (N − S, E + W, (E + W) − (N + S)), then compute $C\mu$ and $C\Sigma C^\top$ analytically and in R.

The downstream applications (regression coefficient testing in M3, PCA loadings in M10, discriminant functions in M4) are exercised heavily in their own atoms.

How it might appear on the exam

  • Direct hand-calc. Given $\mu$, $\Sigma$, and a contrast matrix $C$ (probably 2 × 3 or 2 × 4), compute $C\mu$ and $C\Sigma C^\top$. Pure plug-and-chug, well-suited to the 2026 short-answer format.
  • Identify a contrast in a regression context. Given a regression with three group dummies, write the contrast matrix that tests “group 1 vs the average of groups 2 and 3.”
  • Combined with multivariate normal. Show that if $X \sim N_p(\mu, \Sigma)$, then $CX \sim N_q(C\mu, C\Sigma C^\top)$. (One-line application of the multivariate-normal property.)
  • Conceptual: why are PCA loadings called “contrasts”? Because each PC is a linear combination of the standardized variables, and the loadings give the coefficients of that combination.
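The regression-contrast item above can be sketched numerically. Here $c = (1, -\tfrac12, -\tfrac12)$ encodes "group 1 vs the average of groups 2 and 3", and the same $c^\top \mathrm{Cov}(\hat\beta)\, c$ machinery gives the standard error; all numbers are hypothetical:

```python
import numpy as np

# Hypothetical fitted group means (cell-means parameterization).
beta_hat = np.array([10.0, 6.0, 8.0])

# "Group 1 vs the average of groups 2 and 3".
c = np.array([1.0, -0.5, -0.5])
# Coefficients sum to zero, so this is a contrast in the strict sense.

estimate = c @ beta_hat                    # 10 - (6 + 8)/2 = 3
Cov_beta = np.diag([0.25, 0.25, 0.25])     # hypothetical Cov(beta_hat)
se = float(np.sqrt(c @ Cov_beta @ c))      # sqrt(c' Cov(beta_hat) c)
```

The resulting `estimate / se` is the usual test statistic for that contrast; only the contrast vector changes from one hypothesis to the next.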