Random vectors, covariance and correlation matrices
The matrix-algebra plumbing the whole rest of the course rides on. A random vector is a $p$-vector of random variables; its covariance matrix stacks variances on the diagonal and covariances off it; its correlation matrix is rescaled by the standard-deviation diagonal so the diagonal is 1. Expectations of linear transformations follow $E[AX + b] = A\mu + b$ and $\mathrm{Cov}(AX + b) = A\Sigma A^\top$. Covariance is a linear notion: zero covariance does not mean independence, except under joint normality.
Definition (prof’s framing)
“A random vector is a $p$-dimensional vector of random variables.” - L04-statlearn-3
Mean vector: $\mu = E[X]$, element-wise $\mu_i = E[X_i]$. Covariance matrix: $\Sigma = \mathrm{Cov}(X) = E\big[(X - \mu)(X - \mu)^\top\big]$.
“… if they have a high covariance then they’re varying together; if they have a negative covariance then they’re varying the opposite way; and if it’s just all random then the covariance is going to be small in magnitude.” - L04-statlearn-3
When $i = j$: it’s the variance, $\sigma_{ii} = \mathrm{Var}(X_i)$. The prof flagged this as a quiz-style fact.
Notation & setup
- $X = (X_1, \dots, X_p)^\top$: $p$-dimensional random vector.
- $\mu = E[X]$: mean vector.
- $\Sigma = \mathrm{Cov}(X)$: covariance matrix. Symmetric and (by construction) positive semi-definite.
- $\sigma_{ij}$ = entry $(i, j)$ of $\Sigma$, the covariance of $X_i$ and $X_j$. Diagonal entries $\sigma_{ii}$ are variances.
- $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$: correlation. Lives in $[-1, 1]$.
- $V^{1/2} = \mathrm{diag}(\sqrt{\sigma_{11}}, \dots, \sqrt{\sigma_{pp}})$ = diagonal matrix of standard deviations. Then the correlation matrix is $\rho = (V^{1/2})^{-1}\,\Sigma\,(V^{1/2})^{-1}$ (sketched in code below).
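A minimal R sketch of that rescaling (the $3 \times 3$ `Sigma` here is made up for illustration); base R’s `cov2cor()` is the shortcut:

```r
# Correlation matrix from a covariance matrix via the sd diagonal.
Sigma <- matrix(c(4,   2,   0.5,
                  2,   9,   1.2,
                  0.5, 1.2, 1.0), nrow = 3, byrow = TRUE)
V_inv_sqrt <- diag(1 / sqrt(diag(Sigma)))    # (V^{1/2})^{-1}: 1/sd on the diagonal
rho <- V_inv_sqrt %*% Sigma %*% V_inv_sqrt   # diagonal is exactly 1
all.equal(rho, cov2cor(Sigma))               # base-R shortcut gives the same matrix
```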
Formula(s) to know cold
Definition / shortcut: $\Sigma = E\big[(X - \mu)(X - \mu)^\top\big] = E[XX^\top] - \mu\mu^\top$.
Correlation from covariance (the CE1.1f / Exercise 2.3g calculation): $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$.
Expectation rules (L04-statlearn-3 proved on board): $E[AX + b] = A\,E[X] + b$ for a constant matrix $A$ and constant vector $b$.
Covariance of a linear transformation (used heavily in contrasts and in deriving the sampling-distribution-of-beta): $\mathrm{Cov}(AX + b) = A\,\Sigma\,A^\top$.
These are the two formulas the M2 quiz drills (Q4 and Q5 of the random-vectors menti, modules/2StatLearn/2StatLearn.2.md).
Univariate analogues (for sanity checks): $E[aX + b] = a\,E[X] + b$, $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.
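Both multivariate rules are easy to verify by simulation. A sketch, with made-up $\mu$, $\Sigma$, $A$, $b$ (none of these are course numbers) and `MASS::mvrnorm()` for the draws:

```r
# Monte Carlo check of E[AX + b] = A mu + b and Cov(AX + b) = A Sigma A^T.
library(MASS)                                 # for mvrnorm()
set.seed(1)
mu    <- c(1, 2)
Sigma <- matrix(c(2, 0.8, 0.8, 1), 2)
A     <- matrix(c(1, -1,
                  1,  1), 2, byrow = TRUE)
b     <- c(0, 10)
X <- mvrnorm(1e5, mu, Sigma)                       # rows are draws of the random vector
Y <- X %*% t(A) + matrix(b, 1e5, 2, byrow = TRUE)  # each row is A x + b
colMeans(Y)                                   # ~ A mu + b = (-1, 13)
cov(Y)                                        # ~ A Sigma A^T ...
A %*% Sigma %*% t(A)                          # ... the theoretical target
```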
Insights & mental models
Element-wise proof of $E[AX] = A\,E[X]$ (L04-statlearn-3 on the board): the $i$-th entry of $AX$ is $\sum_j a_{ij} X_j$. Take $E$, pull the constants out: $E\big[\sum_j a_{ij} X_j\big] = \sum_j a_{ij} E[X_j]$, which is exactly the $i$-th entry of $A\,E[X]$. The prof flagged this as basically obvious but did the proof anyway; it’s the working machinery for everything in M3 onwards.
Covariance is a linear notion; this is the prof’s most-emphasized framing in L04-statlearn-3:
“We talk about things, I’m kind of always doing this [drawing a line], because we’re sort of assuming a linear line. We’re assuming some kind of linear function. This covariance is really getting at this notion of a slope.” - L04-statlearn-3
So zero covariance does not mean independent in general. It just means “no linear co-variation.” Two variables can be perfectly dependent (e.g. $Y = X^2$ with $X$ symmetric about zero) but have zero covariance if the dependence is symmetric around zero. The exception is joint normality: there, zero covariance ⇒ independence, but only there. (See multivariate-normal for that property.)
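A quick R illustration of that trap, using the symmetric $Y = X^2$ construction:

```r
# Perfect dependence, (near-)zero covariance: Y is a function of X,
# but the relationship has no linear component.
set.seed(2)
x <- rnorm(1e5)     # symmetric about zero
y <- x^2            # perfectly determined by x
cov(x, y)           # ~ 0: no linear co-variation (E[X^3] = 0 for symmetric X)
cor(x, y)           # ~ 0 as well
cor(x^2, y)         # = 1: the dependence is there, just not linear
```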
The correlation matrix is the covariance matrix made unit-free: divide entry $\sigma_{ij}$ by $\sqrt{\sigma_{ii}\,\sigma_{jj}}$. The diagonal becomes 1; off-diagonals become Pearson correlations in $[-1, 1]$. Lets you compare across different variables / units.
Positive semi-definite is forced by the variance interpretation (modules/2StatLearn/2StatLearn.2.md hint): for any constant vector $a$, $a^\top \Sigma a = \mathrm{Var}(a^\top X) \ge 0$. So $\Sigma$ is positive semi-definite by construction. If it’s singular ($\det \Sigma = 0$), some linear combination of the $X_i$’s has zero variance, meaning it’s a deterministic function of the others, and the multivariate normal density doesn’t exist (you divide by $|\Sigma|^{1/2}$).
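Both facts are easy to poke at in R. In this sketch the matrices are made up, and `S3` is built so its third variable is exactly the sum of the first two:

```r
# a' Sigma a = Var(a' X) >= 0 for any constant a.
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2)
a <- c(3, -1)
t(a) %*% Sigma %*% a     # nonnegative for any choice of a
# X3 = X1 + X2 exactly: the 3x3 covariance matrix is singular by construction
# (third row/column = sum of the first two).
S3 <- matrix(c(1.0, 0.5, 1.5,
               0.5, 2.0, 2.5,
               1.5, 2.5, 4.0), nrow = 3, byrow = TRUE)
det(S3)                  # ~ 0: some linear combination has zero variance
eigen(S3)$values         # smallest eigenvalue is 0 (up to floating-point noise)
```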
The cork-deposit example (L04-statlearn-3 running dataset): cork trees with holes drilled on four sides (N, E, S, W) and cork weight measured per direction. The variables are very correlated within a tree (sun exposure aside, dense in one direction usually means dense in all). Used as the toy multivariate dataset for the calculation drill, the contrasts example (N − S, E + W, etc.; see the sketch below), and the matching exercise.
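A sketch of the contrast calculation via $\mathrm{Cov}(AX) = A\Sigma A^\top$, with variables ordered N, E, S, W; the $4 \times 4$ `Sigma` below is a hypothetical stand-in, not the actual cork covariance matrix:

```r
# Covariance of the cork contrasts N - S and E + W.
Sigma <- matrix(c(8, 6, 6, 5,
                  6, 9, 7, 6,
                  6, 7, 9, 6,
                  5, 6, 6, 7), nrow = 4, byrow = TRUE)   # made-up, positive definite
A <- rbind("N - S" = c(1, 0, -1, 0),    # north-minus-south contrast
           "E + W" = c(0, 1,  0, 1))    # east-plus-west sum
A %*% Sigma %*% t(A)                    # 2x2 covariance matrix of the contrasts
```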
The calculation by hand (Exercise 2.3g and the CE1.1f single-choice): compute $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$. CE1.1f gives a $2 \times 2$ covariance matrix and asks for $\rho_{12}$. Multiple-choice trap: “0.0083” comes from forgetting the square root; “0.15” and “0.10” come from dividing by just one of the two variances. Take the square root.
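The same trap in code, with a made-up $2 \times 2$ `Sigma` (not the CE1.1f numbers):

```r
# Correct rho_12 vs the forgot-the-square-root distractor.
Sigma <- matrix(c(0.04, 0.01,
                  0.01, 0.09), 2, byrow = TRUE)
Sigma[1, 2] / sqrt(Sigma[1, 1] * Sigma[2, 2])  # correct: sigma_12 / sqrt(s11 * s22)
Sigma[1, 2] / (Sigma[1, 1] * Sigma[2, 2])      # the no-square-root distractor
cov2cor(Sigma)[1, 2]                           # cross-check with base R
```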
Where this plumbing leads
- multivariate-normal: generalizes the bell curve using $(x - \mu)^\top \Sigma^{-1} (x - \mu)$ in the exponent (and $|\Sigma|^{1/2}$ in the normalizer).
- sampling-distribution-of-beta: under Gaussian errors, $\hat\beta \sim N\big(\beta,\; \sigma^2 (X^\top X)^{-1}\big)$. The covariance machinery is what gives you the standard errors.
- linear-discriminant-analysis / quadratic-discriminant-analysis: class-conditional densities are multivariate normals with means $\mu_k$ and a shared covariance $\Sigma$ (LDA) or class-specific $\Sigma_k$ (QDA).
- principal-component-analysis: eigen-decomposition of $\Sigma$ gives the PCs (the prof defers spectral theory to Linear Statistical Models, out of scope for M2, but the PVE story rides on $\Sigma$’s eigenvalues).
- collinearity: collinear predictors → near-singular $X^\top X$ → $(X^\top X)^{-1}$ blows up → SEs explode. A quick simulated illustration below.
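That last bullet in one simulation (hypothetical data; `x2` is built as a near-copy of `x1`):

```r
# Near-collinear predictors inflate the diagonal of (X'X)^{-1}, hence the SEs.
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)   # almost a duplicate of x1
X  <- cbind(1, x1, x2)           # design matrix with intercept
diag(solve(t(X) %*% X))          # huge entries for x1 and x2
x2b <- rnorm(n)                  # replace with an independent predictor
Xb  <- cbind(1, x1, x2b)
diag(solve(t(Xb) %*% Xb))        # small, well-behaved entries
```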
Pitfalls
- Zero covariance ≠ independence in general. Only under joint normality. Easy T/F trap.
- Forgetting the square root in $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$: that’s how the CE1.1f distractors are designed.
- The sandwich form is $\mathrm{Cov}(AX) = A\,\Sigma\,A^\top$, not $A^\top \Sigma\, A$. Order matters; the transpose goes on the right.
- Singular $\Sigma$ ($\det \Sigma = 0$) means at least one variable is a perfect linear combination of the others: the multivariate normal density doesn’t exist; LDA’s $\Sigma^{-1}$ blows up; PCA has a zero eigenvalue.
- Be careful with row-vs-column conventions for the data matrix $\mathbf{X}$. ISLP uses rows = observations, columns = variables. The prof’s L02 notes the book convention and flags that other books transpose it.
- Pearson correlation is linear correlation only. A perfect quadratic relationship can have $\rho = 0$.
Scope vs ISLP
- In scope: definition of a random vector, mean vector, $\Sigma$, $\rho$, the two expectation rules $E[AX + b] = A\mu + b$ and $\mathrm{Cov}(AX) = A\Sigma A^\top$, the linear-vs-independence distinction, hand calculation.
- Look up in ISLP: §2.1 (introduction to notation), §3.2.4 (sampling distributions of regression coefficients), and §4.4 (LDA/QDA’s use of $\Sigma$). Härdle & Simar or Johnson & Wichern would be the deeper references; ISLP keeps the matrix algebra light.
- Skip in ISLP: spectral / eigen-decomposition theory of $\Sigma$. The prof verbatim (L04-statlearn-3): “we don’t talk about spectral decomposition”; deferred to TMA4267 Linear Statistical Models. Eigenvalues come back as PC variances in M10, but the full spectral machinery is out.
Exercise instances
- Exercise 2.3g: given the covariance matrix of the Auto data’s quantitative columns, compute the correlations between `mpg` and `displacement`/`horsepower`/`weight` by hand using $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$; verify against `cor(Auto[, quant])`. The drill that turns the formula into muscle memory.
- Exercise 2.4: simulate 1000 draws from a bivariate normal with `mvrnorm()` under four settings: (i) independent, (ii) independent with unequal variances, (iii) positively correlated, (iv) negatively correlated. Plot, identify which scatter goes with which $\Sigma$. Builds the visual mapping from $\Sigma$ to point-cloud shape, directly relevant to CE1.1g (contour matching). A simulation sketch follows this list.
- CE1 problem 1f: single-choice MC: given a $2 \times 2$ $\Sigma$, what’s $\rho_{12}$? The square-root-trap MC.
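A sketch of the Exercise 2.4 simulation; the four `Sigma` settings below are illustrative stand-ins matching the four described cases, not necessarily the exercise’s exact numbers:

```r
# Four bivariate-normal point clouds: one per covariance setting.
library(MASS)                                    # for mvrnorm()
set.seed(4)
settings <- list(
  independent = diag(2),                         # rho = 0, equal variances
  scaled      = diag(c(1, 5)),                   # rho = 0, unequal variances
  positive    = matrix(c(1,  0.8,  0.8, 1), 2),  # rho = +0.8
  negative    = matrix(c(1, -0.8, -0.8, 1), 2)   # rho = -0.8
)
op <- par(mfrow = c(2, 2))
for (nm in names(settings)) {
  Z <- mvrnorm(1000, mu = c(0, 0), Sigma = settings[[nm]])
  plot(Z, main = nm, xlab = "X1", ylab = "X2", pch = 20, cex = 0.4)
}
par(op)
```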
How it might appear on the exam
- MC: correlation from a 2×2 $\Sigma$. Direct CE1.1f format.
- T/F: zero covariance ⇒ independence. False in general; true only for joint normality. Classic trap.
- Hand calculation: $\mathrm{Cov}(AX) = A\Sigma A^\top$ for a small $A$ and $\Sigma$. Plug-and-chug; the contrasts cork-data exercise is the template.
- Match scatter / contour plot to $\Sigma$. The Exercise 2.4 visual: independent vs scaled vs positively-correlated vs negatively-correlated. The multivariate-normal atom owns the contour-matching question (CE1.1g), but the underlying $\Sigma$-to-shape intuition lives here.
- Identify which $\Sigma$ is singular (where $\det \Sigma = 0$), and what that means for the density / for LDA.
- Quiz-style fact recall. “What is $\sigma_{ii}$?” → $\mathrm{Var}(X_i)$. From L04-statlearn-3’s flagged quiz fact.
Related
- contrasts: the canonical application of $\mathrm{Cov}(AX) = A\Sigma A^\top$
- multivariate-normal: uses $\Sigma^{-1}$ (and $|\Sigma|^{1/2}$) in the density; gives the joint-normal-only result that zero cov ⇒ independence
- linear-regression / sampling-distribution-of-beta: $\mathrm{Cov}(\hat\beta) = \sigma^2 (X^\top X)^{-1}$ uses this same machinery
- linear-discriminant-analysis / quadratic-discriminant-analysis: class-conditional Gaussians built on $(\mu_k, \Sigma)$ or $(\mu_k, \Sigma_k)$
- principal-component-analysis: eigen-decomposition of $\Sigma$ (mechanics in M10; deferred from M2)
- collinearity: what happens when $\Sigma$ (or $X^\top X$) is near-singular