Random vectors, covariance and correlation matrices

The matrix-algebra plumbing that the whole rest of the course rides on. A random vector is a $p$-vector of random variables; its covariance matrix stacks variances on the diagonal and covariances off it; its correlation matrix is $\Sigma$ rescaled by the standard-deviation diagonal so the diagonal is all 1s. Expectations of linear transformations follow $E[AX + b] = A\mu + b$ and $\mathrm{Cov}(AX + b) = A\,\Sigma\,A^\top$. Covariance is a linear notion: zero covariance does not mean independence, except under joint normality.

Definition (prof’s framing)

“A random vector is a $p$-dimensional vector of random variables.” - L04-statlearn-3

Mean vector: element-wise, $\mu = E[X]$ with $\mu_j = E[X_j]$. Covariance matrix:

$$\Sigma = \mathrm{Cov}(X) = E\big[(X - \mu)(X - \mu)^\top\big], \qquad \sigma_{ij} = \mathrm{Cov}(X_i, X_j).$$

“… if they have a high covariance then they’re varying together; if they have a negative covariance then they’re varying the opposite way; and if it’s just all random then the covariance is going to be small in magnitude.” - L04-statlearn-3

When $i = j$: it’s the variance, $\mathrm{Cov}(X_i, X_i) = \mathrm{Var}(X_i)$. The prof flagged this as a quiz-style fact.
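
A minimal R sanity check of these definitions, on simulated data (the values and seed are arbitrary, not from the course datasets):

```r
# Sample mean vector and covariance matrix on simulated data.
set.seed(1)
n <- 1000
X <- cbind(x1 = rnorm(n, mean = 2, sd = 1),
           x2 = rnorm(n, mean = 5, sd = 3))

colMeans(X)              # estimates the mean vector mu = (2, 5)
S <- cov(X)              # estimates Sigma
diag(S)                  # diagonal entries ~ the variances (1, 9)
S[1, 1] - var(X[, 1])    # Cov(X1, X1) is exactly Var(X1): returns 0
```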

Notation & setup

  • $X = (X_1, \dots, X_p)^\top$: $p$-dimensional random vector.
  • $\mu = E[X]$: mean vector.
  • $\Sigma = \mathrm{Cov}(X)$: covariance matrix. Symmetric and (by construction) positive semi-definite.
  • $\sigma_{ij}$ = entry $(i, j)$ of $\Sigma$, the covariance of $X_i$ and $X_j$. Diagonal entries are variances.
  • $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$: correlation. Lives in $[-1, 1]$.
  • $V^{1/2} = \mathrm{diag}(\sqrt{\sigma_{11}}, \dots, \sqrt{\sigma_{pp}})$ = diagonal matrix of standard deviations. Then correlation matrix $R = (V^{1/2})^{-1}\,\Sigma\,(V^{1/2})^{-1}$.

Formula(s) to know cold

Definition / shortcut: $\Sigma = \mathrm{Cov}(X) = E\big[(X - \mu)(X - \mu)^\top\big] = E[XX^\top] - \mu\mu^\top$.

Correlation from covariance (the CE1.1f / Exercise 2.3g calculation): $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$.

Expectation rules (L04-statlearn-3 proved on board): $E[AX + b] = A\,E[X] + b$ for constant matrix $A$ and vector $b$.

Covariance of a linear transformation (used heavily in contrasts and in deriving the sampling distribution of $\hat\beta$): $\mathrm{Cov}(AX + b) = A\,\Sigma\,A^\top$.

These are the two formulas the M2 quiz drills (Q4 and Q5 of the random-vectors menti, modules/2StatLearn/2StatLearn.2.md).

Univariate analogues (for sanity checks): $E[aX + b] = a\,E[X] + b$, $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.
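
A quick Monte Carlo check of the two transformation rules; the $A$, $b$, $\mu$, $\Sigma$ below are arbitrary illustration values, not from any course exercise:

```r
# Verify E[AX + b] = A mu + b and Cov(AX + b) = A Sigma A^T by simulation.
library(MASS)  # for mvrnorm()
set.seed(42)
mu    <- c(1, 2)
Sigma <- matrix(c(2.0, 0.8,
                  0.8, 1.0), nrow = 2)
A <- matrix(c(1, -1,
              1,  1), nrow = 2, byrow = TRUE)  # contrast-style rows
b <- c(10, 0)

X <- mvrnorm(n = 1e5, mu = mu, Sigma = Sigma)  # rows = draws
Y <- t(A %*% t(X) + b)                         # apply AX + b to each draw

colMeans(Y);  A %*% mu + b          # agree up to Monte Carlo error
cov(Y);       A %*% Sigma %*% t(A)  # ditto; note b drops out of Cov
```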

Insights & mental models

Element-wise proof of $E[AX + b] = A\mu + b$ (L04-statlearn-3 on the board): take the $i$th entry of $E[AX]$, pull the constants out of the expectation, and you get the $i$th entry of $A\mu$. The prof flagged this as basically obvious but did the proof anyway; it’s the working machinery for everything in M3 onwards.
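
Written out element-wise (a reconstruction in the notation above, not necessarily the board’s exact steps):

$$
E\big[(AX + b)_i\big] \;=\; E\Big[\sum_{j=1}^{p} a_{ij} X_j + b_i\Big] \;=\; \sum_{j=1}^{p} a_{ij}\, E[X_j] + b_i \;=\; (A\mu + b)_i ,
$$

and stacking the $p$ entries gives $E[AX + b] = A\mu + b$.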

Covariance is a linear notion; this was the prof’s most-emphasized framing in L04-statlearn-3:

“We talk about things, I’m kind of always doing this [drawing a line], because we’re sort of assuming a linear line. We’re assuming some kind of linear function. This covariance is really getting at this notion of a slope.” - L04-statlearn-3

So zero covariance does not mean independent in general. It just means “no linear co-variation.” Two variables can be perfectly dependent (e.g. $Y = X^2$) but have zero covariance if the dependence is symmetric around zero. The exception is joint normality: there, zero covariance ⇒ independence, but only there. (See multivariate-normal for that property.)
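
A simulated illustration of the $Y = X^2$ case (seed and sample size arbitrary):

```r
# Perfect dependence, zero covariance: Y = X^2 with X symmetric about 0.
set.seed(7)
x <- rnorm(1e6)
y <- x^2          # y is completely determined by x
cov(x, y)         # ~0, since Cov(X, X^2) = E[X^3] = 0 for symmetric X
cor(x, y)         # ~0 too: Pearson only measures linear co-variation
```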

The correlation matrix is the covariance matrix made unit-free: divide entry $\sigma_{ij}$ by $\sqrt{\sigma_{ii}\,\sigma_{jj}}$. The diagonal becomes 1; off-diagonals become Pearson correlations in $[-1, 1]$. Lets you compare across different variables / units.
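
In R the rescaling is a couple of lines; the $\Sigma$ below is an arbitrary example, and base R’s cov2cor() is a shortcut for exactly this computation:

```r
# Correlation matrix from a covariance matrix, by definition and by cov2cor().
Sigma <- matrix(c(4.0,  1.2,  0.5,
                  1.2,  9.0, -0.9,
                  0.5, -0.9,  1.0), nrow = 3, byrow = TRUE)

V_inv_sqrt <- diag(1 / sqrt(diag(Sigma)))   # (V^{1/2})^{-1}
R <- V_inv_sqrt %*% Sigma %*% V_inv_sqrt    # R = (V^{1/2})^{-1} Sigma (V^{1/2})^{-1}
diag(R)                                     # all 1s
all.equal(R, cov2cor(Sigma), check.attributes = FALSE)  # TRUE
```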

Positive semi-definiteness is forced by the variance interpretation (modules/2StatLearn/2StatLearn.2.md hint): for any constant vector $a$, $\mathrm{Var}(a^\top X) = a^\top \Sigma\, a \ge 0$. So $\Sigma$ is positive semi-definite by construction. If it’s singular ($\det \Sigma = 0$), some linear combination of the $X_j$’s has zero variance, meaning it’s a deterministic function of the others, and the multivariate normal density doesn’t exist (you divide by $\sqrt{\det \Sigma}$).
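
Numerically (arbitrary $\Sigma$ and $a$; the quadratic form is the variance of the linear combination):

```r
# a' Sigma a is Var(a' X), so it can never be negative.
Sigma <- matrix(c(2.0, 0.8,
                  0.8, 1.0), nrow = 2)
a <- c(3, -5)                 # any constant vector
t(a) %*% Sigma %*% a          # quadratic form: >= 0 (here 19)
eigen(Sigma)$values           # all eigenvalues >= 0 for a valid Sigma
```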

The cork-deposit example (L04-statlearn-3 running dataset): cork trees, holes drilled in four directions (N, E, S, W), deposit weight measured per direction. Variables are very correlated within a tree (sun exposure aside, dense in one direction usually means dense in all). Used as the toy multivariate dataset for the calculation drill, the contrasts example (N − S, E + W, etc.), and the matching exercise.

The calculation by hand (Exercise 2.3g and the CE1.1f single-choice): compute $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$. CE1.1f gives a 2×2 covariance matrix and asks for $\rho_{12}$. Multiple-choice trap: “0.0083” comes from forgetting the square root; “0.15” and “0.10” come from using the wrong denominator. Take the square root.
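
A template for the calculation in R; the $\Sigma$ entries here are made up, not the actual CE1.1f values:

```r
# Correlation from a 2x2 covariance matrix, plus the square-root trap.
Sigma <- matrix(c(2.0, 0.6,
                  0.6, 1.0), nrow = 2)
Sigma[1, 2] / sqrt(Sigma[1, 1] * Sigma[2, 2])   # rho_12 (correct): ~0.424
Sigma[1, 2] / (Sigma[1, 1] * Sigma[2, 2])       # forgot the square root: 0.3
```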

Where this plumbing leads

$\mathrm{Cov}(AX) = A\,\Sigma\,A^\top$ delivers the sampling distribution of $\hat\beta$ (M3 onwards); $\Sigma^{-1}$ sits inside LDA/QDA; the eigenvalues of $\Sigma$ come back as PC variances in M10.

Pitfalls

  • Zero covariance ≠ independence in general. Only under joint normality. Easy T/F trap.
  • Forgetting the square root in $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$: that’s how the CE1.1f distractors are designed.
  • $\mathrm{Cov}(AX) = A\,\Sigma\,A^\top$, not $A^\top \Sigma\, A$. Order matters; the transpose goes on the right.
  • Singular $\Sigma$ ($\det \Sigma = 0$) means at least one variable is a perfect linear combination of the others: the multivariate normal density doesn’t exist; LDA’s $\Sigma^{-1}$ blows up; PCA has a zero eigenvalue. (See the sketch after this list.)
  • Be careful with row-vs-column conventions for the data matrix. ISLP uses rows = observations, columns = variables. The prof’s L02 notes the book convention and flags that other books transpose it.
  • Pearson correlation is linear correlation only. A perfect quadratic relationship can have $\rho = 0$.
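
The singular-$\Sigma$ pitfall, made concrete (simulated data; the variable names are mine):

```r
# A singular Sigma: x3 is an exact linear combination of x1 and x2.
set.seed(3)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
x3 <- 2 * x1 - x2                # deterministic given x1 and x2
S  <- cov(cbind(x1, x2, x3))
det(S)                           # ~0 (up to floating-point noise)
eigen(S)$values                  # smallest eigenvalue ~0
# solve(S) would now be numerically unstable or error out -> no MVN density,
# and anything needing Sigma^{-1} (e.g. LDA) breaks.
```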

Scope vs ISLP

  • In scope: definition of random vector, mean vector, covariance matrix, correlation matrix, the two expectation rules, $\rho_{ij}$ from $\sigma_{ij}$, the linear-vs-independence distinction, hand calculation.
  • Look up in ISLP: §2.1 (introduction to notation), §3.2.4 (sampling distributions of regression coefficients) and §4.4 (LDA/QDA’s use of $\Sigma$). Härdle & Simar or Johnson & Wichern would be the deeper references; ISLP keeps the matrix algebra light.
  • Skip in ISLP: spectral / eigen-decomposition theory of $\Sigma$. The prof verbatim (L04-statlearn-3): “we don’t talk about spectral decomposition”, deferred to TMA4267 Linear Statistical Models. Eigenvalues come back as PC variances in M10, but the full spectral machinery is out.

Exercise instances

  • Exercise 2.3g: given the covariance matrix of the Auto data’s quantitative columns, compute correlations between mpg and displacement / horsepower / weight by hand using $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$; verify against cor(Auto[, quant]). The drill that turns the formula into muscle memory.
  • Exercise 2.4: simulate 1000 draws from a bivariate normal with mvrnorm() under four settings: (i) uncorrelated with unit variances, (ii) uncorrelated with larger variances, (iii) positively correlated, (iv) negatively correlated. Plot, identify which scatter goes with which $\Sigma$. Builds the visual mapping from $\Sigma$ to point-cloud shape, directly relevant to CE1.1g (contour matching). (A simulation sketch follows this list.)
  • CE1 problem 1f: single-choice MC: given a 2×2 $\Sigma$, what’s $\rho_{12}$? Answer: $\sigma_{12} / \sqrt{\sigma_{11}\,\sigma_{22}}$. The square-root-trap MC.
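
A sketch of the Exercise 2.4 simulation; the four $\Sigma$ settings below are stand-ins matching the described pattern (independent / scaled / positively / negatively correlated), not necessarily the exercise’s exact numbers:

```r
# Four bivariate-normal point clouds, one per Sigma setting.
library(MASS)
set.seed(2023)
mu <- c(0, 0)
Sigmas <- list(
  independent = matrix(c(1,    0,    0,    1), 2),
  scaled      = matrix(c(4,    0,    0,    4), 2),
  pos_corr    = matrix(c(1,  0.9,  0.9,    1), 2),
  neg_corr    = matrix(c(1, -0.9, -0.9,    1), 2)
)
par(mfrow = c(2, 2))
for (nm in names(Sigmas)) {
  X <- mvrnorm(1000, mu = mu, Sigma = Sigmas[[nm]])
  plot(X, main = nm, xlab = "x1", ylab = "x2", pch = 20, cex = 0.4)
}
```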

How it might appear on the exam

  • MC: correlation from a 2×2 $\Sigma$. Direct CE1.1f format.
  • T/F: zero covariance ⇒ independence. False in general; true only for joint normality. Classic trap.
  • Hand calculation: $\mathrm{Cov}(AX) = A\,\Sigma\,A^\top$ for a small $A$ and $\Sigma$. Plug-and-chug; the contrasts cork-data exercise is the template.
  • Match scatter / contour plot to $\Sigma$. The Exercise 2.4 visual: independent vs scaled vs positively-correlated vs negatively-correlated. The multivariate-normal atom owns the contour-matching question (CE1.1g), but the underlying $\Sigma$-to-shape intuition lives here.
  • Identify which $\Sigma$ is singular (where $\det \Sigma = 0$), and what that means for the density / for LDA.
  • Quiz-style fact recall. “What is $\mathrm{Cov}(X_i, X_i)$?” → $\mathrm{Var}(X_i)$. From L04-statlearn-3’s flagged quiz fact.