Random vectors, covariance and correlation matrices
The matrix-algebra plumbing the whole rest of the course rides on. A random vector is a $p$-vector of random variables; its covariance matrix stacks variances on the diagonal and covariances off it; its correlation matrix is rescaled by the standard-deviation diagonal so the diagonal is 1. Expectations of linear transformations follow $E[AX + b] = A\mu + b$ and $\mathrm{Cov}(AX + b) = A\Sigma A^\top$. Covariance is a linear notion: zero covariance does not mean independence, except under joint normality.
Definition (prof’s framing)
“A random vector is a $p$-dimensional vector of random variables.” - L04-statlearn-3
Mean vector: $\mu = E[X]$, element-wise $\mu_i = E[X_i]$. Covariance matrix: $\Sigma = \mathrm{Cov}(X) = E\big[(X - \mu)(X - \mu)^\top\big]$.
“… if they have a high covariance then they’re varying together; if they have a negative covariance then they’re varying the opposite way; and if it’s just all random then the covariance is going to be small in magnitude.” - L04-statlearn-3
When $i = j$: it’s the variance, $\sigma_{ii} = \mathrm{Var}(X_i)$. The prof flagged this as a quiz-style fact.
Notation & setup
- $X = (X_1, \dots, X_p)^\top$: $p$-dimensional random vector.
- $\mu = E[X]$: mean vector.
- $\Sigma = \mathrm{Cov}(X)$: covariance matrix. Symmetric and (by construction) positive semi-definite.
- $\sigma_{ij}$ = entry $(i, j)$ of $\Sigma$, the covariance of $X_i$ and $X_j$. Diagonal entries $\sigma_{ii}$ are variances.
- $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$: correlation. Lives in $[-1, 1]$.
- $V^{1/2} = \mathrm{diag}(\sqrt{\sigma_{11}}, \dots, \sqrt{\sigma_{pp}})$ = diagonal matrix of standard deviations. Then the correlation matrix is $\rho = (V^{1/2})^{-1}\,\Sigma\,(V^{1/2})^{-1}$ (sketched in code below).
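A minimal R sketch of that rescaling (the $3 \times 3$ `Sigma` here is made up for illustration); base R’s `cov2cor()` is the shortcut:

```r
# Correlation matrix from a covariance matrix via the sd diagonal.
Sigma <- matrix(c(4,   2,   0.5,
                  2,   9,   1.2,
                  0.5, 1.2, 1.0), nrow = 3, byrow = TRUE)
V_inv_sqrt <- diag(1 / sqrt(diag(Sigma)))    # (V^{1/2})^{-1}: 1/sd on the diagonal
rho <- V_inv_sqrt %*% Sigma %*% V_inv_sqrt   # diagonal is exactly 1
all.equal(rho, cov2cor(Sigma))               # base-R shortcut gives the same matrix
```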
Formula(s) to know cold
Definition / shortcut: $\Sigma = E\big[(X - \mu)(X - \mu)^\top\big] = E[XX^\top] - \mu\mu^\top$.
Correlation from covariance (the CE1.1f / Exercise 2.3g calculation): $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$.
Expectation rules (L04-statlearn-3 proved on board): $E[AX + b] = A\,E[X] + b$ for a constant matrix $A$ and constant vector $b$.
Covariance of a linear transformation (used heavily in contrasts and in deriving the sampling-distribution-of-beta): $\mathrm{Cov}(AX + b) = A\,\Sigma\,A^\top$.
These are the two formulas the M2 quiz drills (Q4 and Q5 of the random-vectors menti, modules/2StatLearn/2StatLearn.2.md).
Univariate analogues (for sanity checks): $E[aX + b] = a\,E[X] + b$, $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.
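Both multivariate rules are easy to verify by simulation. A sketch, with made-up $\mu$, $\Sigma$, $A$, $b$ (none of these are course numbers) and `MASS::mvrnorm()` for the draws:

```r
# Monte Carlo check of E[AX + b] = A mu + b and Cov(AX + b) = A Sigma A^T.
library(MASS)                                 # for mvrnorm()
set.seed(1)
mu    <- c(1, 2)
Sigma <- matrix(c(2, 0.8, 0.8, 1), 2)
A     <- matrix(c(1, -1,
                  1,  1), 2, byrow = TRUE)
b     <- c(0, 10)
X <- mvrnorm(1e5, mu, Sigma)                       # rows are draws of the random vector
Y <- X %*% t(A) + matrix(b, 1e5, 2, byrow = TRUE)  # each row is A x + b
colMeans(Y)                                   # ~ A mu + b = (-1, 13)
cov(Y)                                        # ~ A Sigma A^T ...
A %*% Sigma %*% t(A)                          # ... the theoretical target
```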
Insights & mental models
Element-wise proof of $E[AX] = A\,E[X]$ (L04-statlearn-3 on the board): the $i$-th entry of $AX$ is $\sum_j a_{ij} X_j$. Take $E$, pull the constants out: $E\big[\sum_j a_{ij} X_j\big] = \sum_j a_{ij} E[X_j]$, which is exactly the $i$-th entry of $A\,E[X]$. The prof flagged this as basically obvious but did the proof anyway; it’s the working machinery for everything in M3 onwards.
Covariance is a linear notion; this is the prof’s most-emphasized framing in L04-statlearn-3:
“We talk about things, I’m kind of always doing this [drawing a line], because we’re sort of assuming a linear line. We’re assuming some kind of linear function. This covariance is really getting at this notion of a slope.” - L04-statlearn-3
So zero covariance does not mean independent in general. It just means “no linear co-variation.” Two variables can be perfectly dependent (e.g. $Y = X^2$ with $X$ symmetric about zero) but have zero covariance if the dependence is symmetric around zero. The exception is joint normality: there, zero covariance ⇒ independence, but only there. (See multivariate-normal for that property.)
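A quick R illustration of that trap, using the symmetric $Y = X^2$ construction:

```r
# Perfect dependence, (near-)zero covariance: Y is a function of X,
# but the relationship has no linear component.
set.seed(2)
x <- rnorm(1e5)     # symmetric about zero
y <- x^2            # perfectly determined by x
cov(x, y)           # ~ 0: no linear co-variation (E[X^3] = 0 for symmetric X)
cor(x, y)           # ~ 0 as well
cor(x^2, y)         # = 1: the dependence is there, just not linear
```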
The correlation matrix is the covariance matrix made unit-free: divide entry $\sigma_{ij}$ by $\sqrt{\sigma_{ii}\,\sigma_{jj}}$. The diagonal becomes 1; off-diagonals become Pearson correlations in $[-1, 1]$. Lets you compare across different variables / units.
Positive semi-definite is forced by the variance interpretation (modules/2StatLearn/2StatLearn.2.md hint): for any constant vector $a$, $a^\top \Sigma a = \mathrm{Var}(a^\top X) \ge 0$. So $\Sigma$ is positive semi-definite by construction. If it’s singular ($\det \Sigma = 0$), some linear combination of the $X_i$’s has zero variance, meaning it’s a deterministic function of the others, and the multivariate normal density doesn’t exist (you divide by $|\Sigma|^{1/2}$).
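Both facts are easy to poke at in R. In this sketch the matrices are made up, and `S3` is built so its third variable is exactly the sum of the first two:

```r
# a' Sigma a = Var(a' X) >= 0 for any constant a.
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2)
a <- c(3, -1)
t(a) %*% Sigma %*% a     # nonnegative for any choice of a
# X3 = X1 + X2 exactly: the 3x3 covariance matrix is singular by construction
# (third row/column = sum of the first two).
S3 <- matrix(c(1.0, 0.5, 1.5,
               0.5, 2.0, 2.5,
               1.5, 2.5, 4.0), nrow = 3, byrow = TRUE)
det(S3)                  # ~ 0: some linear combination has zero variance
eigen(S3)$values         # smallest eigenvalue is 0 (up to floating-point noise)
```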
The cork-deposit example (L04-statlearn-3 running dataset): cork trees with holes drilled on four sides (N, E, S, W) and cork weight measured per direction. The variables are very correlated within a tree (sun exposure aside, dense in one direction usually means dense in all). Used as the toy multivariate dataset for the calculation drill, the contrasts example (N − S, E + W, etc.; see the sketch below), and the matching exercise.
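A sketch of the contrast calculation via $\mathrm{Cov}(AX) = A\Sigma A^\top$, with variables ordered N, E, S, W; the $4 \times 4$ `Sigma` below is a hypothetical stand-in, not the actual cork covariance matrix:

```r
# Covariance of the cork contrasts N - S and E + W.
Sigma <- matrix(c(8, 6, 6, 5,
                  6, 9, 7, 6,
                  6, 7, 9, 6,
                  5, 6, 6, 7), nrow = 4, byrow = TRUE)   # made-up, positive definite
A <- rbind("N - S" = c(1, 0, -1, 0),    # north-minus-south contrast
           "E + W" = c(0, 1,  0, 1))    # east-plus-west sum
A %*% Sigma %*% t(A)                    # 2x2 covariance matrix of the contrasts
```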
The calculation by hand (Exercise 2.3g and the CE1.1f single-choice): compute $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$. CE1.1f gives a $2 \times 2$ covariance matrix and asks for $\rho_{12}$. Multiple-choice trap: “0.0083” comes from forgetting the square root; “0.15” and “0.10” come from dividing by just one of the two variances. Take the square root.
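The same trap in code, with a made-up $2 \times 2$ `Sigma` (not the CE1.1f numbers):

```r
# Correct rho_12 vs the forgot-the-square-root distractor.
Sigma <- matrix(c(0.04, 0.01,
                  0.01, 0.09), 2, byrow = TRUE)
Sigma[1, 2] / sqrt(Sigma[1, 1] * Sigma[2, 2])  # correct: sigma_12 / sqrt(s11 * s22)
Sigma[1, 2] / (Sigma[1, 1] * Sigma[2, 2])      # the no-square-root distractor
cov2cor(Sigma)[1, 2]                           # cross-check with base R
```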
Where this plumbing leads
- multivariate-normal: generalizes the bell curve using $(x - \mu)^\top \Sigma^{-1} (x - \mu)$ in the exponent (and $|\Sigma|^{1/2}$ in the normalizer).
- sampling-distribution-of-beta: under Gaussian errors, $\hat\beta \sim N\big(\beta,\; \sigma^2 (X^\top X)^{-1}\big)$. The covariance machinery is what gives you the standard errors.
- linear-discriminant-analysis / quadratic-discriminant-analysis: class-conditional densities are multivariate normals with means $\mu_k$ and a shared covariance $\Sigma$ (LDA) or class-specific $\Sigma_k$ (QDA).
- principal-component-analysis: eigen-decomposition of $\Sigma$ gives the PCs (the prof defers spectral theory to Linear Statistical Models, out of scope for M2, but the PVE story rides on $\Sigma$’s eigenvalues).
- collinearity: collinear predictors → near-singular $X^\top X$ → $(X^\top X)^{-1}$ blows up → SEs explode. A quick simulated illustration below.
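That last bullet in one simulation (hypothetical data; `x2` is built as a near-copy of `x1`):

```r
# Near-collinear predictors inflate the diagonal of (X'X)^{-1}, hence the SEs.
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)   # almost a duplicate of x1
X  <- cbind(1, x1, x2)           # design matrix with intercept
diag(solve(t(X) %*% X))          # huge entries for x1 and x2
x2b <- rnorm(n)                  # replace with an independent predictor
Xb  <- cbind(1, x1, x2b)
diag(solve(t(Xb) %*% Xb))        # small, well-behaved entries
```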
Pitfalls
- Zero covariance ≠ independence in general. Only under joint normality. Easy T/F trap.
- Forgetting the square root in $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$: that’s how the CE1.1f distractors are designed.
- The sandwich form is $\mathrm{Cov}(AX) = A\,\Sigma\,A^\top$, not $A^\top \Sigma\, A$. Order matters; the transpose goes on the right.
- Singular $\Sigma$ ($\det \Sigma = 0$) means at least one variable is a perfect linear combination of the others: the multivariate normal density doesn’t exist; LDA’s $\Sigma^{-1}$ blows up; PCA has a zero eigenvalue.
- Be careful with row-vs-column conventions for the data matrix $\mathbf{X}$. ISLP uses rows = observations, columns = variables. The prof’s L02 notes the book convention and flags that other books transpose it.
- Pearson correlation is linear correlation only. A perfect quadratic relationship can have $\rho = 0$.
Scope vs ISLP
- In scope: definition of a random vector, mean vector, $\Sigma$, $\rho$, the two expectation rules $E[AX + b] = A\mu + b$ and $\mathrm{Cov}(AX) = A\Sigma A^\top$, the linear-vs-independence distinction, hand calculation.
- Look up in ISLP: §2.1 (introduction to notation), §3.2.4 (sampling distributions of regression coefficients), and §4.4 (LDA/QDA’s use of $\Sigma$). Härdle & Simar or Johnson & Wichern would be the deeper references; ISLP keeps the matrix algebra light.
- Skip in ISLP: spectral / eigen-decomposition theory of $\Sigma$. The prof verbatim (L04-statlearn-3): “we don’t talk about spectral decomposition”; deferred to TMA4267 Linear Statistical Models. Eigenvalues come back as PC variances in M10, but the full spectral machinery is out.
Exercise instances
- Exercise 2.3g: given the covariance matrix of the Auto data’s quantitative columns, compute the correlations between `mpg` and `displacement`/`horsepower`/`weight` by hand using $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\,\sigma_{jj}}$; verify against `cor(Auto[, quant])`. The drill that turns the formula into muscle memory.
- Exercise 2.4: simulate 1000 draws from a bivariate normal with `mvrnorm()` under four settings: (i) independent, (ii) independent with unequal variances, (iii) positively correlated, (iv) negatively correlated. Plot, identify which scatter goes with which $\Sigma$. Builds the visual mapping from $\Sigma$ to point-cloud shape, directly relevant to CE1.1g (contour matching). A simulation sketch follows this list.
- CE1 problem 1f: single-choice MC: given a $2 \times 2$ $\Sigma$, what’s $\rho_{12}$? The square-root-trap MC.
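A sketch of the Exercise 2.4 simulation; the four `Sigma` settings below are illustrative stand-ins matching the four described cases, not necessarily the exercise’s exact numbers:

```r
# Four bivariate-normal point clouds: one per covariance setting.
library(MASS)                                    # for mvrnorm()
set.seed(4)
settings <- list(
  independent = diag(2),                         # rho = 0, equal variances
  scaled      = diag(c(1, 5)),                   # rho = 0, unequal variances
  positive    = matrix(c(1,  0.8,  0.8, 1), 2),  # rho = +0.8
  negative    = matrix(c(1, -0.8, -0.8, 1), 2)   # rho = -0.8
)
op <- par(mfrow = c(2, 2))
for (nm in names(settings)) {
  Z <- mvrnorm(1000, mu = c(0, 0), Sigma = settings[[nm]])
  plot(Z, main = nm, xlab = "X1", ylab = "X2", pch = 20, cex = 0.4)
}
par(op)
```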
How it might appear on the exam
- MC: correlation from a 2×2 $\Sigma$. Direct CE1.1f format.
- T/F: zero covariance ⇒ independence. False in general; true only for joint normality. Classic trap.
- Hand calculation: $\mathrm{Cov}(AX) = A\Sigma A^\top$ for a small $A$ and $\Sigma$. Plug-and-chug; the contrasts cork-data exercise is the template.
- Match scatter / contour plot to $\Sigma$. The Exercise 2.4 visual: independent vs scaled vs positively-correlated vs negatively-correlated. The multivariate-normal atom owns the contour-matching question (CE1.1g), but the underlying $\Sigma$-to-shape intuition lives here.
- Identify which $\Sigma$ is singular (where $\det \Sigma = 0$), and what that means for the density / for LDA.
- Quiz-style fact recall. “What is $\sigma_{ii}$?” → $\mathrm{Var}(X_i)$. From L04-statlearn-3’s flagged quiz fact.
Related
- contrasts: the canonical application of $\mathrm{Cov}(AX) = A\Sigma A^\top$
- multivariate-normal: uses $\Sigma^{-1}$ (and $|\Sigma|^{1/2}$) in the density; gives the joint-normal-only result that zero cov ⇒ independence
- linear-regression / sampling-distribution-of-beta: $\mathrm{Cov}(\hat\beta) = \sigma^2 (X^\top X)^{-1}$ uses this same machinery
- linear-discriminant-analysis / quadratic-discriminant-analysis: class-conditional Gaussians built on $(\mu_k, \Sigma)$ or $(\mu_k, \Sigma_k)$
- principal-component-analysis: eigen-decomposition of $\Sigma$ (mechanics in M10; deferred from M2)
- collinearity: what happens when $\Sigma$ (or $X^\top X$) is near-singular