Quadratic discriminant analysis (QDA)

The prof’s framing: “LDA but you stop pooling the covariance.” Each class gets its own $\Sigma_k$ → the $-\frac{1}{2}x^\top\Sigma^{-1}x$ term no longer cancels across classes → the discriminant becomes quadratic in $x$ → boundaries are curves. More flexible, more parameters, more variance: exactly the bias-variance trade-off Anders has seen since module 2. The “where does the quadratic come from?” question is exam-flagged.

Definition (prof’s framing)

“QDA = LDA without pooling. Sigma becomes class-specific → the $x^\top\Sigma_k^{-1}x$ term no longer cancels across $k$ → discriminant is quadratic in $x$ → boundaries are curves, not lines.” - L09-classif-3

“Drop the pooled-variance assumption. Each class gets its own covariance matrix $\Sigma_k$. Everything else identical: still multivariate Gaussian for $X \mid Y = k$, still estimate $\pi_k$ from frequencies, still apply Bayes.” - L09-classif-3

The model assumption: $X \mid Y = k \sim \mathcal{N}(\mu_k, \Sigma_k)$, with $\Sigma_k$ class-specific.
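For reference, the pieces the quote names, written out (standard multivariate-Gaussian notation, consistent with ISLP):

$$f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \exp\!\Big(-\tfrac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1} (x-\mu_k)\Big), \qquad \Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}.$$

Taking logs of the numerator and dropping terms constant in $k$ gives the discriminant $\delta_k(x)$ below.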

Notation & setup

  • Same $\pi_k$ (class priors) and $\mu_k$ (class means) as LDA.
  • $\Sigma_k$: class-$k$-specific covariance matrix (no pooling).
  • Estimator: $\hat{\Sigma}_k = \frac{1}{n_k - 1} \sum_{i:\, y_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^\top$, the within-class sample covariance (sketched in R below).
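A minimal R sketch of those estimators, using iris for concreteness; base R’s cov() on the class subset is exactly the within-class sample covariance above:

```r
# Per-class QDA estimates for one class of iris (two predictors only)
X      <- iris[, c("Sepal.Length", "Sepal.Width")]
setosa <- X[iris$Species == "setosa", ]

pi_hat    <- nrow(setosa) / nrow(X)  # prior: class frequency
mu_hat    <- colMeans(setosa)        # class mean vector
Sigma_hat <- cov(setosa)             # class-specific covariance -- no pooling
```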

Formula(s) to know cold

QDA discriminant score (keep the quadratic term; it now depends on $k$):

$$\delta_k(x) = -\tfrac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) - \tfrac{1}{2}\log|\Sigma_k| + \log \pi_k$$

Equivalent compact form:

$$\delta_k(x) = -\tfrac{1}{2}\, x^\top \Sigma_k^{-1} x + x^\top \Sigma_k^{-1} \mu_k - \tfrac{1}{2}\, \mu_k^\top \Sigma_k^{-1} \mu_k - \tfrac{1}{2}\log|\Sigma_k| + \log \pi_k$$

Classify by $\hat{y} = \arg\max_k \delta_k(x)$. Decision boundaries, the sets $\{x : \delta_k(x) = \delta_l(x)\}$, are quadratic in $x$ (conic sections).
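A minimal R sketch of the score as a function you can evaluate by hand; the class parameters below are made-up toy numbers, not from the lecture:

```r
# delta_k(x) = -1/2 (x - mu)' Sigma^{-1} (x - mu) - 1/2 log|Sigma| + log(pi_k)
delta <- function(x, mu, Sigma, prior) {
  d <- x - mu
  as.numeric(-0.5 * t(d) %*% solve(Sigma) %*% d - 0.5 * log(det(Sigma)) + log(prior))
}

# Hypothetical two-class problem: class 2 has a much "fatter" covariance
mu1 <- c(0, 0); S1 <- diag(2);     pi1 <- 0.5
mu2 <- c(2, 2); S2 <- 4 * diag(2); pi2 <- 0.5

x_new  <- c(1, 1)
scores <- c(delta(x_new, mu1, S1, pi1), delta(x_new, mu2, S2, pi2))
which.max(scores)  # classify by arg max_k delta_k(x)
```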

Where the quadratic comes from (the exam-flagged derivation)

In LDA, the $-\frac{1}{2}x^\top\Sigma^{-1}x$ piece has no $k$ (since $\Sigma$ is shared) → cancels in the comparison $\delta_k(x) - \delta_l(x)$ → drops. Discriminant linear in $x$.

In QDA, $\Sigma$ becomes $\Sigma_k$ → that piece does depend on $k$ → cannot drop → survives as $-\frac{1}{2}x^\top\Sigma_k^{-1}x$ → discriminant quadratic in $x$. Plus an extra $-\frac{1}{2}\log|\Sigma_k|$ from the Gaussian normalizing constant (which previously canceled because $|\Sigma|$ was constant in $k$).

“It’s an interesting point that simply by making the sigma $k$-dependent, we introduced a new term, and that term is quadratic. So now the decision boundary is no longer going to be a line, it’s going to be quadratic.” - L09-classif-3
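Writing the two-class comparison out makes the flagged step explicit (standard expansion of the compact form above):

$$\delta_k(x) - \delta_l(x) = -\tfrac{1}{2}\, x^\top \big(\Sigma_k^{-1} - \Sigma_l^{-1}\big)\, x \;+\; x^\top \big(\Sigma_k^{-1}\mu_k - \Sigma_l^{-1}\mu_l\big) \;+\; \text{const}.$$

The first term vanishes for every $x$ exactly when $\Sigma_k = \Sigma_l$, i.e. the pooled (LDA) case; otherwise it survives and the boundary $\delta_k(x) = \delta_l(x)$ is quadratic.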

Insights & mental models

  • Bias-variance trade-off in pure form. QDA has lower bias (no pooling assumption) but higher variance (more parameters). Same algebra as linear-vs-polynomial regression: more flexibility → more parameters → more variance.
  • Parameter count is the headline number:
    • LDA: one shared $\Sigma$ → $p(p+1)/2$ parameters.
    • QDA: $K$ class-specific $\Sigma_k$’s → $K \cdot p(p+1)/2$ parameters.
    • For $p = 100$, $K = 5$: LDA needs ~5,050; QDA needs ~25,250 (sanity-checked in the sketch below). - L09-classif-3
  • When QDA pays off: large $n$, classes that genuinely have different covariance structure, and you can afford the extra parameters.
  • When LDA wins: small $n$; variance dominates and the simpler pooled model generalizes better. The iris example below makes this tangible.
  • Curves not lines. Decision boundaries are conic sections (parabolas, hyperbolas, ellipses). When one class’s $\Sigma_k$ is much “fatter” than the other’s, the boundary curls around the fat class.
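A two-line sanity check of the headline numbers (plain arithmetic, nothing lecture-specific):

```r
cov_params <- function(p) p * (p + 1) / 2            # free params in one symmetric p x p covariance
c(LDA = cov_params(100), QDA = 5 * cov_params(100))  # p = 100, K = 5  ->  5050 and 25250
```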

Worked iris example (the prof’s case for LDA)

Two predictors (sepal length × width), three iris species. Random 50/50 train/test split:

| Method | Train error | Test error |
| --- | --- | --- |
| LDA | 0.19 | 0.17 |
| QDA | 0.17 | 0.32 |

QDA fits the training data slightly better (as the strictly more flexible model, it should) but nearly doubles the test error. Classic overfit.

“If this was my situation… I would go with LDA. The argument would be it’s doing better on held-out data.” - L09-classif-3

(Foreshadows cross-validation: comparing over multiple random splits would be more robust than a single one.)
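A minimal R sketch of the comparison; the lecture’s exact seed/split is unknown, so the error rates will differ from the table:

```r
library(MASS)  # lda(), qda()

set.seed(1)                                  # arbitrary seed, not the lecture's
n     <- nrow(iris)
train <- sample(n, n / 2)                    # random 50/50 split

err <- function(fit, rows) {                 # misclassification rate on given rows
  mean(predict(fit, iris[rows, ])$class != iris$Species[rows])
}

f    <- Species ~ Sepal.Length + Sepal.Width # the lecture's two predictors
lfit <- lda(f, data = iris, subset = train)
qfit <- qda(f, data = iris, subset = train)

c(lda = c(train = err(lfit, train), test = err(lfit, -train)),
  qda = c(train = err(qfit, train), test = err(qfit, -train)))
```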

Exam signals

“That’s another good exam question, where does the quadratic come from in QDA? Or show that, yeah… it’s an interesting point that simply by making the sigma $k$-dependent, we introduced a new term, and that term is quadratic.” - L09-classif-3

“Why ever use LDA over QDA? Same reason linear regression often beats quadratic: when you don’t have enough data to estimate the extra parameters reliably, the simpler model wins on test error.” - L09-classif-3

“QDA: Gaussian holds but $\Sigma_k$ unequal. Need enough data.” - L09-classif-3

Pitfalls

  • Forgetting the $-\frac{1}{2}\log|\Sigma_k|$ term. It survives in QDA (it didn’t survive in LDA because $|\Sigma|$ was constant in $k$).
  • Treating QDA as always better than LDA. Strictly more flexible ≠ better; bias-variance argument.
  • Computing the parameter count without the symmetry constraint. A $p \times p$ covariance has $p(p+1)/2$ free parameters (not $p^2$: the matrix is symmetric, so only the upper triangle is free).
  • Estimating $\Sigma_k$ with too few samples per class. When $n_k \le p$, $\hat{\Sigma}_k$ is singular and the classifier breaks; see the sketch after this list. (Naive Bayes’ diagonal-$\Sigma_k$ assumption is the standard rescue when $p$ is large.)
  • Reading too much into “QDA is more flexible.” Slide deck: “if the covariance matrices in theory are equal, will they not be estimated equal? Should we not always prefer QDA to LDA?” Answer: no, because of variance.
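A minimal sketch of that failure mode (made-up dimensions, not from the lecture): with $n_k \le p$ the within-class covariance is rank-deficient and MASS refuses to fit:

```r
library(MASS)

set.seed(1)
p <- 5; n_k <- 3                  # fewer samples per class than dimensions
X <- matrix(rnorm(2 * n_k * p), ncol = p)
y <- factor(rep(c("a", "b"), each = n_k))

det(cov(X[y == "a", ]))           # numerically ~0: rank <= n_k - 1 < p
try(qda(X, y))                    # qda() errors out: the group is too small
```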

Scope vs ISLP

  • In scope: QDA model assumptions, formula, where the quadratic comes from, parameter-count comparison vs LDA, LDA-vs-QDA bias-variance trade-off.
  • Look up in ISLP: §4.4.3, pp. 152–155. Equation (4.28) is the canonical $\delta_k(x)$.
  • Skip in ISLP: the detailed simulation scenarios in §4.5.2 (where QDA wins / loses); useful for intuition but not exam-relevant beyond “QDA wins when the $\Sigma_k$ truly differ and $n$ is adequate.”

Exercise instances

  • Exercise 4.2d: derive the QDA classification rule from the multivariate Gaussian; classify the bank note (length 214, diagonal 140.4) using QDA; compare to the LDA result. Emphasizes working the matrix algebra out explicitly.
  • Exercise 4.6f: qda(Direction ~ Lag2) on the Weekly data; held-out confusion matrix. (See the sketch after this list.)
  • CE1 problem 3g: perform QDA in R on the tennis data; confusion matrix; sensitivity/specificity on the test set.
  • CE1 problem 3h: compare LDA, QDA, and logistic-regression decision boundaries; discuss which to prefer based on the confusion-matrix results.
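A minimal sketch for the Exercise 4.6f pattern, assuming the ISLR2 package’s Weekly data and the book’s usual pre-2009/post-2009 split (the exercise’s own split may differ):

```r
library(MASS)   # qda()
library(ISLR2)  # Weekly data

train <- Weekly$Year < 2009                  # train on 1990-2008, test on 2009-2010
fit   <- qda(Direction ~ Lag2, data = Weekly, subset = train)

pred <- predict(fit, Weekly[!train, ])$class
table(predicted = pred, actual = Weekly$Direction[!train])  # held-out confusion matrix
```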

How it might appear on the exam

  • “Where does the quadratic come from?”: flagged exam question. Walk through the algebra: $-\frac{1}{2}x^\top\Sigma^{-1}x$ no longer cancels when $\Sigma$ depends on $k$.
  • Parameter-count MCQ: “For given $p$ and $K$, how many free parameters does QDA estimate just for the covariance matrices?” → $K \cdot p(p+1)/2$.
  • LDA-vs-QDA T/F: “QDA always has lower test error than LDA” → false (variance can dominate). “QDA always has lower training error than LDA” → true on average (strictly more flexible).
  • Method-choice justification: Given a confusion matrix or test-MSE table, pick LDA or QDA and justify with a bias-variance argument.
  • Hand-classification: Given $\hat{\pi}_k$, $\hat{\mu}_k$, $\hat{\Sigma}_k$ for two classes, plug a new $x$ into both $\delta_k$’s and pick the larger.
  • Output interpretation: Compare LDA-decision-boundary plot (line) vs QDA-decision-boundary plot (curve) for the same data; explain.