Module 07: Moving Beyond Linearity — Book delta

ISLP ch. 7 covers the conceptual sweep of module 7 well (basis-functions framing, piecewise polynomials, regression splines, smoothing splines, LOESS, GAMs). The deltas are mostly explicit formulas and design-matrix templates that the prof wrote on the board / on the slides but ISLP either skips, hides in a footnote, or states only in the abstract form. The Exercise-7.3 and Exercise-7.4 hand-construction artifacts are the load-bearing ones: ISLP does not give the natural-spline basis formula in usable form, and does not write out an additive-model design matrix block-by-block.

What’s deliberately out: the Reinsch-matrix / optional smoother-matrix algebra (slide §“Computing $S$” + Exercise 7.6), which the prof bracketed off as optional, and the natural-spline boundary-knot derivation, which he waved off with “the book doesn’t, so I won’t either” L16. Both are listed in the MOC’s ## Out of scope block.


1. Basis-function design matrix in OLS form

L16, basis-functions, slide deck §“Basis Functions”

ISLP §7.3 states the basis-function model

$$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_K b_K(x_i) + \epsilon_i$$

and remarks that “all of the inference tools for linear models … are available in this setting.” It does not write out the design matrix or the OLS estimator with the basis columns plugged in. The prof’s slide does, and the entire module rests on it.

The design matrix is

$$X = \begin{pmatrix} 1 & b_1(x_1) & b_2(x_1) & \cdots & b_K(x_1) \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & b_1(x_n) & b_2(x_n) & \cdots & b_K(x_n) \end{pmatrix},$$

an $n \times (K+1)$ matrix with $K$ basis-function columns plus the intercept column. The OLS estimator is unchanged:

$$\hat\beta = (X^\top X)^{-1} X^\top y,$$

and so are all of its consequences (sampling distribution of $\hat\beta$, $R^2$, pointwise SEs, F-statistics).

Slogan the prof keeps coming back to: “It’s nonlinear, but linear. It’s linear in the parameters $\beta$, but it’s nonlinear in what you get.” L16

Worked numerical example (slides only)

The slide fixes a small basis, writes out a handful of data points, builds $X$ entry by entry, and computes $\hat\beta = (X^\top X)^{-1} X^\top y$ numerically.

This is the lecture-time concrete “build the design matrix, run OLS, done” demo. It’s the template the prof reuses for every method in module 7.
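The template is easy to make concrete. A minimal numpy sketch (the basis choice and the numbers here are stand-ins, not the slide’s actual demo values):

```python
import numpy as np

# Stand-in for the slide demo: a quadratic-polynomial basis
# (b_1(x) = x, b_2(x) = x^2) on a tiny made-up dataset.
# The point is the template: build X column by column, then plain OLS.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 5.0, 10.0])            # exactly y = 1 + x^2

X = np.column_stack([np.ones_like(x), x, x**2])  # intercept | b_1(x) | b_2(x)

# OLS estimator, unchanged from the linear case: (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
```

Everything downstream (SEs, F-tests) then reads off the same $X$ exactly as in module 3.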


2. General-order spline basis (truncated-power form)

L16, regression-splines, slide deck §“Regression Splines”

ISLP §7.4.3 gives only the cubic ($d = 3$) version of the truncated-power basis (Eq. 7.10). The slide deck states the general degree-$d$ spline version, which the prof uses to derive both the cubic and the natural-cubic versions in one move.

A spline of order $M$ is a piecewise polynomial of degree $d = M - 1$ joined at knots $\xi_1 < \cdots < \xi_K$, with continuous derivatives up to order $d - 1$ at each knot. The truncated-power basis is built from

$$(x - \xi_k)_+^{d} = \begin{cases} (x - \xi_k)^{d} & x > \xi_k \\ 0 & \text{otherwise.} \end{cases}$$

The standard basis is

$$x,\ x^2,\ \ldots,\ x^{d},\quad (x - \xi_1)_+^{d},\ \ldots,\ (x - \xi_K)_+^{d}.$$

There are $d + K$ basis functions plus an intercept, so $K + d + 1$ parameters total.

| $M$ (order) | Degree $d$ | Basis | # basis funcs | Total params (incl. intercept) |
|---|---|---|---|---|
| 2 | 1 (linear spline) | $x$, $(x-\xi_k)_+$ | $K+1$ | $K+2$ |
| 3 | 2 (quadratic spline) | $x$, $x^2$, $(x-\xi_k)_+^2$ | $K+2$ | $K+3$ |
| 4 | 3 (cubic spline) | $x$, $x^2$, $x^3$, $(x-\xi_k)_+^3$ | $K+3$ | $K+4$ |

ISLP gives only the $M = 4$ row. The cubic case ($d = 3$, $K$ knots → $K + 4$ degrees of freedom incl. intercept, $K + 3$ excl.) is the one tested by past exams.
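The construction is mechanical; a numpy sketch (the course works in R, and `truncated_power_basis` is a hypothetical helper name, not a course function):

```python
import numpy as np

def truncated_power_basis(x, knots, d=3):
    """Degree-d spline basis in truncated-power form: columns
    x, x^2, ..., x^d, then (x - xi_k)_+^d for each knot.
    The intercept is NOT included, matching the 'd + K basis
    functions plus an intercept' count in the text."""
    cols = [x**j for j in range(1, d + 1)]
    cols += [np.maximum(x - xi, 0.0)**d for xi in knots]
    return np.column_stack(cols)

x = np.linspace(0, 10, 50)
B = truncated_power_basis(x, knots=[2.5, 5.0, 7.5], d=3)   # K = 3, d = 3
# d + K = 6 basis columns; K + d + 1 = 7 parameters once the intercept joins.
```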


3. Natural cubic spline basis: closed-form formula

L16, regression-splines, slide deck §“Natural Cubic Splines”, Exercise 7.3

This is the single most load-bearing m07 delta. ISLP §7.4 introduces natural splines only conceptually (“linear past the boundary knots”) and confines the parameter accounting to a footnote (§7.4.4 footnote 4) without writing the basis. The prof’s slide gives the explicit basis formula that Exercise 7.3 plugs into. The prof himself flagged that he won’t derive why this enforces linearity past the boundary — “in other courses they go through the math … the book doesn’t, so I won’t either” L16 — so the formula is something to apply, not derive.

Setup

  • $K$ interior knots $\xi_1 < \cdots < \xi_K$.
  • Two boundary knots $\xi_0$ and $\xi_{K+1}$, conventionally set to $\min_i x_i$ and $\max_i x_i$. They add constraints (zero second and third derivatives past the boundary, i.e. linearity there), not basis columns.
  • The boundary constraints kill two truncated-cubic degrees of freedom at each end, so a natural cubic spline with $K$ interior knots has $K + 1$ basis functions (plus the intercept) → $K + 2$ parameters total.

Basis formula

$$N_k(x) = d_{k-1}(x) - d_K(x), \qquad k = 1, \ldots, K,$$

with the helper

$$d_k(x) = \frac{(x - \xi_k)_+^3 - (x - \xi_{K+1})_+^3}{\xi_{K+1} - \xi_k}, \qquad k = 0, \ldots, K.$$

So the column count is: one "$x$" column + $K$ columns of the form $N_k$ = $K + 1$ non-intercept columns.

Worked instance: one interior knot at 2006 (Exercise 7.3)

A natural cubic spline in year with one interior knot $\xi_1 = 2006$ and boundary knots $\xi_0 = 2003$, $\xi_2 = 2009$ (the min and max of year in the Wage data). Here $K = 1$, so the only index is $k = 1$ and we need two basis functions: $x$ and

$$N_1(x) = d_0(x) - d_1(x) = \frac{(x - 2003)_+^3 - (x - 2009)_+^3}{6} - \frac{(x - 2006)_+^3 - (x - 2009)_+^3}{3}.$$

Because $\xi_2 = 2009$ is the upper boundary knot and the data satisfies $x \le \xi_2$, the term $(x - 2009)_+^3 = 0$ for every observation. So on the data range the basis simplifies to

$$N_1(x) = \frac{(x - 2003)^3}{6} - \frac{(x - 2006)_+^3}{3}.$$

(The leading $(x - 2003)_+^3$ also loses its “$+$” because $x \ge \xi_0$ already.) The design matrix is then

$$X = \begin{pmatrix} 1 & x_1 & N_1(x_1) \\ \vdots & \vdots & \vdots \\ 1 & x_n & N_1(x_n) \end{pmatrix}:$$

three columns, two non-intercept basis functions → degrees of freedom = 2 (excl. intercept) = $K + 1$.

This is the canonical Exercise-7.3 / “construct the natural-cubic-spline design matrix by hand” object the prof signaled is fair game.
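The formula is cheap to hand-roll and sanity-check. A numpy sketch (hypothetical `ns_basis` helper, not R’s ns(); it builds the $[x \mid N_1, \ldots, N_K]$ columns for the boundary-knot convention above):

```python
import numpy as np

def ns_basis(x, interior, lo, hi):
    """Natural-cubic-spline basis in the closed form above:
    columns x, N_1(x), ..., N_K(x) with N_k = d_{k-1} - d_K,
    over knots lo = xi_0 < interior knots < hi = xi_{K+1}.
    A hand-rolled sketch, not R's ns()."""
    knots = np.r_[lo, interior, hi]          # xi_0, xi_1, ..., xi_{K+1}
    K = len(interior)

    def d(k, x):
        # helper d_k from the formula; knots[-1] is xi_{K+1}
        return (np.maximum(x - knots[k], 0.0)**3
                - np.maximum(x - knots[-1], 0.0)**3) / (knots[-1] - knots[k])

    cols = [x] + [d(k - 1, x) - d(K, x) for k in range(1, K + 1)]
    return np.column_stack(cols)

# Exercise-7.3 instance: interior knot 2006, boundary knots 2003/2009.
year = np.linspace(2003, 2009, 7)
B = ns_basis(year, interior=[2006], lo=2003, hi=2009)  # columns: year, N_1(year)
```

Beyond the upper boundary knot every $N_k$ is exactly linear, which is the natural-spline property the prof declined to derive.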


4. Local regression: full slide-form objective and the tricube kernel

L16, local-regression, slide deck §“Local Regression”

ISLP Algorithm 7.1 / Eq. 7.14 gives only the local-linear weighted objective and is deliberately silent on the kernel formula (“we will avoid getting into the technical details … there are books written on the topic”). The slide deck gives the local-quadratic objective and writes out the tricube kernel that R’s loess() actually uses. Both extras are lookup-able and unique to the slides.

Local-quadratic objective

At a target point $x_0$, find $\hat\beta_0, \hat\beta_1, \hat\beta_2$ minimising

$$\sum_{i=1}^{n} K_{i0}\,\bigl(y_i - \beta_0 - \beta_1 (x_i - x_0) - \beta_2 (x_i - x_0)^2\bigr)^2$$

and predict $\hat f(x_0) = \hat\beta_0$. (Local-linear drops the $\beta_2$ term and matches ISLP Eq. 7.14.)

Tricube kernel (R loess() default)

Let $x_{(k)}$ denote the $k$-th nearest neighbour of $x_0$, where $k$ is set by the span $s$ (roughly $k \approx sn$). The tricube weight is

$$K_{i0} = \begin{cases} \left(1 - \left(\dfrac{|x_i - x_0|}{|x_{(k)} - x_0|}\right)^{3}\right)^{3} & |x_i - x_0| < |x_{(k)} - x_0| \\[4pt] 0 & \text{otherwise.} \end{cases}$$

Key properties:

  • $K_{i0} = 1$ at $x_i = x_0$ (max weight at the target).
  • $K_{i0} = 0$ at $|x_i - x_0| \ge |x_{(k)} - x_0|$ (everything outside the neighbourhood gets weight zero, exactly).
  • Smooth (twice-differentiable) decay in between — the smooth replacement for the hard rectangular cutoff of KNN.
  • Normalised so that the boundary of the neighbourhood gets weight 0 and the target point itself weight 1 — span is the only hyperparameter that matters.

This is the formula that lets the prof’s “smooth KNN” framing be made precise. ISLP doesn’t give it; books on local regression do.
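A numpy sketch of one prediction at one target point (hypothetical `loess_at`, not R’s loess(), which adds interpolation shortcuts and optional robustness iterations):

```python
import numpy as np

def loess_at(x0, x, y, span=0.5, degree=2):
    """Local polynomial fit at a single target x0 with tricube weights:
    the slide objective, solved as weighted least squares on powers of
    (x - x0); the prediction is the fitted intercept."""
    n = len(x)
    k = int(np.ceil(span * n))                # neighbourhood size from the span
    dist = np.abs(x - x0)
    radius = np.sort(dist)[k - 1]             # distance to the k-th nearest neighbour
    u = np.clip(dist / radius, 0.0, 1.0)
    w = (1 - u**3)**3                         # tricube: 1 at x0, exactly 0 outside

    Z = np.column_stack([(x - x0)**j for j in range(degree + 1)])
    W = np.diag(w)
    beta = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
    return beta[0]                            # f_hat(x0)

x = np.linspace(0, 1, 20)
y = 2 + 3 * x + x**2          # exactly quadratic, so local-quadratic is exact
fit = loess_at(0.5, x, y)
```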


5. GAM as a block-structured design matrix (additive-OLS regime)

L16, L17, generalized-additive-models, slide deck §“Additive Models”, Exercise 7.4

ISLP §7.7.1 says GAMs with basis-function components are fit “as a big regression onto spline basis variables and dummy variables, all packed into one big regression matrix.” It then shows Figure 7.11 and moves on. It does not write the design matrix. The slide deck does. The construction is the object Exercise 7.4 asks you to build by hand.

The wage-GAM template (slides, Exercise 7.4)

Model:

$$\text{wage} = \beta_0 + f_1(\text{age}) + f_2(\text{year}) + f_3(\text{education}) + \epsilon,$$

with $f_1$ a cubic spline in age with knots at 40 and 60, $f_2$ a natural cubic spline in year with one interior knot at 2006, and $f_3$ a dummy-coded factor in education with 5 levels (< HS Grad as baseline).

Per-predictor blocks

$X^{(1)}$ (cubic spline in age, 5 columns from the truncated-power basis):

$$\text{age},\ \text{age}^2,\ \text{age}^3,\ (\text{age} - 40)_+^3,\ (\text{age} - 60)_+^3.$$

$X^{(2)}$ (natural cubic spline in year, 2 columns from the simplified basis in §3 above):

$$\text{year},\ N_1(\text{year}).$$

$X^{(3)}$ (dummy-coded education, 4 columns for 5 levels with < HS Grad as reference):

$$\mathbf{1}\{\text{HS Grad}\},\ \mathbf{1}\{\text{Some College}\},\ \mathbf{1}\{\text{College Grad}\},\ \mathbf{1}\{\text{Advanced Degree}\}.$$

The full GAM design

Stack horizontally, intercept up front:

$$X = \bigl[\,\mathbf{1} \;\big|\; X^{(1)} \;\big|\; X^{(2)} \;\big|\; X^{(3)}\,\bigr].$$

Total column count = $1 + 5 + 2 + 4 = 12$. Fit by OLS:

$$\hat\beta = (X^\top X)^{-1} X^\top y.$$

This is what gam(wage ~ bs(age, knots=c(40,60)) + ns(year, knots=2006) + education) does internally when all components are basis-function (no s() smoothing-spline or lo() LOESS terms — those require backfitting, which is out of scope as a derivation).

Column-space invariance (the Exercise-7.4 punchline)

This is the conceptual delta that ISLP doesn’t make explicit. The hand-built $X$ (truncated-power basis) and R’s gam()-built design matrix $\tilde X$ (B-spline basis under the hood) are not equal column-by-column, yet they produce the same fitted values:

$$\hat y = X (X^\top X)^{-1} X^\top y = \tilde X (\tilde X^\top \tilde X)^{-1} \tilde X^\top y.$$

The mechanism: both matrices span the same column space (the spline spaces in age and year with the given knots ⊕ the dummy-encoded education space). The OLS projection depends only on this column space, not on the particular basis used to write it. The individual $\hat\beta_j$ change with the basis; $\hat y$ does not.

This is what Exercise 7.4’s punchline question — “How can myhat equal yhat when the design matrices differ?” — is fishing for, and the kind of “method-comparison” question the prof flagged as exam-likely.
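The punchline is checkable in a few lines: build one spline basis, recombine its columns by any invertible matrix (standing in for the truncated-power ↔ B-spline change of basis), and confirm the fitted values agree:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 40))
y = np.sin(x) + rng.normal(0, 0.3, 40)

# Basis A: intercept + truncated-power cubic spline, knots at 4 and 6.
A = np.column_stack([np.ones_like(x), x, x**2, x**3,
                     np.maximum(x - 4, 0)**3, np.maximum(x - 6, 0)**3])

# Basis B: the SAME column space written in a different basis --
# any invertible recombination of A's columns will do.
T = rng.normal(size=(6, 6))
while abs(np.linalg.det(T)) < 1e-3:       # make sure T is invertible
    T = rng.normal(size=(6, 6))
B = A @ T

def fitted(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

# Different design matrices, different beta-hats, identical projections.
yA, yB = fitted(A, y), fitted(B, y)
```

Same column space ⇒ same projection ⇒ same $\hat y$, which is the answer to “How can myhat equal yhat when the design matrices differ?”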


6. Degrees-of-freedom cheat-sheet (consolidated)

L16, L17, regression-splines, smoothing-splines, step-functions, polynomial-regression

ISLP scatters the dof counts across §7.1, §7.2, §7.4 (main text + footnote 4), and §7.5.2. The prof drilled them together because the 2024 Q2c / 2025 Q4e(i) exam patterns turn on precise counting (the 2025 paper was deliberately ambiguous about whether the intercept counted, so reading the question carefully is the trap). Worth having in one place.

For each method below, “params” = total number of fitted real numbers; “df incl. intercept” / “df excl. intercept” splits out the convention.

| Method | Parameters | df (incl. intercept) | df (excl. intercept) |
|---|---|---|---|
| Polynomial, degree $d$ | $d + 1$ | $d + 1$ | $d$ |
| Step function, $K$ cutpoints | $K + 1$ | $K + 1$ | $K$ |
| Linear spline, $K$ knots | $K + 2$ | $K + 2$ | $K + 1$ |
| Cubic spline, $K$ knots | $K + 4$ | $K + 4$ | $K + 3$ |
| Natural cubic spline, $K$ interior knots | $K + 2$ | $K + 2$ | $K + 1$ |
| Smoothing spline | $n$ nominal, $df_\lambda$ effective | $df_\lambda$ | same (non-integer) |

The cubic-spline cell uses the slide-deck count from §2 above ($K + d + 1$ with $d = 3$). The natural-cubic-spline cell follows from §3 ($K + 1$ non-intercept columns plus the intercept). The smoothing-spline cell is the effective df $df_\lambda = \operatorname{tr}(S_\lambda)$ — bounded between 2 (when $\lambda \to \infty$, $\hat g$ is the OLS line) and $n$ (when $\lambda = 0$, $\hat g$ interpolates every $y_i$); see ISLP §7.5.2 for the underlying formula.

The intercept-counting trap, restated

The 2024 paper called a natural cubic spline with 3 cut points “4 dof” — that’s the excl.-intercept convention applied to a model with $K = 3$ interior knots → $K + 1 = 4$. The 2025 paper called bs(age, knots = quantile(age, c(0.2, 0.4, 0.6, 0.8))) (a plain cubic spline with $K = 4$ knots) “7 dof” — that’s also excl. intercept: $K + 3 = 7$. In both cases the intercept was already supplied by the surrounding gam(...) model. When in doubt: ask whether the question already gives you an intercept; if yes, subtract one from the incl.-intercept count.
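The excl.-intercept column of the table condenses to a lookup; a throwaway sketch (hypothetical function, just mirroring the table) with the two exam traps as checks:

```python
def df_excl_intercept(method, K=None, d=None):
    """Parameter counts from the cheat-sheet, EXCLUDING the intercept --
    the convention R's bs()/ns() df= argument and both exam papers use."""
    return {
        "polynomial":    d,          # x, ..., x^d
        "step":          K,          # K cutpoints -> K dummies
        "linear_spline": K + 1 if K is not None else None,   # x + K truncated terms
        "cubic_spline":  K + 3 if K is not None else None,   # x, x^2, x^3 + K terms
        "natural_cubic": K + 1 if K is not None else None,   # x + K of the N_k columns
    }[method]

# 2024 trap: natural cubic spline, 3 cut points -> "4 dof"
# 2025 trap: plain cubic spline, 4 knots        -> "7 dof"
```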


7. Smoothing-spline LOOCV shortcut: parallel to the OLS hat-matrix shortcut

L16, smoothing-splines, slide deck §“The smoother matrix”

ISLP §7.5.2 gives the LOOCV shortcut for smoothing splines:

$$\mathrm{RSS}_{cv}(\lambda) = \sum_{i=1}^{n} \bigl(y_i - \hat g_\lambda^{(-i)}(x_i)\bigr)^2 = \sum_{i=1}^{n} \left[\frac{y_i - \hat g_\lambda(x_i)}{1 - \{S_\lambda\}_{ii}}\right]^2,$$

and remarks (footnote 5) that “we have a very similar formula (5.2) in Chapter 5 for least squares linear regression.” It stops there. The slide deck and the prof make the structural parallel explicit, and it generalizes to “any linear smoother.”

A linear smoother is any method whose fitted values are linear in $y$:

$$\hat y = S\, y$$

for some smoother matrix $S$. The members of this family in the course:

| Method | Smoother matrix $S$ |
|---|---|
| OLS (incl. basis-function OLS: polynomial, step, regression spline) | Hat matrix $H = X(X^\top X)^{-1}X^\top$ |
| Ridge regression | $X(X^\top X + \lambda I)^{-1}X^\top$ |
| Smoothing spline | $S_\lambda$ from the curvature-penalty objective |
| LOESS (linear or quadratic local fit) | A weight-dependent $S$ (R reports trace.hat) |

For any linear smoother, the LOOCV shortcut takes the same form:

$$CV_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{y_i - \hat y_i}{1 - S_{ii}}\right)^2,$$

with $S_{ii}$ replacing the OLS hat-matrix diagonal $h_i$ (ISLP §5.1.2 Eq. 5.2). The structural fact “linear smoother ⇒ LOOCV is essentially free” is what the prof keeps pointing at: it’s why one fit is enough to do LOOCV for OLS, ridge, and smoothing splines, and it’s also why the effective dof = trace of the smoother definition is natural — both quantities are properties of the same $S$.

This linear-smoother frame, with $\hat y = Sy$ and the unified LOOCV formula, is the cross-cutting object that ties together module 3 (OLS), module 6 (ridge), and module 7 (smoothing splines, LOESS). ISLP discusses each piece in isolation; the prof’s slide deck merges them.
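The shortcut is cheap to verify for one family member. A numpy sketch using ridge, whose $S$ is explicit (for ridge the identity is exact, so the one-fit shortcut and the $n$ explicit refits give the same score):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 30, 4, 2.0
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(0, 0.5, n)

# Ridge is a linear smoother: y_hat = S y, S = X (X'X + lam I)^{-1} X'.
S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
y_hat = S @ y

# One-fit LOOCV via the shortcut ...
cv_shortcut = np.mean(((y - y_hat) / (1 - np.diag(S)))**2)

# ... versus n explicit leave-one-out refits.
errs = []
for i in range(n):
    m = np.arange(n) != i
    beta = np.linalg.solve(X[m].T @ X[m] + lam * np.eye(p), X[m].T @ y[m])
    errs.append((y[i] - X[i] @ beta)**2)
cv_brute = np.mean(errs)
```

Swapping $S$ for the hat matrix ($\lambda = 0$) recovers ISLP Eq. 5.2; swapping in $S_\lambda$ gives the smoothing-spline formula above.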


Notation / terminology drift

A few places where the prof’s notation diverges from ISLP’s. Worth noting only because they show up in the worked formulas above.

  • Boundary knots. ISLP §7.4 calls them “boundary knots” but introduces them only verbally and does not separate them from interior knots in any formula. The prof uses $\xi_0$ and $\xi_{K+1}$ explicitly in the natural-cubic-spline basis above. They are $\min_i x_i$ and $\max_i x_i$ by default; they impose constraints but do not add basis columns.
  • Spline order vs degree. The prof’s slides use $M$ = “order” with $d = M - 1$ = “degree” (so $M = 4$ ⇒ cubic spline). ISLP uses only “degree” ($d$). Order is the standard convention from approximation theory; degree is more common in textbook statistics.
  • Truncated-power notation. ISLP writes $(x - \xi)_+^3$ with a single knot $\xi$. The prof writes $(x - \xi_k)_+^d$ with $\xi_k$ for the $k$-th knot. Both mean the same thing.
  • Knot variable. ISLP §7.4 uses a bare $\xi$. The prof indexes the knots $\xi_k$ throughout. (And the symbol shows up nowhere else in the course.)
  • “GAM” vs “AM”. The prof’s slides distinguish: an additive model (AM) is the Gaussian-response form $y = \beta_0 + \sum_j f_j(x_j) + \epsilon$; a generalized additive model (GAM) is the same idea with a GLM link (logit, etc.). ISLP uses “GAM” for both. This course only ever uses GAMs with the Gaussian (identity) and logit links; the broader GLM-link generality is name-checked only.
  • bs vs BS. The R interface labels cubic-spline columns bs (“B-spline”). The prof: “I don’t know why they call it BS.” L16 B-spline columns and truncated-power columns span the same column space (different basis, same fit), so the relabeling is cosmetic. The B-spline algorithmic construction itself is out of scope.
  • Smoother matrix symbol. ISLP uses $S_\lambda$ (with the subscript $\lambda$ explicit). The slide deck drops the subscript and writes $S$. Same object.
  • “df” overloading. ISLP uses “degrees of freedom” for three different things — the number of fitted parameters (cubic spline: $K + 4$), the df= argument to bs() / ns() / s() in R (which is the parameter count excluding the intercept, i.e. the number of columns R will add to the supplied formula), and the effective df $df_\lambda$ for smoothing splines. The prof keeps the same three uses but doesn’t always flag the convention switch. See §6 above for the consolidated table.