Step functions

The “stupid but actually pretty common” basis-function instance: cut the predictor range into $K + 1$ bins, fit one constant per bin. Discontinuous, no derivatives, but cheap, robust, and the right tool when you genuinely don’t have many constraints to push around.

Definition (prof’s framing)

“I mean, this one is very stupid, but it’s actually quite common, simply because you don’t need that much information… so you don’t have too many constraints that push things around. Even the step functions are actually pretty nice, even if they are a bit stupid looking.” - L16-beyondlinear-1

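Pick cutpoints $c_{1} < c_{2} < \dots < c_{K}$ in the range of $X$. Define indicator basis functions $b_{j}(x) = 1(c_{j} \leq x < c_{j+1})$ for $j = 1, \dots, K - 1$, and $b_{K}(x) = 1(c_{K} \leq x)$. The leftover bin $x < c_{1}$ gets no column of its own: it is the reference bin, absorbed into the intercept.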

Fit by ordinary least squares with these as columns of the design matrix: same OLS machinery as linear regression, just with a binned design matrix. The result is a piecewise-constant fit: one mean per bin.
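A minimal sketch of "OLS on indicator columns gives one mean per bin", in plain Python rather than the R used in lecture. The helper names (`design_row`, `solve`, `ols`) and the toy data are made up for illustration; the bin $x < c_{1}$ is taken as the reference bin, and bins are closed on the left.

```python
from bisect import bisect_right

def design_row(x, cutpoints):
    """Intercept plus K indicator columns; the bin x < c_1 is the reference."""
    j = bisect_right(cutpoints, x)  # bin index: 0 for x < c_1, K for x >= c_K
    return [1.0] + [1.0 if j == k else 0.0 for k in range(1, len(cutpoints) + 1)]

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)

# Hypothetical toy data: two cutpoints -> three bins.
cutpoints = [30, 50]
x = [22, 27, 35, 41, 48, 55, 63]
y = [10.0, 14.0, 20.0, 22.0, 24.0, 18.0, 16.0]

X = [design_row(xi, cutpoints) for xi in x]
beta = ols(X, y)                                      # approx [12, 10, 5]
bin_means = [beta[0]] + [beta[0] + b for b in beta[1:]]  # approx [12, 22, 17]
```

The recovered `bin_means` match the plain averages of `y` within each bin (12, 22, 17), so the regression adds nothing beyond per-bin averaging; what it buys is the standard OLS inference machinery.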

Notation & setup

$K$ cutpoints $\Rightarrow K + 1$ bins $\Rightarrow K$ dummy basis columns + intercept = $K + 1$ parameters total.
The first bin is absorbed into the intercept (reference category, just like dummy coding for factors in module 3).
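In R: cut(age, breaks) returns a factor of bin labels; pass it to lm(), which dummy-codes it into the binned design matrix. Careful: an integer second argument is the number of bins, not the number of cutpoints, so cut(age, 4) gives 4 equal-width bins (3 interior cutpoints).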

Mathematical structure

Design matrix:

$$
X = \begin{pmatrix}
1 & 1(c_{1} \leq x_{1} < c_{2}) & \dots & 1(c_{K} \leq x_{1}) \\
\vdots & \vdots & & \vdots \\
1 & 1(c_{1} \leq x_{n} < c_{2}) & \dots & 1(c_{K} \leq x_{n})
\end{pmatrix}.
$$

Each row has a 1 in the intercept column and a 1 in at most one bin column: observations in the reference bin ($x < c_{1}$) have zeros in every bin column.
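A worked instance with made-up numbers (not from the lecture): take $K = 2$ cutpoints $c_{1} = 30$, $c_{2} = 50$ and four observations $x = (25, 35, 60, 45)$. With columns ordered as intercept, $1(30 \leq x < 50)$, $1(50 \leq x)$, the design matrix would be

$$
X = \begin{pmatrix}
1 & 0 & 0 \\
1 & 1 & 0 \\
1 & 0 & 1 \\
1 & 1 & 0
\end{pmatrix}.
$$

The first row ($x = 25 < c_{1}$) falls in the reference bin, so it carries only the intercept.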

Fitted prediction: $\hat{β}_{0}$ is the mean response in the first bin (where $X < c_{1}$); $\hat{β}_{0} + \hat{β}_{j}$ is the mean response in bin $j$, so $\hat{β}_{j}$ alone is the difference in mean between bin $j$ and the reference bin. Confidence intervals come for free, since it’s just OLS:

“Confidence intervals come the same as you would get the confidence intervals for the prediction in the linear model, because it’s just a linear model only now on these basis functions, which in that case were just fixed intervals.” - L16-beyondlinear-1
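To make the “just OLS” point concrete, here is a sketch of the standard linear-model result specialized to this design, assuming independent errors with common variance $\sigma^{2}$. The symbols $n_{j}$, $\bar{y}_{j}$ and $\hat{\sigma}$ are notation introduced here, not taken from the lecture. With $n_{j}$ observations in bin $j$ and $\bar{y}_{j}$ their mean response, the fitted value in bin $j$ is $\bar{y}_{j}$, and a $(1 - \alpha)$ confidence interval for the bin mean is

$$
\bar{y}_{j} \pm t_{n - K - 1,\, 1 - \alpha/2} \, \frac{\hat{\sigma}}{\sqrt{n_{j}}}, \qquad \hat{\sigma}^{2} = \frac{1}{n - K - 1} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}.
$$

The residual degrees of freedom are $n - (K + 1)$ because the fit has $K + 1$ parameters, and the $1 / \sqrt{n_{j}}$ factor means sparsely populated bins get wide intervals.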

Insights & mental models

Step functions = “regression on a factor variable.” When the original $X$ is already a factor (e.g. education with levels <HS, HS, Some College, College, Advanced Degree), lm(wage ~ education) is a step function for free, no cut() needed. The wage-vs-education demo in lecture is exactly this case.
No derivatives. The fit is piecewise constant, which means jumps at every cutpoint:

“You don’t have derivatives here. They’re not even… it’s piecewise constant, but it’s not connected, it can jump.” - L16-beyondlinear-1
Cutpoint choice is manual. Given only a number of bins, R’s cut() uses equally spaced bins; for quantile-based or hand-picked bins, pass the cutpoints explicitly via breaks=. The book notes that 5-year age groups are routine in biostatistics / epidemiology.
Why “a bit stupid”: between the cutpoints the model is forced to ignore variation. The first bin in the wage-vs-age demo “clearly misses the increasing trend” (book §7.2 / Figure 7.2 commentary).
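The two points above (jumps at the cutpoints, no variation inside a bin) can be seen in a small Python sketch. The function name `step_predict` and the fitted bin means are hypothetical, chosen only for illustration; bins are treated as closed on the left.

```python
from bisect import bisect_right

def step_predict(x, cutpoints, bin_means):
    """Piecewise-constant prediction: the fitted mean of the bin containing x."""
    return bin_means[bisect_right(cutpoints, x)]

# Hypothetical fitted bin means for cutpoints at 30 and 50 (illustrative numbers only).
cutpoints = [30, 50]
bin_means = [12.0, 22.0, 17.0]

left = step_predict(29.999, cutpoints, bin_means)  # just below the first cutpoint
right = step_predict(30.0, cutpoints, bin_means)   # at the cutpoint (bins are left-closed)
jump = right - left                                # size of the discontinuity at c_1

# Inside a bin the prediction does not move at all, whatever the data did there.
flat = step_predict(35, cutpoints, bin_means) == step_predict(45, cutpoints, bin_means)
```

Here `jump` is 10.0 even though the two inputs differ by only 0.001, while `flat` is true across a 10-unit stretch of the predictor.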

Exam signals

“Even the step functions are actually pretty nice, even if they are a bit stupid looking.” - L16-beyondlinear-1

Step functions are introduced as a pedagogical bridge from polynomial regression to splines. No prof quote flagging them as exam-likely on their own, but they are an explicit instance of basis-functions, and “what is the design matrix for a step function with cutpoints at…” is a fair-game design-matrix-construction question.

Pitfalls

Choice of cutpoints matters. Too few and you smear over real structure; too many and each bin mean rests on only a handful of observations, so the fit gets noisy. Equal-width bins are not equal-count bins.
The first bin is the intercept. Don’t double-count: with $K$ cutpoints you have $K + 1$ bins and $K$ dummy columns, not $K + 1$.
Step function ≠ ordered factor. When $X$ is genuinely ordinal, a step function throws away the ordering (each level gets its own coefficient, no monotonicity). Sometimes that’s what you want, sometimes not.
The discontinuities are a feature, not a bug, but if you need a smooth fit you should be using a spline instead.
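A hedged Python sketch of the “equal-width is not equal-count” pitfall. The helpers `equal_width_cutpoints` and `bin_counts` and the skewed sample are made up for illustration; this is not a reimplementation of R’s cut(), which handles the range endpoints slightly differently.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

def equal_width_cutpoints(x, n_bins):
    """Interior cutpoints that split [min(x), max(x)] into n_bins equal-width bins."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    return [lo + k * width for k in range(1, n_bins)]

def bin_counts(x, cutpoints):
    """Number of observations in each of the len(cutpoints) + 1 left-closed bins."""
    counts = Counter(bisect_right(cutpoints, xi) for xi in x)
    return [counts.get(j, 0) for j in range(len(cutpoints) + 1)]

# Hypothetical right-skewed sample: most values small, a few large.
x = [1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 40]

width_cuts = equal_width_cutpoints(x, 4)   # [10.75, 20.5, 30.25]
quant_cuts = quantiles(x, n=4)             # quartiles: [2.25, 4.5, 11.0]

width_counts = bin_counts(x, width_cuts)   # [9, 2, 0, 1]
quant_counts = bin_counts(x, quant_cuts)   # [3, 3, 3, 3]
```

On this sample the equal-width split leaves one bin empty. An empty bin means an all-zero dummy column, so that bin’s coefficient cannot be estimated; quantile-based cutpoints avoid this by construction.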

Scope vs ISLP

In scope: definition, the indicator basis, design matrix construction, the wage-vs-education and wage-vs-age examples, the link to dummy-coding factors.
Look up in ISLP: §7.2 (Figure 7.2, wage vs age step-function fit and its logistic counterpart).
Skip in ISLP: nothing specific; the book’s treatment is short and matches the slides.

Exercise instances

None in the recommended-exercise sheet for module 7. Step functions appear implicitly inside Exercise 7.5 (the GAM uses factor(origin), which is a step-function basis), and inside Exercise 7.4 via myfactor() for education.

How it might appear on the exam

“Write the design matrix for a step-function regression of $Y$ on $X$ with cutpoints at $c_{1}, c_{2}$”: pure basis-function construction.
“Interpret the coefficient $β_{2}$ in a piecewise-constant fit”: it’s the difference in mean response between bin 2 and the reference bin.
“How many parameters does a step function with $K$ cutpoints have?”: $K + 1$ (including intercept).
T/F: “A step-function fit is continuous.” (False, it has jumps at the cutpoints.)
Method-comparison: when is a step function preferable to a polynomial / spline? When the data has natural breakpoints, when interpretability per bin matters, or when there’s not enough data to constrain a smoother fit.

Related

basis-functions: step functions are the indicator instance.
regression-splines: the smooth alternative when discontinuities at the cutpoints are unacceptable.
categorical-encoding-and-interactions: step functions on a factor variable are dummy-coded categorical regression; the machinery is identical.
generalized-additive-models: step functions slot in as one of the $f_{j}$ choices, especially for qualitative predictors.
