Compulsory Exercise 3 — MC / SC review

11 questions · 19 points · ~25 min · CE3 (V2020)

Multiple- and single-choice questions lifted verbatim from Compulsory Exercise 3 (TMA4268, V2020). R-coding parts are excluded. Click an option to lock; explanations open automatically. Score tracker bottom-left. Solutions adapted from the official solution PDF.

Question 1 2 points CE3 P2b

Answer the following multiple choice questions by using the Covid-19 data to model the probability of deceased as a function of sex, age and country (with France as reference level; no interactions). The fitted glm() output is shown below.

##                   Estimate Std. Error z value Pr(>|z|)
## (Intercept)      -7.633051   0.897063  -8.509  < 2e-16 ***
## sexmale           1.137246   0.343706   3.309 0.000937 ***
## age               0.068012   0.009846   6.907 4.94e-12 ***
## countryindonesia -0.754259   0.815127  -0.925 0.354796
## countryjapan     -2.434101   0.667826  -3.645 0.000268 ***
## countryKorea     -1.366797   0.374837  -3.646 0.000266 ***

Which of the following statements are true, which false?

Show answer

Solution: FALSE — FALSE — FALSE — FALSE.

  1. False — an analysis of deviance test on the full model gives $p \approx 0.0002$ for country, so country is a relevant variable. A non-significant single dummy (Indonesia vs France) doesn't mean the whole factor is irrelevant.
  2. False — a large $p$-value for one dummy level only means we have no evidence that that level differs from the reference; it is not a justification to drop a subpopulation from a model. You don't remove observations because their group is "not different from reference".
  3. False — the calculation $\exp(10 \cdot \hat\beta_{age}) = \exp(10 \cdot 0.068) \approx 1.97$ is correct, but it gives the multiplicative change in the odds (an odds ratio for a 10-year age increase), not the change in the probability.
  4. False — $\exp(\hat\beta_{sex}) = \exp(1.137) \approx 3.12$ is the odds ratio (males vs females), not the probability ratio.

Atoms: logistic-regression, odds-and-odds-ratio.
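The odds calculations in (3) and (4) are quick to verify numerically. A minimal check (in Python for portability; the coefficients are taken from the glm() output above):

```python
import math

# Coefficients from the fitted logistic regression shown above
beta_age = 0.068012
beta_sexmale = 1.137246

# Multiplicative change in the odds of death for a 10-year age increase
or_age_10y = math.exp(10 * beta_age)   # approx. 1.97

# Odds ratio for males vs. females
or_sex = math.exp(beta_sexmale)        # approx. 3.12

print(round(or_age_10y, 2), round(or_sex, 2))
```

Both numbers match the solution text; the point of (3) and (4) is what those numbers mean (odds, not probabilities), not their values.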

Question 2 2 points CE3 P2f

Which of the following statements are true, which false?

Consider the classification tree below to answer:

Classification tree on age, country, sex predicting deceased.

Consider the LDA code and output below:

library(MASS)
table(predict = predict(lda(deceased ~ age + sex + country, data = d.corona))$class,
    true = d.corona$deceased)

##        true
## predict     0    1
##       0  1926   31
##       1    39   14
Show answer

Solution: TRUE — TRUE — TRUE — FALSE.

Note from the official solution: statements (iii) and (iv) were later found to be ambiguous, so both True and False were graded as correct. Below is the most defensible reading.

  1. True — follow the tree: age < 79.5 is False (age > 91), then country: indonesia,japan,Korea is False (French), then age < 91 is False. The terminal leaf reads 0.461500 ≈ 0.46.
  2. True — age appears at the root and again deep in the tree, while sex appears only once (and as a tie-breaker in a single subtree). The classifier leans heavily on age.
  3. True (most-defensible reading) — the null classifier "always predict 0 (alive)" misclassifies exactly the 45/2010 ≈ 2.24% who actually died. A useful classifier should beat this; LDA here gets $(39+31)/2010 \approx 3.48\%$, which is worse than the null. Officially graded ambiguous.
  4. False — LDA does estimate (posterior) probabilities via Bayes' rule; that part of the statement is wrong. The misclassification claim is defensible (3.48% > 2.24% null), but the reason about probabilities is incorrect. Officially graded ambiguous.

Atoms: classification-tree, lda, confusion-matrix.
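The error rates quoted in (3) and (4) follow directly from the confusion matrix. A small sanity check (Python sketch; the four counts are copied from the table above):

```python
# Confusion matrix from the LDA output above (rows: predicted, cols: true)
tn, fn = 1926, 31   # predicted 0: true 0 (correct), true 1 (missed deaths)
fp, tp = 39, 14     # predicted 1: true 0 (false alarms), true 1 (correct)
n = tn + fn + fp + tp                 # 2010 observations

lda_error = (fp + fn) / n             # LDA misclassification rate
null_error = (tp + fn) / n            # "always predict 0" misses every death

print(round(lda_error, 4), round(null_error, 4))
```

LDA's roughly 3.48% error rate is indeed worse than the 2.24% of the trivial always-predict-alive classifier, which is what makes statement (iii) defensible.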

Question 3 2 points CE3 P4b

Inference vs prediction: Which of the following methods are suitable when the aim of your analysis is inference?

Show answer

Solution: TRUE — TRUE — TRUE — FALSE.

  1. True — lasso and ridge yield interpretable coefficients on the original predictors; lasso additionally selects variables. Both are routinely used for inference, though SE/$p$-values require post-selection care.
  2. True — coefficients of an MLR (including interaction terms) have direct interpretations and standard inferential tooling ($t$-tests, CIs, $F$-tests).
  3. True — coefficients are log-odds with interpretable signs/magnitudes and standard Wald-type inference.
  4. False — SVM is a black-box geometric classifier; it does not produce interpretable parameters and is not used for inference about effects.

Atoms: inference-vs-prediction, lasso, ridge-regression.

Question 4 2 points CE3 P4c

We again look at the Covid-19 dataset from Problem 2 to study some properties of the bootstrap method. Below we estimated the standard errors of the regression coefficients in the logistic regression model with sex, age and country as predictors using 1000 bootstrap iterations (column std.error). These standard errors can be compared to those that we obtain by fitting a single logistic regression model using the glm() function. Look at the R output below and compare the standard errors that we obtain from these two approaches (note that the t1* to t6* variables are sorted in the same way as for the glm() output).

library(boot)
boot.fn <- function(data, index) {
    return(coefficients(glm(deceased ~ sex + age + country, family = "binomial",
        data = data, subset = index)))
}
boot(d.corona, boot.fn, 1000)

##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
## Call:
## boot(data = d.corona, statistic = boot.fn, R = 1000)
##
## Bootstrap Statistics :
##        original       bias    std. error
## t1* -7.63305130 -0.1529721699 0.783528214
## t2*  1.13724644  0.0387847701 0.376067951
## t3*  0.06801169  0.0009371457 0.008496607
## t4* -0.75425940 -1.8680017229 5.127173438
## t5* -2.43410057 -0.6257843968 2.979530357
## t6* -1.36679680  0.0126076844 0.381765945

# Logistic regression
r.glm <- glm(deceased ~ sex + age + country, d.corona, family = "binomial")
summary(r.glm)$coef

##                     Estimate Std. Error   z value     Pr(>|z|)
## (Intercept)      -7.63305130 0.897063042 -8.5089352 1.755379e-17
## sexmale           1.13724644 0.343705727  3.3087794 9.370363e-04
## age               0.06801169 0.009846377  6.9072806 4.940322e-12
## countryindonesia -0.75425940 0.815127165 -0.9253273 3.547957e-01
## countryjapan     -2.43410057 0.667826265 -3.6448111 2.675883e-04
## countryKorea     -1.36679680 0.374836917 -3.6463772 2.659635e-04

Which of the following statements are true?

Show answer

Solution: FALSE — TRUE — TRUE — FALSE.

  1. False — bootstrap and glm SEs can differ; the bootstrap is treated as the more honest estimator (fewer parametric assumptions). A large gap is a signal about the parametric assumptions, not a problem with the bootstrap itself.
  2. True — the glm SEs come from the asymptotic Wald approximation under specific distributional assumptions; when bootstrap SEs differ markedly (especially for countryindonesia and countryjapan, where bootstrap SE is many times larger), the Wald assumptions are suspect for those coefficients (small subgroup sizes, separation, etc.).
  3. True — since the glm SEs for the Indonesia and Japan dummies are far smaller than the bootstrap SEs, the Wald $z = \hat\beta / \widehat{\text{SE}}$ is inflated and the resulting $p$-values are too small.
  4. False — the bootstrap samples with replacement. "Without replacement" of the full $n$ would just return the original dataset.

Atoms: bootstrap, logistic-regression.
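The mechanics behind boot() are easy to sketch: draw $n$ observations with replacement, recompute the statistic, and take the standard deviation across resamples. An illustrative Python version (not the course's R code; the data vector is made up) bootstraps the SE of a sample mean:

```python
import random
import statistics

def boot_se(data, stat, R=1000, seed=1):
    """Nonparametric bootstrap SE: resample n observations WITH replacement,
    recompute the statistic R times, return the sd of those R values."""
    rng = random.Random(seed)
    n = len(data)
    stats = [stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(R)]
    return statistics.stdev(stats)

data = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9, 2.2, 3.6]
se = boot_se(data, statistics.mean)
print(round(se, 3))   # close to the analytic SE of the mean, sd/sqrt(n)
```

Resampling with replacement is the whole point of statement (4): sampling $n$ of $n$ without replacement would reproduce the original dataset every time and give zero variability.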

Question 5 2 points CE3 P5a

Which of the following are techniques for regularization?

Show answer

Solution: TRUE — TRUE — FALSE — TRUE.

  1. True — lasso adds an $\ell_1$ penalty $\lambda\sum |\beta_j|$ that shrinks and can zero out coefficients.
  2. True — ridge adds an $\ell_2$ penalty $\lambda\sum \beta_j^2$, shrinking coefficients toward zero.
  3. False — forward/backward selection is a subset selection (variable-selection) technique, not a continuous shrinkage / regularization method.
  4. True — SGD has an implicit regularization effect (especially with early stopping); in the neural-network context it is counted among regularization techniques the course covers.

Atoms: regularization, lasso, ridge-regression, subset-selection.

Question 6 2 points CE3 P5b

Which of the following statements about principal component regression (PCR) and partial least squares (PLS) are correct?

Show answer

Solution: FALSE — TRUE — FALSE — TRUE.

  1. False — PCR is unsupervised: it picks components by variance in $X$, ignoring the response.
  2. True — PLS is the supervised version: components are chosen for high correlation with $y$.
  3. False — that description belongs to PCR, not PLS.
  4. True — PCR's components maximise variance among the covariates.

Atoms: pcr, pls, pca.

Question 7 1 point CE3 P5c

In ridge regression, we estimate the regression coefficients in a linear regression model by minimizing $$\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2 + \lambda\sum_{j=1}^{p}\beta_j^2.$$ What happens when we increase $\lambda$ from 0? Choose the single correct statement:

Show answer
Correct answer: D

D — at $\lambda = 0$ ridge is OLS (unbiased). As $\lambda$ grows the coefficients are shrunk toward zero, so bias grows monotonically; in the limit $\lambda \to \infty$ all $\hat\beta_j \to 0$ and bias is maximal.

A — training RSS increases with $\lambda$ (we're moving away from the OLS minimum of training RSS).

B and C — test RSS is U-shaped in $\lambda$: it typically decreases first, hits a minimum, then increases. Neither "steadily decrease" nor "steadily increase" is correct.

E — variance moves the opposite way: shrinkage reduces variance as $\lambda$ grows. That's exactly the bias–variance trade-off ridge exploits.

Atoms: ridge-regression, bias-variance-tradeoff.
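The shrinkage behind option D can be seen in closed form: for a single centred predictor with no intercept, the ridge estimate is $\hat\beta_\lambda = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$, which decays monotonically toward zero as $\lambda$ grows. A minimal numeric illustration (Python, made-up data):

```python
def ridge_1d(x, y, lam):
    """Closed-form ridge estimate for one centred predictor (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi**2 for xi in x) + lam)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]          # centred predictor
y = [-4.1, -1.9, 0.2, 2.1, 3.8]          # roughly y = 2x plus noise

betas = [ridge_1d(x, y, lam) for lam in (0, 1, 10, 100)]
print([round(b, 3) for b in betas])       # shrinks monotonically toward 0
```

At $\lambda = 0$ this is exactly OLS; every increase in $\lambda$ inflates the denominator and pulls the estimate toward zero, so the bias grows steadily while the variance falls.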

Question 8 1 point CE3 P5d

Which statement about the curse of dimensionality is correct?

Show answer
Correct answer: B

B — in high dimensions, "nearest" neighbours are no longer near in any meaningful sense (points become roughly equidistant). KNN, which depends on local structure, degrades as $p$ grows.

A describes the bias–variance trade-off of KNN itself, not the curse of dimensionality.

C — the curse is a general high-dimensional phenomenon; it is not a property of $K$-means specifically.

D — wrong characterisation of radial kernels; radial-kernel SVMs are not generally avoided in high dimensions, and the reasoning given is not the curse.

E — a normative oversimplification: many covariates can be fine if local structure isn't required (linear models, trees with regularisation, etc.).

Atoms: curse-of-dimensionality, knn.

Question 9 1 point CE3 P5e

Now assume you have 10 covariates, $X_1$ to $X_{10}$, each of them uniformly distributed in the interval $[0, 1]$. To predict a new test observation $(X_1^{(0)}, \dots, X_{10}^{(0)})$ in a $K$-nearest neighbor (KNN) clustering approach, we use all observations within 20% of the range closest to each of the covariates (that is, in each dimension). Which proportion of available (training) observations can you expect to use for prediction?

Show answer
Correct answer: A

A — each covariate contributes an independent factor of $0.2$, so the hypercube fraction is $0.2^{10} = 1.024 \cdot 10^{-7}$. This is the canonical illustration of the curse of dimensionality.

B — $0.2^4 \approx 1.6\cdot 10^{-3}$; forgets six of the ten dimensions.

C — the per-dimension fraction (0.20). Ignores the product across dimensions entirely.

D — $0.2^2 = 0.04$; treats only two dimensions.

E — $0.1^{10}$; wrong base (uses 10% rather than 20%).

Atoms: curse-of-dimensionality, knn.
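The arithmetic behind option A generalises: with fraction $r$ of the range kept in each of $p$ independent uniform dimensions, the expected proportion of training points in the neighbourhood is $r^p$. A one-line check (Python):

```python
def neighbourhood_fraction(r, p):
    """Expected fraction of uniform data inside a hypercube covering
    fraction r of the range in each of p independent dimensions."""
    return r ** p

print(neighbourhood_fraction(0.2, 1))    # 0.2 (one dimension: option C's value)
print(neighbourhood_fraction(0.2, 2))    # ~0.04 (option D's value)
print(neighbourhood_fraction(0.2, 10))   # ~1.024e-07 (option A)
```

Ten seemingly generous 20% windows multiply down to about one training point in ten million, which is the curse of dimensionality in one line.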

Question 10 2 points CE3 P5f

This example is taken from a real clinical study by Ikeda, Matsunaga, Irabu, et al. Using vital signs to diagnose impaired consciousness: cross sectional observational study. BMJ 2002;325:800. Researchers investigated the use of vital signs as a screening test to identify brain lesions in patients with impaired consciousness. The setting was an emergency department in Japan. The study included 529 consecutive patients who arrived with impaired consciousness. Patients were followed until discharge. The vital signs of systolic and diastolic blood pressure and pulse rate were recorded on arrival. The aim of this study was to find a quick test for assessing whether the newly arrived patient suffered from a brain lesion. While vital signs can be measured immediately, the actual diagnosis of a brain lesion can only be determined on the basis of brain imaging and neurological examination at a later stage, thus the quick measurements of blood pressure and heart rate are important to make a quick assessment. In total, 312 patients (59%) were diagnosed with a brain lesion.

The performance of each vital sign (systolic blood pressure, diastolic blood pressure and heart rate) was separately evaluated as a screening test to quickly diagnose brain lesions. To assess the quality of each of these vital signs, different thresholds were taken successively to discriminate between "negative" and "positive" screening test result. For each vital sign and each threshold the sensitivity and specificity were derived and used to plot a receiver operating characteristic (ROC) curve for the vital sign (Figure 1):

ROC curves for systolic blood pressure, diastolic blood pressure, and pulse rate.
Figure 1: Figure for problem 5f); taken from P. Sedgwick, BMJ 2011;343.

Which of the following statements are true?

Show answer

Solution: TRUE — TRUE — FALSE — TRUE.

  1. True — specificity = $P(\text{neg test} \mid \text{no lesion})$, so $1 - \text{specificity}$ is the false-positive rate, i.e. the proportion of patients without a lesion that the test calls positive.
  2. True — moving the threshold along the ROC curve trades sensitivity for specificity. That trade-off is exactly what the curve traces out.
  3. False — a perfect test has AUC = 1 (the curve hugs the top-left corner). AUC = 0.5 corresponds to a useless test (the diagonal).
  4. True — the systolic-blood-pressure curve sits highest and farthest from the diagonal in the figure, so it has the largest AUC and is the best discriminator.

Atoms: roc-curve, sensitivity-specificity, auc.
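The quantities on the ROC axes can be computed mechanically: for each threshold, sensitivity = TP/(TP+FN) among patients with a lesion, and 1 − specificity = FP/(FP+TN) among those without. An illustrative Python sketch on made-up systolic pressures (not the study's actual data):

```python
def roc_points(scores_pos, scores_neg, thresholds):
    """One (1 - specificity, sensitivity) pair per threshold.
    A 'positive' screening result means score >= threshold."""
    points = []
    for t in thresholds:
        sens = sum(s >= t for s in scores_pos) / len(scores_pos)  # TPR
        fpr = sum(s >= t for s in scores_neg) / len(scores_neg)   # 1 - spec.
        points.append((fpr, sens))
    return points

# Hypothetical systolic pressures: patients with / without a brain lesion
lesion = [190, 175, 168, 160, 152, 148]
no_lesion = [150, 142, 138, 130, 125, 118]

for fpr, sens in roc_points(lesion, no_lesion, [120, 140, 160, 180]):
    print(fpr, sens)
```

Sweeping the threshold from low to high traces the curve from the top-right corner toward the bottom-left, making the sensitivity/specificity trade-off of statement (2) explicit.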

Question 11 2 points CE3 P5g

We study the decathlon2 dataset from the factoextra package in R, where Athletes' performance during a sporting meeting was recorded. We look at 23 athletes and the results from the 10 disciplines in two competitions. Some rows of the dataset are displayed here:

decathlon2.active[c(1, 3, 4), ]

##          100m long_jump shot_put high_jump  400m 110.hurdle discus
## SEBRLE  11.04      7.58    14.83      2.07 49.81      14.69  43.75
## BERNARD 11.02      7.23    14.25      1.92 48.93      14.99  40.87
## YURKOV  11.34      7.09    15.19      2.10 50.42      15.31  46.26
##         pole_vault javeline 1500m
## SEBRLE        5.02    63.19 291.7
## BERNARD       5.32    62.77 280.1
## YURKOV        4.72    63.44 276.4

From a principal component analysis we obtain the biplot given in Figure 2.

PCA biplot of the decathlon2 dataset.
Figure 2: Figure for question 5g).

Which of the following statements are true, which false?

Show answer

Solution: FALSE — TRUE — TRUE — TRUE.

  1. False — CLAY sits at a negative PC2 value, and 1500m's PC2 loading is also negative. Two negatives align, so CLAY's projected 1500m time is large — i.e. CLAY runs slow, not fast.
  2. True — 100m and long_jump arrows lie along the same PC1 axis (in opposite directions, because low 100m times go with high long-jump distances). Athletes scoring high on this axis tend to excel at both.
  3. True — in absolute value, the PC1 loadings for 100m and long_jump are the largest in the biplot.
  4. True — the 110.hurdle arrow is almost parallel to PC1, so its PC2 component is close to zero.

Atoms: pca, biplot, principal-components.