1
a. Show that the least squares estimator of a multiple linear regression model is given by $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{y}$.
b. Show that the maximum likelihood estimator is equal to the least squares estimator for the multiple linear regression model.
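As a starting point, the derivation can be outlined as follows (standard notation; for part b, i.i.d. Gaussian errors and an invertible $\mathbf{X}^\top\mathbf{X}$ are assumed):

```latex
% (a) Model and residual sum of squares:
y = X\beta + \varepsilon, \qquad
\mathrm{RSS}(\beta) = (y - X\beta)^\top (y - X\beta).
% Setting the gradient to zero gives the normal equations:
\frac{\partial\,\mathrm{RSS}}{\partial \beta} = -2 X^\top (y - X\beta) = 0
\;\Rightarrow\; X^\top X \hat{\beta} = X^\top y
\;\Rightarrow\; \hat{\beta} = (X^\top X)^{-1} X^\top y.
% (b) With \varepsilon \sim N(0, \sigma^2 I), the log-likelihood is
\ell(\beta, \sigma^2)
  = -\tfrac{n}{2}\log(2\pi\sigma^2) - \tfrac{1}{2\sigma^2}\,\mathrm{RSS}(\beta),
% so maximizing \ell over \beta is equivalent to minimizing RSS:
\hat{\beta}_{\mathrm{MLE}} = \hat{\beta}_{\mathrm{LS}}.
```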
2
Write Python code to create a representation of the Credit data set (from the ISLP package) similar to the figure shown below.
*(figure omitted)*
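One way to approach this, assuming the target figure is a scatterplot matrix of the quantitative variables: in practice the data come from `ISLP.load_data("Credit")`; the sketch below uses a small synthetic stand-in with Credit-like columns so it runs without ISLP installed.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt

# Stand-in for: from ISLP import load_data; Credit = load_data("Credit")
rng = np.random.default_rng(0)
Credit = pd.DataFrame({
    "Income":  rng.uniform(10, 150, 200),
    "Limit":   rng.uniform(1000, 12000, 200),
    "Rating":  rng.uniform(100, 900, 200),
    "Balance": rng.uniform(0, 2000, 200),
})

# Pairwise scatterplots with histograms on the diagonal
axes = pd.plotting.scatter_matrix(Credit, figsize=(8, 8), diagonal="hist")
plt.close("all")
```

`seaborn.pairplot(Credit)` is an alternative that also colors points by a qualitative variable via its `hue` argument.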
3
a. For the Credit dataset, pick the best model using Best Subset Selection according to $C_p$, $BIC$, and Adjusted $R^2$.
+ Hint: ISLP does not ship a single regsubsets-style function. Implement best subset selection manually using itertools.combinations over the columns of the design matrix and sklearn.linear_model.LinearRegression (or statsmodels.OLS if you want AIC/BIC/Adjusted $R^2$ for free).
b. For the Credit dataset, pick the best model using Best Subset Selection according to a $k$-fold CV.
+ Hint: Reuse the per-size best subsets from the previous step and write your own CV loop using sklearn.model_selection.KFold.
c. Compare the results obtained in parts (a) and (b).
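A minimal sketch of the manual best-subset loop from the hint, scoring each subset by adjusted $R^2$; synthetic data with illustrative Credit-like column names stands in for the real design matrix (with ISLP, load the data and encode the qualitative predictors first).

```python
import itertools
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the Credit design matrix
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(200, 5)),
                 columns=["Income", "Limit", "Rating", "Cards", "Age"])
y = 3 * X["Income"] + 2 * X["Rating"] + rng.normal(scale=0.5, size=200)

def adjusted_r2(model, Xs, y):
    n, d = Xs.shape
    r2 = model.score(Xs, y)
    return 1 - (1 - r2) * (n - 1) / (n - d - 1)

best = {}  # best subset of each size, by adjusted R^2
for k in range(1, X.shape[1] + 1):
    for combo in itertools.combinations(X.columns, k):
        cols = list(combo)
        m = LinearRegression().fit(X[cols], y)
        score = adjusted_r2(m, X[cols], y)
        if k not in best or score > best[k][1]:
            best[k] = (combo, score)

# Best model overall across all subset sizes
overall = max(best.values(), key=lambda t: t[1])
print(overall)
```

For part (b), reuse the per-size winners in `best` and rescore each with a `sklearn.model_selection.KFold` loop instead of adjusted $R^2$.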
4
a. Select the best model for the Credit Data using Forward and Backward Stepwise Selection.
+ Hint: Use Stepwise from ISLP.models together with sklearn_selected (see ISLP ch.6 lab).
b. Compare with the results obtained with Best Subset Selection.
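The hint points at `Stepwise` from `ISLP.models`; an alternative sketch that avoids ISLP uses scikit-learn's `SequentialFeatureSelector`, again on synthetic stand-in data (the column names and the choice of two selected features are illustrative, not from the real Credit fit).

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(200, 5)),
                 columns=["Income", "Limit", "Rating", "Cards", "Age"])
y = 3 * X["Income"] + 2 * X["Rating"] + rng.normal(scale=0.5, size=200)

# Forward selection: greedily add the predictor that most improves
# 5-fold CV R^2; direction="backward" gives backward elimination.
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2,
                                direction="forward", cv=5)
sfs.fit(X, y)
selected = list(X.columns[sfs.get_support()])
print(selected)
```

Unlike best subset selection, this explores only a greedy path through the model space, so the two methods can disagree.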
5
a. Apply Ridge regression to the Credit dataset.
b. Compare the results with the standard linear regression.
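A sketch of the ridge fit with the penalty chosen by cross-validation (synthetic stand-in data; with the real Credit data, substitute the encoded design matrix). Standardizing first matters because the ridge penalty is scale-sensitive.

```python
import numpy as np
from sklearn.linear_model import RidgeCV, LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Ridge with alpha selected by CV over a log-spaced grid,
# predictors standardized inside the pipeline
ridge = make_pipeline(StandardScaler(),
                      RidgeCV(alphas=np.logspace(-3, 3, 50)))
ridge.fit(X, y)

# Plain OLS fit for comparison (part b)
ols = LinearRegression().fit(X, y)
print(ridge.score(X, y), ols.score(X, y))
```

For the comparison, look at how the ridge coefficients shrink relative to the OLS ones as the chosen alpha grows.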
6
a. Apply Lasso regression to the Credit dataset.
b. Compare the results with the standard linear regression and the Ridge regression.
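The lasso fit follows the same pattern; a sketch with `LassoCV` on synthetic stand-in data. The key contrast with ridge is that the $\ell_1$ penalty can set coefficients exactly to zero, performing variable selection.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Lasso with alpha selected by 5-fold CV, on standardized predictors
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5))
lasso.fit(X, y)
coefs = lasso.named_steps["lassocv"].coef_
print(coefs)  # coefficients of irrelevant predictors are shrunk toward zero
```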
7
How many principal components should we use for the Credit dataset? Justify.
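One way to justify the choice is the cumulative proportion of variance explained (or a scree plot); a sketch on synthetic data with two nearly collinear columns, mimicking the strong Limit/Rating correlation in Credit. The 90% cutoff below is a common rule of thumb, not a fixed rule.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
# Two nearly collinear predictors plus three independent ones
z = rng.normal(size=(200, 1))
X = np.hstack([z + 0.05 * rng.normal(size=(200, 1)),
               z + 0.05 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 3))])

# PCA on standardized predictors
pca = PCA().fit(StandardScaler().fit_transform(X))
cumvar = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of components reaching ~90% of the variance
n_components = int(np.searchsorted(cumvar, 0.90) + 1)
print(cumvar, n_components)
```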
8
Apply PCR on the Credit dataset and compare the results with the previous methods used in this module.
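PCR can be sketched as a pipeline (standardize, project onto the first $M$ principal components, then run OLS on the scores), choosing $M$ by cross-validated $R^2$; synthetic stand-in data again.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# CV score of PCR for each number of components
scores = {}
for m in range(1, 6):
    pcr = make_pipeline(StandardScaler(), PCA(n_components=m),
                        LinearRegression())
    scores[m] = cross_val_score(pcr, X, y, cv=5).mean()
best_m = max(scores, key=scores.get)
print(scores, best_m)
```

Note that PCR constructs components without looking at the response, so the directions of largest predictor variance need not be the ones most related to $y$.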
9
Apply PLS on the Credit dataset and compare the results with the previous methods used in this module.
Acknowledgements
This document was originally adapted from the R-based recommended exercises by Sara Martino, Stefanie Muff and Kenneth Aase (Department of Mathematical Sciences, NTNU).