Topics
The book covers the following topics: Statistical Learning: Concepts, Statistical Learning: Practical Aspects, Logistic regression, Lasso and Friends, Working with text data, Nearest Neighbors, The Naive Bayes Classifier, Trees, Random Forests, Boosting, Support Vector Machines, Feature Engineering, Neural Networks, Stacking.
Data Sets
This section contains the data sets used in the book that are not published elsewhere. Many additional data sets can be downloaded from other sources, as indicated in the book. Several of the data sets listed here are required for the exercises. If you find something missing, please let me know.
Data Set Name | Description | ASCII (csv) | Stata |
---|---|---|---|
Hunger Games | Day 1 survival in the Hunger Games novel | Download | |
Patient Joe | "What should Joe do?". Version with Bigrams. | Download | |
Optometry Data | Predict a marker for Giant cell arteritis | Download | |
Oscar Data | Movies nominated for the Oscar | Download | |
Pharmacy | Compliance with a California law | Download | Download |
Shakespeare | Social status, gender and type of play | Download | Download |
Waterloo Smokers' Helpline | Text answer "what helped you quit?" | Download | |
Stature Turkey Data | Description of Stature Turkey Data | Download | |
Errata
- p. 78, line 9. The coordinates are flipped. It should read "we first reach the diamond constraint at (\beta_1,\beta_2)=(1,0)." (Thanks to Professor Kurt Beron)
- p. 209. In Algorithm 11.2 (MART algorithm for logit boosting), the denominator should be 1+\exp(f_{m-1,i}) rather than \exp(1+f_{m-1,i}). (Thanks to Sarah Startek)
- p. 265, Question 12.10: The number of observations, n=1000, is missing. The following text is also missing: "Additionally, independently generate 50 variables x_i ~ N(0,1), i=1,2,...,50." (The setup is analogous to Question 12.9.) (Thanks to Stuart Liam Miranda)
- p. 140, Question 7.7(a): "scatter plot of x1 vs x1" should read "scatter plot of x1 vs x2". (Thanks to Jeffrey)
- p. 281. Exercise 13.1 appears under the heading for questions "using software". It should instead be a question under the heading "Conceptual". (Thanks to an anonymous student)
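
For readers working through the corrected Question 12.10, the missing part of the setup (n=1000 observations and 50 independent standard normal variables) could be generated along the following lines. This is only a minimal sketch in Python using the standard library. The function name `simulate_predictors` and the seed are illustrative assumptions, and the rest of the question (for example, the outcome variable) is not reproduced here.

```python
import random


def simulate_predictors(n=1000, p=50, seed=0):
    """Return n rows, each holding p independent N(0,1) draws.

    n=1000 and p=50 follow the erratum for Question 12.10;
    the seed is an arbitrary choice made only for reproducibility.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]


# One row per observation, one column per variable x_1, ..., x_50.
X = simulate_predictors()
print(len(X), len(X[0]))
```

Any software that can draw independent standard normal variables would serve equally well; the list-of-rows layout is simply one convenient choice.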