Applied Statistical Learning

with Case Studies in Stata

Available on .

See Springer's web site


Review by the Stata technical group

Review in JRSS-A


The book covers the following topics: Statistical Learning: Concepts, Statistical Learning: Practical Aspects, Logistic regression, Lasso and Friends, Working with text data, Nearest Neighbors, The Naive Bayes Classifier, Trees, Random Forests, Boosting, Support Vector Machines, Feature Engineering, Neural Networks, Stacking.

Data Sets

This section contains the data sets used in the book that are not published elsewhere. Many additional data sets can be downloaded from other sources, as indicated in the book. Several of the data sets listed here are required for the exercises. If you find something missing, please let me know.

Data Set Name Description ASCII (csv) Stata
Hunger Games Day 1 survival in the Hunger Games novel Download
Patient Joe "What should Joe do?". Version with Bigrams. Download
Optometry Data Predict a marker for Giant cell arteritis Download
Oscar Data Movies nominated for the Oscar Download
Pharmacy Compliance with a California law Download Download
Shakespeare Social status, gender and type of play Download Download
Waterloo Smokers' Helpline Text answer "what helped you quit?" Download
Stature Turkey Data Description of Stature Turkey Data Download


  • p. 78, line 9. The coordinates are flipped. It should read "we first reach the diamond constraint at (\beta_1,\beta_2)=(1,0)." (Thanks to Professor Kurt Beron)
  • p. 209. In Algorithm 11.2 (MART algorithm for logit boosting), the denominator should be 1+exp(f_{m-1,i}) rather than exp(1+f_{m-1,i}). (Thanks to Sarah Startek)
  • p. 265, Question 12.10. The number of observations, n=1000, is missing. The following text is also missing: "Additionally, independently generate 50 variables x_i ~ N(0,1), i=1,2,...,50." (The setup is analogous to Question 12.9.) (Thanks to Stuart Liam Miranda)
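
For readers who want to check the corrected setup of Question 12.10 outside Stata, the following is a minimal sketch in Python of the part quoted in the erratum only: n=1000 observations and 50 independently generated variables x_i ~ N(0,1). The function name `simulate_predictors` and the seed value are assumptions made for illustration. The rest of the question (for example, how the outcome is generated) is not reproduced here and should be taken from the book.

```python
import random


def simulate_predictors(n_obs=1000, n_vars=50, seed=12345):
    """Return n_obs rows, each holding n_vars independent N(0,1) draws.

    The seed is an arbitrary choice for reproducibility; it does not
    come from the book.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            for _ in range(n_obs)]


data = simulate_predictors()
print(len(data), len(data[0]))  # prints "1000 50"
```

In Stata, the same step would typically be done by setting the number of observations to 1000 and generating each variable with `rnormal()` in a loop.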
