I am a professor of statistics at the University of Waterloo in Ontario, Canada. I spent the 2022/2023 academic year on sabbatical at GESIS, a German institute for the social sciences, and the University of Mannheim. Previously, I was on sabbatical at the University of Auckland (2015/2016) and the German Institute for Economic Research (DIW) in Berlin, Germany (2009/2010). The Berlin sabbatical was made possible in cooperation with the Max Planck Institute for Human Development (MPIB). Prior to becoming a professor, I was a statistician at RAND and head of the RAND Statistical Consulting Service in Santa Monica, California, and later in Pittsburgh, PA. From 1997 to 1999, I held a joint appointment with the National Institute of Statistical Sciences and AT&T Labs - Research. In 1997, I graduated from the University of Waterloo. You can email me at schonlau at uwaterloo dot ca.
Book on statistical learning
My new book on statistical learning came out in August 2023.
Humboldt Prize
I am excited to announce I won a Humboldt Prize (2022).
This is a lifetime achievement award for internationally renowned scientists (not living in Germany). The prize is open to all scientific fields.
I believe I am only the second statistician in Canada to ever win it. The first winner was Professor Christian Genest at McGill University.
Research Interests:
survey methodology, application of natural language processing to open-ended questions, visualization, statistical software (Python/Stata)
Current Research Projects:
Open-ended questions in surveys
Text data from open-ended questions in surveys are difficult to analyze and are frequently ignored.
Yet open-ended questions are important because they do not constrain respondents' answer choices.
Where open-ended questions are necessary, sometimes multiple human coders hand-code answers into one of several categories.
At the same time, computer scientists have made impressive advances in natural language processing
that allow automation of such coding.
Past work includes semi-automatic categorization of open-ended questions where
automated algorithms do not achieve an overall accuracy high enough to entirely replace humans,
intercoder disagreements in statistical learning algorithms when training data are double coded,
occupation coding ("What is your job?"),
analysis of final comments at the end of surveys ("Do you have any other comment?"),
and some active learning for text data. Different aspects of this work are continuing,
including voice captures of open-ended answers.
My research has been funded by the Canadian Social Sciences and Humanities Research Council (SSHRC).
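The semi-automatic idea mentioned above can be illustrated with a minimal Python sketch. It is not the published method: it assumes a classifier has already produced a predicted category and a probability for each answer, and the names (`split_by_confidence`, `threshold`) and the example answers are hypothetical. Answers the model is confident about are coded automatically; the rest go to human coders.

```python
from typing import List, Tuple

# Hypothetical record: (answer text, predicted category, predicted probability).
Prediction = Tuple[str, str, float]


def split_by_confidence(
    predictions: List[Prediction], threshold: float = 0.8
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split answers into those coded automatically and those sent to humans.

    Answers whose predicted probability is at or above `threshold` are
    accepted as coded; all others are routed to human coders.
    """
    auto_coded = [p for p in predictions if p[2] >= threshold]
    for_human = [p for p in predictions if p[2] < threshold]
    return auto_coded, for_human


# Made-up answers to "What is your job?" with made-up model output.
example = [
    ("I teach grade 5", "teacher", 0.95),
    ("work at a hospital", "nurse", 0.55),
    ("truck driver", "driver", 0.90),
]

auto_coded, for_human = split_by_confidence(example, threshold=0.8)
print(len(auto_coded), "coded automatically;", len(for_human), "sent to human coders")
```

The threshold is the design choice here: raising it sends more answers to human coders, lowering it automates more of them and accepts more automatic coding on answers the model is less sure about.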
Statistical Software.
When the opportunity arises I enjoy programming. Much of my early programming was in C/C++
(e.g. software for the analysis of computer experiments). Because of time constraints I have lately
focused on add-on programs in Stata that seamlessly integrate with existing Stata commands.
This includes plugins for gradient boosting, support vector machines, random forests, and n-gram variables.
Possibly my most popular Stata program is my implementation of respondent driven sampling.
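To give a sense of what n-gram variables are, here is a minimal Python sketch. It is not the Stata program and does not reproduce its options; the function name `ngram_counts` and the example sentence are hypothetical. It assumes simple whitespace tokenization and counts how often each sequence of n consecutive words occurs in a text, which is the kind of count that can then serve as a predictor variable.

```python
from collections import Counter


def ngram_counts(text: str, n: int = 2) -> Counter:
    """Count word n-grams in `text` using lowercase whitespace tokenization."""
    tokens = text.lower().split()
    # Slide a window of n tokens across the text and join each window.
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)


# Made-up open-ended answer.
counts = ngram_counts("the survey was too long and the survey was boring", n=2)
print(counts.most_common(2))
```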