Titanic R

I am excited to announce the launch of my new book project, Titanic R: a complete introduction to statistics using Titanic passenger survival data and the R programming language. I have been teaching an introductory statistics course online for a number of years now, and one recurring problem is that the standard sample datasets are kind of dry. Automobile fuel efficiency from the 1970s. Boston housing market data from the 1970s. Blood pressure readings from the 1970s. Or, for a change of pace, measurements of flower dimensions from the 1930s.

The prevailing idea is that it does not matter what the data are, as long as they illustrate some concept adequately. But for the student just learning statistics, just learning R, or both, concepts are less likely to stick if the data have no wider meaning. If it were common knowledge that one type of flower had wider petals than another, then a statistical test or graphic showing this would be more memorable. You expect a particular result, you generate and see that result, and you remember how to do it again on your own data.

I chose the Titanic dataset because it is compelling and reasonably large without being too large (roughly 2,200 rows by a dozen columns). We know what findings to expect (women, children, and first-class passengers were more likely to survive), but the data also contain a few less obvious patterns. A subset of about 60% of the data has also been used extensively by others to teach advanced statistical modeling methods.
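As a rough illustration of the "expected findings" idea, here is a minimal sketch that checks those three patterns in R. It assumes R's built-in `Titanic` table from the `datasets` package as a stand-in, which is a summary of 2,201 people by class, sex, age group, and survival, and not necessarily the dataset the book will use. The helper name `survival_rate` is made up for this example.

```r
# Stand-in data: R's built-in Titanic table (datasets package), a 4-way
# contingency table of Class x Sex x Age x Survived. This is an assumption
# for illustration; it is not necessarily the dataset used in the book.
data(Titanic)

# Hypothetical helper: proportion surviving within each level of one factor.
survival_rate <- function(factor_name) {
  # Find the positions of the chosen factor and of Survived among the dimensions
  dims <- match(c(factor_name, "Survived"), names(dimnames(Titanic)))
  # Collapse the table to factor x Survived counts
  counts <- margin.table(Titanic, dims)
  # Convert counts to row proportions and keep the "Yes" (survived) column
  prop.table(counts, margin = 1)[, "Yes"]
}

by_sex   <- survival_rate("Sex")
by_age   <- survival_rate("Age")
by_class <- survival_rate("Class")

print(round(by_sex, 2))
print(round(by_age, 2))
print(round(by_class, 2))
```

If the expectation holds, the rate for women should exceed the rate for men, children should exceed adults, and first class should exceed third class. Seeing the result you already anticipated is what makes the technique easier to remember.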

The book will be written using the bookdown package and made available through Leanpub at modest cost, hopefully by the end of 2020, provided no other large projects get in the way.