Waffle charts

This week we will go into greater depth on the topic of data visualization. We have already looked at scatterplots and bar charts, so this week will build on things you already know. In my opinion, data visualization is a vitally important part of statistical inference, one that is sometimes omitted from courses like these, where students are led to believe it is something they can figure out for themselves. On the contrary, in my meetings with clients I would say I spend well over half the time interpreting graphic presentations of data.

Since we have already covered bar graphs, I will jump straight to a new type of chart known as the waffle chart. It uses a package called waffle. I call the waffle library and load our familiar heart data. As always, we need dplyr as well. In the last line of the code below, I replace the annoying ? values with 9s. NAs would have been better, but I wanted to show an example of how you can replace a value with another value using mutate().

library(waffle)
library(dplyr)   # provides mutate()
heart <- read.csv("C:/epi551/old/heart.csv", header=TRUE)
# Recode the "?" placeholder in Thalassemia as "9"
heart <- mutate(heart, Thalassemia=replace(Thalassemia, Thalassemia=="?", "9"))
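
As noted above, NA would have been the better replacement for the ? values. Here is a minimal sketch of that variant, assuming the column was read in as character. The small `toy` data frame is invented for illustration and only stands in for the heart data.

```r
library(dplyr)

# Hypothetical stand-in for the heart data; the values are invented
toy <- data.frame(Thalassemia = c("3", "7", "?", "6", "3"),
                  stringsAsFactors = FALSE)

# na_if() turns every "?" into NA and leaves the other values alone
toy <- mutate(toy, Thalassemia = na_if(Thalassemia, "?"))

# useNA = "ifany" keeps the missing values visible in the counts
thal_counts <- table(toy$Thalassemia, useNA = "ifany")
print(thal_counts)
```

Keep in mind that table() drops NA values by default, so without the useNA argument the unknown observations would not appear in the counts at all.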

To make a waffle plot of the thalassemia variable, we first convert it into a table of counts using the table() function. We then call waffle() on the table, specifying the parts (the counts that determine the number of squares) and the number of rows. The first example plots 303 squares (one for every observation) in 10 rows. The second example divides each count by 8, so each square stands for roughly 8 observations, and plots 36 squares in 6 rows. The third example is the same as the second, but with a nicer legend and custom colors.

thaltable <- table(heart$Thalassemia)

waffle(parts=thaltable, rows=10)

waffle(parts=thaltable/8, rows=6)

heart$thal <- factor(heart$Thalassemia, labels=c("3"="Normal",
        "6"="Fixed defect","7"="Reversible defect", "9"="Unknown"))

thaltable <- table(heart$thal)

waffle(parts=thaltable/8, rows=6,
       colors = c("3"="tomato",
                  "6"="mediumblue",
                  "7"="darkgreen",
                  "9"="darkgoldenrod"),
       reverse=FALSE)
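
The factor() call above works because the labels are listed in the same order as the sorted values; the names "3", "6", and so on inside c() are not used for matching. A sketch of a more explicit version, assuming the codes are stored as the strings "3", "6", "7", and "9", might look like this. The `codes` vector is invented for illustration.

```r
# Hypothetical codes, invented for illustration
codes <- c("3", "7", "6", "3", "9", "7")

# levels= lists the raw codes; labels= gives the display name
# for each code, matched position by position
thal <- factor(codes,
               levels = c("3", "6", "7", "9"),
               labels = c("Normal", "Fixed defect",
                          "Reversible defect", "Unknown"))

thal_table <- table(thal)
print(thal_table)
```

Spelling out levels= makes the mapping visible in the code, so a value that sorts differently than expected cannot silently pick up the wrong label.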

What happened to the unknown category in the last two plots? It has so few observations that, after dividing by 8, it would get only a tiny fraction of a square, so it disappears.
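
The arithmetic behind the vanishing category can be checked by hand. The counts below are assumed for illustration (they sum to 303 but are not read from heart.csv), and floor() is used on the assumption that a fraction of a square is simply not drawn.

```r
# Assumed counts for illustration only; they are not read from heart.csv
counts <- c("Normal" = 166, "Fixed defect" = 18,
            "Reversible defect" = 117, "Unknown" = 2)

scaled <- counts / 8       # one square per 8 observations
whole  <- floor(scaled)    # assumption: partial squares are dropped
print(whole)               # Unknown ends up with 0 squares
print(sum(whole))

# One possible workaround: round, then give every category at least
# one square (all categories here are non-empty)
visible <- pmax(round(scaled), 1)
print(visible)
```

The trade-off is that the plot then overstates the rare category, so it is worth mentioning that in the caption.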

How do you learn the names of the colors in R? Type demo(colors) in the console. You’ll have to hit zoom to see the resulting plot.
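
The color names can also be listed directly in the console with colors(), which returns them as a character vector. A short sketch follows; the search term "golden" is just an example.

```r
# colors() returns every built-in color name as a character vector
all_colors <- colors()
print(length(all_colors))

# Search the names for a pattern, here anything containing "golden"
golden <- grep("golden", all_colors, value = TRUE)
print(golden)

# Check that the names used in the waffle plot are valid color names
used <- c("tomato", "mediumblue", "darkgreen", "darkgoldenrod")
print(used %in% all_colors)
```

This is handy when you remember part of a name but not the exact spelling.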