Introduction to t scores
A confidence interval, just like it sounds, gives you a sense of how confident you are in your measurement. The most common confidence interval is a 95% confidence interval. The technical interpretation of a confidence interval is that if you repeated your experiment or data collection exercise many, many times, your confidence intervals would contain the true value 95% of the time. Please read that last sentence again. It’s kind of an upside-down logic that many people (even some experts) have trouble with. In the real world, especially in public health, we do not have the luxury of repeating experiments many times. This way of thinking came from experiments on factory assembly lines, which is where a lot of statistical theory was first developed. The usual way most people interpret a 95% confidence interval is that in a single experiment, there is a 95% chance that the true value is within the confidence interval. This is technically incorrect, but for the purposes of this class, it is good enough. There are few scenarios where this distinction matters.
In other words, if we have a homework problem down the road where the cancer rate in Georgia in 2019 is found to be 405 per 100,000 with a 95% confidence interval of (390, 420), you’re allowed to conclude “the cancer rate in Georgia is highly likely to be between 390 and 420”. You don’t have to say, “If we could travel back in time and repeat 2019 over and over again, we would find that in 95% of our time-travels, the cancer rate in Georgia was found to be between 390 and 420”.
We’ve previously learned about Z scores, which is what we use when we are dealing with populations. Populations can be thought of as infinitely large (imagine sampling the air at all possible points) or at least so large they are mathematically close enough to infinite (imagine the population of the United States). We’ve learned that a Z-score of 1.96 is a magic number above which 2.5% of the observations in a normal distribution lie. Another 2.5% lie below a Z-score of -1.96, for a total of 5%.
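You can check that magic number yourself in R. A quick sketch using pnorm() (which we've seen) and its quantile counterpart qnorm(); the 0.975 below comes from putting 2.5% in each tail:

```r
# Fraction of a standard normal distribution that lies below Z = 1.96:
pnorm(1.96)     # about 0.975, so 2.5% of observations lie above 1.96

# Going the other way: which Z-score cuts off the top 2.5%?
qnorm(0.975)    # about 1.96
```

By symmetry, qnorm(0.025) gives -1.96, the cutoff for the bottom 2.5%.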
When you are dealing with samples instead of populations, you have less information to work with, and thus your estimates have to be more conservative. Instead of using Z-scores, we substitute something called t-scores. (Why Z and t? That’s just what the original statisticians who made these discoveries picked.) t-scores depend on the sample size (technically, on the degrees of freedom, which is the sample size minus one) and are larger than Z-scores. For example, when your sample size is 60, the corresponding t-score for a Z-score of 1.96 is 2.00. That’s a pretty small difference. It means that once your sample size is above 60, you don’t really need to worry about the difference between t and Z. (In my own work with population-based health data, I usually have samples of many thousands, so this never comes up.) For those conducting complex and expensive lab experiments, you may only be able to manage a sample size of 5 or 10. Here, the corresponding t-values to 1.96 are 2.78 and 2.26. These are different enough from 1.96 that they could influence your interpretation of your results.
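These t cutoffs can be reproduced with R's qt() function, the t-distribution counterpart of qnorm(). One thing to flag: qt() takes degrees of freedom, which for a single sample is the sample size minus one.

```r
# 97.5th percentile of the t distribution at various degrees of freedom.
# As the degrees of freedom grow, the cutoff shrinks toward the Z-score 1.96.
qt(0.975, df = 4)    # about 2.78 (a very small sample)
qt(0.975, df = 9)    # about 2.26
qt(0.975, df = 29)   # about 2.045
qt(0.975, df = 59)   # about 2.00 (sample size 60)
qnorm(0.975)         # about 1.96, the Z-score the t cutoffs approach
```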
Think of it this way: When your sample size is small, the standard of evidence is higher for demonstrating that your findings are interesting or unusual, so you need to puff up that 1.96 number a bit.
Though above I said that a sample size of 60 was big enough for the difference between t and Z not to matter, statisticians have traditionally settled on an even smaller cutoff: 30. The t-value for a sample size of 30 (29 degrees of freedom) is 2.045, which statisticians have considered “close enough” to 1.96. Again, this distinction only really mattered before there were computers; R can calculate either equally fast.
With the normal distribution, we saw the R functions pnorm() and dnorm(). The corresponding functions for small samples are pt() and dt(); each works the same way but takes the degrees of freedom as an additional argument.
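As a sketch of the parallel, here are the normal functions next to their t counterparts; the only new wrinkle is the df (degrees of freedom) argument:

```r
# Height of each density curve at its center:
dnorm(0)             # standard normal density at 0, about 0.399
dt(0, df = 9)        # t density at 0 with 9 degrees of freedom, slightly lower

# Cumulative probability below a cutoff:
pnorm(1.96)          # about 0.975
pt(1.96, df = 9)     # less than 0.975 -- the t distribution has fatter tails
```

The fatter tails are exactly why the t cutoffs are larger than 1.96: more of the probability sits far from the center, so you have to go further out to fence off the extreme 2.5%.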