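Today I present a few simple statistical functions. Before we can use them we need some data, which is where the c() function comes in. It is commonly said to stand for concatenate (R's own help page describes it as combining values into a vector), but I prefer to think of it as ‘collection of’.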
Most often, we read in data from .csv files or other kinds of external files, but occasionally we just want to see how something works with some quick made-up data. In the line below, I’ve created a variable called data, which holds 11 values combined into a single vector with the c() function:
data <- c(3,6,8,12,22,5,4,9,10,5,0)
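As a quick aside, here is a small sketch of what c() does with its arguments. The names a and b are made up for this illustration and are not used anywhere else in the post.

```r
# c() combines its arguments into one flat vector
a <- c(1, 2, 3)
b <- c(a, 4, 5)   # an existing vector can be one of the arguments
b
## [1] 1 2 3 4 5
length(b)         # number of values in the vector
## [1] 5
```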
I can then calculate the mean, or average, of the data:
mean(data)
## [1] 7.636364
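If you want to convince yourself of what mean() is doing, you can compute the same number by hand: add up the values and divide by how many there are. This is only a sketch to show the arithmetic; in practice you would just call mean().

```r
data <- c(3,6,8,12,22,5,4,9,10,5,0)
sum(data)                  # total of all values
## [1] 84
length(data)               # how many values there are
## [1] 11
sum(data) / length(data)   # same result as mean(data)
## [1] 7.636364
```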
The median is the value right in the middle once the data are sorted from smallest to largest; with 11 values, that is the 6th one. When there is an even number of values, there is no single middle value, so the two middle values are averaged instead. For our data:
median(data)
## [1] 6
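Here is a short sketch of both cases. The first part picks out the 6th sorted value of our data by hand; the second uses a made-up vector of four values (even_values is just a name I chose for this example) to show the averaging of the two middle values.

```r
data <- c(3,6,8,12,22,5,4,9,10,5,0)
sort(data)[6]          # the 6th of 11 sorted values is the median
## [1] 6

even_values <- c(3, 6, 8, 12)
median(even_values)    # average of the two middle values, 6 and 8
## [1] 7
```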
Maximum and minimum are pretty straightforward:
max(data)
## [1] 22
min(data)
## [1] 0
Finally for this post, we have the mode, or the most commonly occurring value. You can readily see that it is 5, because it is the only value that repeats. Remarkably, R does not have a built-in function for the statistical mode! Instead, mode() in R tells you how the object is stored (its type), which for our data is numeric:
mode(data)
## [1] "numeric"
This is one of the few examples I can think of where an R function does not behave as expected. But there is a workaround – you can build your own mode function. Since mode() is already taken, I’ve used Mode(). For any beginners reading this, do not worry about the syntax right now. Once you’ve defined this function, you can use Mode() on any data you like.
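Mode <- function(x){
  # Count how often each distinct value occurs
  y <- data.frame(table(x))
  # Keep the value(s) with the highest count; ties return more than one value
  y[y$Freq == max(y$Freq),1]
}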
Mode(data)
## [1] 5
## Levels: 0 3 4 5 6 8 9 10 12 22
Mode(c(1,2,3,4,5,5,5,5,5,6,7,8))
## [1] 5
## Levels: 1 2 3 4 5 6 7 8
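The Levels line in the output appears because data.frame(table(x)) stores the distinct values as a factor rather than as plain numbers. If you would rather get numbers back, one possible variant is sketched below. Mode2() is a name I made up for this illustration, and this is just one way of doing it using base R's unique(), match() and tabulate().

```r
# Hypothetical variant of Mode() that keeps the original data type
Mode2 <- function(x) {
  ux <- unique(x)                       # the distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))      # how many times each distinct value occurs
  ux[counts == max(counts)]             # the value(s) with the highest count
}

data <- c(3,6,8,12,22,5,4,9,10,5,0)
Mode2(data)
## [1] 5
Mode2(c(1,1,2,2,3))   # a tie: both modes are returned
## [1] 1 2
```

Like Mode(), this returns every tied value when there is more than one mode, so check the length of the result if you need a single number.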