In this tutorial, you will learn what the five number summary is, how it is calculated step by step, and how to compute it in R.
What is the five number summary?
The five number summary, popularly known as Tukey's five number summary, is a set of five descriptive statistics that summarize continuous univariate data. The five number summary consists of the
- minimum value
- lower-hinge value
- median value
- upper-hinge value
- maximum value
of a univariate data set.
The lower hinge is the median of the lower half of the sorted data, and the upper hinge is the median of the upper half of the sorted data. When the number of observations is odd, the overall median is included in both halves.
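As a quick illustration of the hinge definition, the sketch below splits a small sorted vector into two halves by hand. The six values and the variable names are made up for this example and do not come from the tutorial's data.

```r
# Small illustrative vector (made up for this sketch), even length
x <- c(7, 1, 5, 3, 9, 11)
x_sorted <- sort(x)                      # 1 3 5 7 9 11
n <- length(x_sorted)

lower_half <- x_sorted[1:(n / 2)]        # 1 3 5
upper_half <- x_sorted[(n / 2 + 1):n]    # 7 9 11

# median of each half gives the hinges: 3 and 9
hinges <- c(lower = median(lower_half), upper = median(upper_half))
hinges

# fivenum() reports the same hinges in positions 2 and 4
fivenum(x)
```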
How to find the five number summary in R?
Tukey's five number summary for the input data can be obtained using the fivenum() function, available in the stats package that ships with base R.
The syntax of the fivenum() function is
fivenum(x, na.rm = TRUE)
where,
- x : numeric vector, may include NAs and +/- Inf
- na.rm : logical value (default TRUE) indicating whether to remove NA and NaN values
Numerical Problem : Five Number summary using R
Example 1: Actual Calculation of Five Number Summary using R
This example illustrates how the five number summary is calculated. To understand the calculation, let us consider the built-in data frame PlantGrowth.
# load the data
data("PlantGrowth")
# display data structure
str(PlantGrowth)
'data.frame': 30 obs. of 2 variables:
$ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
$ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
Calculate Minimum value
Calculate the minimum plant weight using the min() function.
# minimum weight
Min <- min(PlantGrowth$weight)
Min
[1] 3.59
Calculate Median value
Calculate the median plant weight using the median() function.
# median weight
Median <- median(PlantGrowth$weight)
Median
[1] 5.155
Calculate Lower Hinge
The lower hinge is the median of the lower 50 percent of the data after sorting.
# sort the weight data
weight.ordered <- sort(PlantGrowth$weight)
# no. of elements in weight
n <- length(PlantGrowth$weight)
# calculate lower hinge; ceiling() keeps the median in the lower half when n is odd
Lower_hinge <- median(weight.ordered[1:ceiling(n/2)])
# display lower hinge
Lower_hinge
[1] 4.53
Calculate Upper Hinge
The upper hinge is the median of the upper 50 percent of the data after sorting.
# calculate upper hinge; floor() keeps the median in the upper half when n is odd
Upper_hinge <- median(weight.ordered[(floor(n/2) + 1):n])
Upper_hinge
[1] 5.54
Calculate Maximum value
Calculate the maximum plant weight using the max() function.
Max <- max(PlantGrowth$weight)
Max
[1] 6.31
Five number summary
All five summary statistics computed above can be combined using the cbind() (column bind) function and given names using the colnames() function.
FiveSumm <- cbind(Min, Lower_hinge, Median, Upper_hinge, Max)
colnames(FiveSumm) <- c("Min.", "Lower-Hinge", "Median", "Upper-Hinge", "Max.")
FiveSumm
Min. Lower-Hinge Median Upper-Hinge Max.
[1,] 3.59 4.53 5.155 5.54 6.31
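The manual steps above can be gathered into a single function. The sketch below is one possible way to do that; manual_fivenum() is a hypothetical helper written for this illustration, not a function that R provides. It uses ceiling() and floor() so that the median lands in both halves when the number of observations is odd, and it is compared against fivenum() as a sanity check.

```r
# manual_fivenum() is a hypothetical helper for this sketch, not part of R
manual_fivenum <- function(x) {
  x <- sort(x)                               # sort() drops NA and NaN by default
  n <- length(x)
  lower_half <- x[1:ceiling(n / 2)]          # includes the median when n is odd
  upper_half <- x[(floor(n / 2) + 1):n]      # includes the median when n is odd
  c(min(x), median(lower_half), median(x), median(upper_half), max(x))
}

# even number of observations (30 plant weights)
manual_fivenum(PlantGrowth$weight)

# odd number of observations (made-up values)
odd_x <- c(2, 4, 6, 8, 10)
manual_fivenum(odd_x)

# both cases should agree with fivenum()
all.equal(manual_fivenum(PlantGrowth$weight), fivenum(PlantGrowth$weight))
all.equal(manual_fivenum(odd_x), fivenum(odd_x))
```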
Example 2: Five number summary using fivenum() function
In the earlier example, we saw how the five number summary is calculated without using the fivenum() function. Let's see how it is calculated using the fivenum() function in R.
FiveNum <-fivenum(PlantGrowth$weight)
FiveNum
[1] 3.590 4.530 5.155 5.540 6.310
The fivenum() function displays only the values. You can assign names to these values using the names() function as below:
names(FiveNum)<-c("Min","L-Hinge","Median","U-Hinge","Max")
FiveNum
Min L-Hinge Median U-Hinge Max
3.590 4.530 5.155 5.540 6.310
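Since PlantGrowth also records a treatment group for each plant, one possible extension is to compute the five number summary separately for each group. The sketch below shows one way to do this with split() and sapply(); the object name by_group is illustrative only.

```r
# split the weights by treatment group, then apply fivenum() to each piece;
# the result is a matrix with one column per group and one row per statistic
by_group <- sapply(split(PlantGrowth$weight, PlantGrowth$group), fivenum)
rownames(by_group) <- c("Min", "L-Hinge", "Median", "U-Hinge", "Max")
by_group
```

Each column can then be read like the named vector above, for example by_group[, "ctrl"] for the control group.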
The summary() function gives a six number summary (Min, $1^{st}$ Qu., Median, Mean, $3^{rd}$ Qu. and Max.).
summary(PlantGrowth$weight)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.590 4.550 5.155 5.073 5.530 6.310
The default quantile function (quantile()) also gives the minimum, $1^{st}$ quartile, median, $3^{rd}$ quartile and maximum value of the data set.
quantile(PlantGrowth$weight)
0% 25% 50% 75% 100%
3.590 4.550 5.155 5.530 6.310
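Note that the hinges returned by the fivenum() function (4.53 and 5.54) differ from the first and third quartiles returned by the summary() function and by the quantile() function with its default settings (4.55 and 5.53).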
For this data set, the lower and upper hinges can be reproduced with the quantile() function by using the argument type=5 (one of the nine quantile algorithms).
quantile(PlantGrowth$weight,type=5)
0% 25% 50% 75% 100%
3.590 4.530 5.155 5.540 6.310
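Note that with the type=5 argument in the quantile() function, the results of the fivenum() and quantile() functions are identical for this data set. This agreement depends on the sample size and is not guaranteed for every data set.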
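The agreement between fivenum() and quantile() with type=5 can be checked directly, and it is worth testing on your own data rather than assuming it. The sketch below confirms the match for the plant weights and then shows a made-up vector of five values where the two functions give different answers. The names w and v are chosen for this example.

```r
w <- PlantGrowth$weight

# unname() drops the "0%", "25%", ... labels so the two results are comparable
all.equal(unname(quantile(w, type = 5)), fivenum(w))

# a made-up vector of length 5 where the two approaches disagree
v <- c(1, 2, 3, 4, 5)
fivenum(v)                       # hinges are 2 and 4
unname(quantile(v, type = 5))    # quartiles are 1.75 and 4.25
```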
Example 3: Five number summary of data with NA, NaN and Inf
Suppose that your data contains NA, NaN or +/- Inf and you need to compute the five number summary.
# create a vector containing NA, Inf and NaN
y <- c(1:10, NA, +Inf, NaN)
y
[1] 1 2 3 4 5 6 7 8 9 10 NA Inf NaN
By default, na.rm=TRUE in the fivenum() function. It removes the NA and NaN values and computes the five number summary for the remaining data, while the values +Inf or -Inf are not removed.
fivenum(y)
[1] 1.0 3.5 6.0 8.5 Inf
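Two variations on this example may be useful in practice. The sketch below first sets na.rm = FALSE, in which case fivenum() returns NA for all five statistics as soon as the data contains an NA or NaN. It then filters the vector with is.finite() before calling fivenum(), which is one way (an assumption about what you may want, not something fivenum() does for you) to exclude +/- Inf as well.

```r
y <- c(1:10, NA, +Inf, NaN)

# with na.rm = FALSE, any NA or NaN makes every statistic NA
fivenum(y, na.rm = FALSE)

# keep only finite values to drop NA, NaN and +/- Inf before summarising
finite_summary <- fivenum(y[is.finite(y)])
finite_summary
# [1]  1.0  3.0  5.5  8.0 10.0
```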
Endnote
In this tutorial, you learned what the five number summary is and how to compute it using the fivenum() function in R. You also learned how the five number summary is calculated step by step.
To learn more about descriptive statistics using R, please refer to the following tutorials:
- Statistical functions in R
- Quantiles Using R
- Moments Using R
- Moments Coefficient of Skewness using R
- Moments Coefficient of Kurtosis using R
- Descriptive Statistics Using R
Hopefully you enjoyed this tutorial on how to compute the five number summary using R.