How to compute five number summary statistics in R with examples

In this tutorial, you will learn what the five-number summary is, how it is calculated step by step, and how to compute it in R.

What is the five-number summary?

The five-number summary, popularly known as Tukey's five-number summary, is a set of five descriptive statistics that summarize continuous univariate data. It consists of the

  • minimum value
  • lower-hinge value
  • median value
  • upper-hinge value
  • maximum value

of the data.

The lower hinge is the median of the lower half of the sorted data, and the upper hinge is the median of the upper half. When the number of observations is odd, the overall median is included in both halves.
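To make the hinge definition concrete, here is a minimal sketch on a small made-up vector with an odd number of values. The vector x and the names lower_half and upper_half are illustrative only and are not part of the tutorial's data.

```r
# illustrative data with an odd number of observations
x <- c(9, 1, 7, 3, 5)
x_sorted <- sort(x)
n <- length(x_sorted)

# with odd n the middle value belongs to both halves
lower_half <- x_sorted[1:ceiling(n / 2)]
upper_half <- x_sorted[(floor(n / 2) + 1):n]

median(lower_half)   # lower hinge
median(upper_half)   # upper hinge

# compare with the built-in function
fivenum(x)
```

The second and fourth values returned by fivenum(x) should match the two medians computed by hand.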

How to find five-number summary in R?

Tukey's five-number summary of the input data can be obtained using the fivenum() function, available in the stats package (loaded by default in R).

The syntax of the fivenum() function is

fivenum(x,na.rm=TRUE)

where,

  • x : numeric vector; may include NA values and +/- Inf
  • na.rm : logical value (default TRUE); if TRUE, NA and NaN values are removed before the summary is computed

Numerical Problem : Five Number summary using R

Example 1: Actual Calculation of Five Number Summary using R

This example illustrates how the five-number summary is calculated. To follow the calculation step by step, let us consider the built-in data frame PlantGrowth.

# load the data
data("PlantGrowth")
# display data structure
str(PlantGrowth)
'data.frame': 30 obs. of  2 variables:
 $ weight: num  4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
 $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...

Calculate Minimum value

Calculate the minimum plant weight using the min() function.

# minimum weight
Min <- min(PlantGrowth$weight)
Min
[1] 3.59

Calculate Median value

Calculate the median plant weight using the median() function.

# median weight
Median <- median(PlantGrowth$weight)
Median
[1] 5.155

Calculate Lower Hinge

The lower hinge is the median of the lower half of the sorted data.

# sort the weight data
weight.ordered <- sort(PlantGrowth$weight)
# number of observations
n <- length(weight.ordered)
# lower hinge: median of the lower half (includes the middle value when n is odd)
Lower_hinge <- median(weight.ordered[1:ceiling(n / 2)])
# display lower hinge
Lower_hinge
[1] 4.53

Calculate Upper Hinge

The upper hinge is the median of the upper half of the sorted data.

# upper hinge: median of the upper half (includes the middle value when n is odd)
Upper_hinge <- median(weight.ordered[(floor(n / 2) + 1):n])
Upper_hinge
[1] 5.54

Calculate maximum value

Calculate the maximum plant weight using the max() function.

Max <- max(PlantGrowth$weight)
Max
[1] 6.31

Five number summary

The five summary statistics computed above can be combined using the cbind() (column bind) function and given names using the colnames() function.

FiveSumm <- cbind(Min, Lower_hinge, Median, Upper_hinge, Max)
colnames(FiveSumm) <- c("Min.", "Lower-Hinge", "Median", "Upper-Hinge", "Max.")
FiveSumm
     Min. Lower-Hinge Median Upper-Hinge Max.
[1,] 3.59        4.53  5.155        5.54 6.31
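The manual steps above can be collected into a single function. The following is only a sketch: five_num_manual is a hypothetical helper name, not a base R function, and it assumes a numeric vector with at least one non-missing value. The result is checked against fivenum(), which was introduced earlier.

```r
# hypothetical helper that mirrors the manual steps; not part of base R
five_num_manual <- function(x) {
  x <- sort(x[!is.na(x)])                  # drop NA/NaN, then sort
  n <- length(x)                           # assumes n >= 1
  lower <- x[1:ceiling(n / 2)]             # lower half (with the middle value if n is odd)
  upper <- x[(floor(n / 2) + 1):n]         # upper half (with the middle value if n is odd)
  c(Min = min(x),
    Lower_hinge = median(lower),
    Median = median(x),
    Upper_hinge = median(upper),
    Max = max(x))
}

manual <- five_num_manual(PlantGrowth$weight)
manual

# TRUE if the manual result agrees with the built-in function
isTRUE(all.equal(unname(manual), fivenum(PlantGrowth$weight)))
```

Splitting at ceiling(n / 2) and floor(n / 2) + 1 is a design choice that keeps the two halves symmetric for both even and odd sample sizes.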

Example 2: Five number summary using fivenum() function

In the previous example, we saw how the five-number summary is calculated without the fivenum() function. Let's now compute it using the fivenum() function in R.

FiveNum <- fivenum(PlantGrowth$weight)
FiveNum
[1] 3.590 4.530 5.155 5.540 6.310

The fivenum() function returns only the values, without names. You can assign names to these values using the names() function as below:

names(FiveNum) <- c("Min", "L-Hinge", "Median", "U-Hinge", "Max")
FiveNum
    Min L-Hinge  Median U-Hinge     Max 
  3.590   4.530   5.155   5.540   6.310 
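Since PlantGrowth also contains a group factor, one possible extension is to compute the five-number summary separately for each group. This is a sketch rather than part of the original walkthrough; the object names by_group and five_by_group are illustrative.

```r
# split the weights by treatment group (ctrl, trt1, trt2)
by_group <- split(PlantGrowth$weight, PlantGrowth$group)

# apply fivenum() to each group; the result is a 5 x 3 matrix
five_by_group <- sapply(by_group, fivenum)
rownames(five_by_group) <- c("Min", "L-Hinge", "Median", "U-Hinge", "Max")
five_by_group
```

Each column holds the five-number summary of one group, so the groups can be compared side by side.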

The summary() function gives a six-number summary (Min., $1^{st}$ Qu., Median, Mean, $3^{rd}$ Qu. and Max.).

summary(PlantGrowth$weight)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.590   4.550   5.155   5.073   5.530   6.310 

The quantile() function, called with its default arguments, also gives the minimum, $1^{st}$ quartile, median, $3^{rd}$ quartile and maximum value of the data set.

quantile(PlantGrowth$weight)
   0%   25%   50%   75%  100% 
3.590 4.550 5.155 5.530 6.310 

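Note that the output of fivenum() differs from that of summary() and quantile(). The fivenum() function returns the hinges (4.53 and 5.54), whereas summary() and quantile() return quartiles computed with the default quantile algorithm (type = 7), which interpolates between neighbouring sorted values (4.55 and 5.53).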
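The difference in the first quartile can be reproduced by hand. The sketch below assumes the default quantile algorithm (type = 7), which places the p-th quantile at position 1 + p * (n - 1) in the sorted data and interpolates linearly; the names w, h, lo, hi and q1_manual are illustrative.

```r
w <- sort(PlantGrowth$weight)
n <- length(w)

# type 7 position of the 25% quantile
h <- 1 + 0.25 * (n - 1)          # 8.25 for n = 30
lo <- floor(h)
hi <- ceiling(h)

# linear interpolation between the two neighbouring sorted values
q1_manual <- w[lo] + (h - lo) * (w[hi] - w[lo])

q1_manual                        # interpolated first quartile
unname(quantile(w, 0.25))        # default quantile(), type = 7
fivenum(w)[2]                    # lower hinge, for comparison
```

The interpolated value lies between the 8th and 9th sorted weights, while the lower hinge is exactly the 8th sorted weight.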

For this data set, the lower and upper hinges can also be obtained from the quantile() function using the argument type = 5 (one of the nine quantile algorithms).

quantile(PlantGrowth$weight,type=5)
   0%   25%   50%   75%  100% 
3.590 4.530 5.155 5.540 6.310 

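Note that with the type = 5 argument in the quantile() function, the results of fivenum() and quantile() are identical for this data. This agreement is not guaranteed for every sample size, because hinges and type 5 quantiles are defined differently.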
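The agreement between fivenum() and quantile(type = 5) is worth checking on other sample sizes before relying on it. A small sketch with a made-up vector of five values (x, hinges and q5 are illustrative names):

```r
x <- c(1, 2, 3, 4, 5)

hinges <- fivenum(x)
q5 <- unname(quantile(x, type = 5))

hinges
q5

# FALSE here: the hinges and the type 5 quartiles differ for this vector
isTRUE(all.equal(hinges, q5))
```

For this vector the minimum, median and maximum agree, but the second and fourth values do not, so type = 5 should not be treated as a general replacement for fivenum().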

Example 3: Five number summary of data with NA, NaN and Inf

Suppose that your data contains NA, NaN or +/- Inf and you need to compute the five number summary.

# create a vector containing NA, Inf and NaN
y <- c(1:10, NA, +Inf, NaN)
y
 [1]   1   2   3   4   5   6   7   8   9  10  NA Inf NaN

By default, na.rm = TRUE in the fivenum() function, so NA and NaN values are removed before the five-number summary is computed. The values +Inf and -Inf are not removed.

fivenum(y)
[1] 1.0 3.5 6.0 8.5 Inf
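For contrast, the sketch below shows what happens when na.rm = FALSE, and one possible way to drop infinite values as well if they are not wanted in the summary. Filtering with is.finite() is an illustrative choice, not something fivenum() does on its own.

```r
y <- c(1:10, NA, +Inf, NaN)

# with na.rm = FALSE, any NA or NaN makes all five values NA
fivenum(y, na.rm = FALSE)

# keep only finite values (drops NA, NaN, Inf and -Inf) before summarizing
y_finite <- y[is.finite(y)]
fivenum(y_finite)
```

Whether infinite values should be dropped depends on what they represent in the data, so this filtering step is best applied deliberately.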

Endnote

In this tutorial, you learned what the five-number summary is, how it is calculated step by step, and how to compute it in R using the fivenum() function.

Hopefully you enjoyed this tutorial on how to compute the five-number summary in R.