
In this tutorial, you will learn what the five number summary is, how it is calculated step by step, and how to compute it in R.

## What is 5-number summary?

The five number summary, popularly known as **Tukey's five number summary**, is a set of five descriptive statistics that summarize a continuous univariate data set. It consists of the following values of the data:

- minimum value
- lower-hinge value
- median value
- upper-hinge value
- maximum value

The lower hinge is the median of the lower half of the sorted data, and the upper hinge is the median of the upper half. When the number of observations is odd, the overall median is counted in both halves.
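The definition can be sketched in a few lines of R. The helper `hinges()` below is hypothetical (it is not part of R and is written only for illustration); it assumes that, for an odd number of observations, the middle value belongs to both halves. It is a sketch of the definition, not a description of how `fivenum()` is implemented internally.

```r
# hinges(): hypothetical helper, written only for illustration
hinges <- function(x) {
  x <- sort(x)                        # sort() also drops NA values
  n <- length(x)
  lower <- x[1:ceiling(n / 2)]        # lower half (keeps the middle value if n is odd)
  upper <- x[(floor(n / 2) + 1):n]    # upper half (keeps the middle value if n is odd)
  c(lower = median(lower), upper = median(upper))
}

x_even <- c(2, 4, 6, 8, 10, 12)       # even number of observations
x_odd  <- c(2, 4, 6, 8, 10, 12, 14)   # odd number of observations

hinges(x_even)
hinges(x_odd)

# compare with the hinges reported by fivenum() (2nd and 4th elements)
fivenum(x_even)[c(2, 4)]
fivenum(x_odd)[c(2, 4)]
```

For both vectors, the helper should return the same hinges as `fivenum()`.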

## How to find five-number summary in R?

**Tukey's five number summary** of the input data can be obtained using the `fivenum()` function, available in the `stats` package, which is loaded by default in R.

The syntax of the `fivenum()` function is

`fivenum(x, na.rm = TRUE)`

where,

**x :** numeric vector, which may include `NA`s and `+/-Inf`s

**na.rm :** logical value (default `TRUE`) indicating whether `NA` and `NaN` values should be removed
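As a quick illustration of the two arguments, here is a minimal sketch on a made-up vector (the values of `x` are hypothetical and chosen only for the example):

```r
# a small made-up vector with one missing value
x <- c(7, 1, 3, NA, 9, 5)

# default na.rm = TRUE: the NA is dropped before the summary is computed
fivenum(x)

# na.rm = FALSE: because a missing value is present, all five results are NA
fivenum(x, na.rm = FALSE)
```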

## Numerical Problem : Five Number summary using R

### Example 1: Actual Calculation of Five Number Summary using R

This example illustrates how the five number summary is calculated. To understand the calculation, let us consider the built-in data frame `PlantGrowth`.

```
# load the data
data("PlantGrowth")
# display data structure
str(PlantGrowth)
```

```
'data.frame': 30 obs. of 2 variables:
$ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
$ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
```

#### Calculate Minimum value

Calculate the minimum plant weight using the `min()` function.

```
# minimum weight
Min <- min(PlantGrowth$weight)
Min
```

`[1] 3.59`

#### Calculate Median value

Calculate the median plant weight using the `median()` function.

```
# median weight
Median <- median(PlantGrowth$weight)
Median
```

`[1] 5.155`

#### Calculate Lower Hinge

The lower hinge is the median of the **lower 50 percent** of the sorted data.

```
# sort the weight data
weight.ordered <- sort(PlantGrowth$weight)
# no. of elements in weight
n <- length(PlantGrowth$weight)
# calculate lower hinge: median of the lower half
# (ceiling() keeps the middle value in this half when n is odd)
Lower_hinge <- median(weight.ordered[1:ceiling(n / 2)])
# display lower hinge
Lower_hinge
```

`[1] 4.53`

#### Calculate Upper Hinge

The upper hinge is the median of the **upper 50 percent** of the sorted data.

```
# calculate upper hinge: median of the upper half
# (floor() keeps the middle value in this half when n is odd)
Upper_hinge <- median(weight.ordered[(floor(n / 2) + 1):n])
Upper_hinge
```

`[1] 5.54`

#### Calculate maximum value

Calculate the maximum plant weight using the `max()` function.

```
Max <- max(PlantGrowth$weight)
Max
```

`[1] 6.31`

#### Five number summary

All five summary statistics above can be combined using the `cbind()` (column bind) function and given names using the `colnames()` function.

```
FiveSumm <- cbind(Min, Lower_hinge, Median, Upper_hinge, Max)
colnames(FiveSumm) <- c("Min.", "Lower-Hinge", "Median", "Upper-Hinge", "Max.")
FiveSumm
```

```
     Min. Lower-Hinge Median Upper-Hinge Max.
[1,] 3.59        4.53  5.155        5.54 6.31
```

### Example 2: Five number summary using `fivenum()` function

In the earlier example, we saw how the five number summary is calculated without using the `fivenum()` function. Let's see how it is calculated using the `fivenum()` function in R.

```
FiveNum <-fivenum(PlantGrowth$weight)
FiveNum
```

`[1] 3.590 4.530 5.155 5.540 6.310`

The `fivenum()` function displays only the values. You can assign names to these values using the `names()` function as below:

```
names(FiveNum)<-c("Min","L-Hinge","Median","U-Hinge","Max")
FiveNum
```

```
    Min L-Hinge  Median U-Hinge     Max
  3.590   4.530   5.155   5.540   6.310
```
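Because `PlantGrowth` also contains a `group` factor, the same function can be applied once per group. The following is a minimal sketch using `split()` and `sapply()`; the object name `by_group` and the row labels are my own choices, not part of the original example.

```r
data("PlantGrowth")

# split the weights by treatment group, then apply fivenum() to each piece;
# sapply() collects the results into a matrix with one column per group
by_group <- sapply(split(PlantGrowth$weight, PlantGrowth$group), fivenum)

# label the five rows (labels chosen for illustration)
rownames(by_group) <- c("Min", "L-Hinge", "Median", "U-Hinge", "Max")
by_group
```

Each column of `by_group` holds the five number summary of one treatment group.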

The `summary()` function gives a six number summary (Min., $1^{st}$ Qu., Median, Mean, $3^{rd}$ Qu. and Max.).

`summary(PlantGrowth$weight)`

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  3.590   4.550   5.155   5.073   5.530   6.310
```

The `quantile()` function, with its default settings, also gives the minimum, $1^{st}$ quartile, median, $3^{rd}$ quartile and maximum of the data set.

`quantile(PlantGrowth$weight)`

```
   0%   25%   50%   75%  100%
3.590 4.550 5.155 5.530 6.310
```

Note that the output of `fivenum()` differs from that of `summary()` and the default `quantile()`: the hinges (4.53 and 5.54) are not the same as the first and third quartiles (4.55 and 5.53).
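The gap comes from interpolation. The default `quantile()` method (type 7) places the quantile for probability `p` at position `(n - 1) * p + 1` in the sorted data and interpolates linearly between neighbouring values. The sketch below reproduces the first quartile by hand; the variable names `w`, `h` and `q1_manual` are introduced only for this illustration.

```r
data("PlantGrowth")
w <- sort(PlantGrowth$weight)
n <- length(w)

# type 7 position of the 25% point: (n - 1) * 0.25 + 1
h <- (n - 1) * 0.25 + 1
lo <- floor(h)

# linear interpolation between the two neighbouring sorted values
q1_manual <- w[lo] + (h - lo) * (w[lo + 1] - w[lo])

q1_manual                    # first quartile computed by hand
unname(quantile(w, 0.25))    # first quartile from quantile()
fivenum(w)[2]                # lower hinge, for comparison
```

With 30 observations the position is 8.25, so the quartile lies a quarter of the way between the 8th and 9th sorted values, while the lower hinge here is the 8th sorted value itself.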

For this data set, the lower and upper hinges can be reproduced with the `quantile()` function using the argument `type=5` (one of the nine quantile algorithms).

`quantile(PlantGrowth$weight,type=5)`

```
   0%   25%   50%   75%  100%
3.590 4.530 5.155 5.540 6.310
```

Note that with the `type=5` argument in `quantile()`, the results of `fivenum()` and `quantile()` are identical for this data set. This agreement depends on the sample size and is not guaranteed for every data set.
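The match with `type=5` is worth checking on your own data rather than assuming. The small sketch below uses two made-up vectors (hypothetical values, chosen only for illustration) and compares the two functions on an even-length and an odd-length sample.

```r
x_even <- c(1, 2, 3, 4, 5, 6)   # even number of observations
x_odd  <- c(1, 2, 3, 4, 5)      # odd number of observations

# unname() drops the percent labels so the vectors can be compared directly
q5_even <- unname(quantile(x_even, type = 5))
q5_odd  <- unname(quantile(x_odd, type = 5))

# TRUE when the type 5 quantiles equal the five number summary
isTRUE(all.equal(q5_even, fivenum(x_even)))
isTRUE(all.equal(q5_odd, fivenum(x_odd)))
```

For these two vectors, the comparison should succeed for the even-length sample and fail for the odd-length one, where the type 5 quartiles are interpolated but the hinges are not.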

### Example 3: Five number summary of data with NA, NaN and Inf

Suppose that your data contains `NA`, `NaN` or `+/- Inf` and you need to compute the five number summary.

```
# create a vector containing NA, Inf and NaN
y<-c(1:10,NA,+Inf,NaN)
y
```

`[1] 1 2 3 4 5 6 7 8 9 10 NA Inf NaN`

By default, `na.rm=TRUE` in the `fivenum()` function. It removes the `NA` and `NaN` values and computes the five number summary of the remaining data, while the values `+Inf` or `-Inf` are not removed.

`fivenum(y)`

`[1] 1.0 3.5 6.0 8.5 Inf`
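If infinite values should not count as observations (an assumption that depends on your data, not something `fivenum()` decides for you), one option is to filter them out beforehand with `is.finite()`, which is `FALSE` for `NA`, `NaN`, `Inf` and `-Inf`. A minimal sketch:

```r
# same vector as above
y <- c(1:10, NA, +Inf, NaN)

# keep only finite values: drops NA, NaN, Inf and -Inf
y_finite <- y[is.finite(y)]
y_finite

# five number summary of the finite values only
fivenum(y_finite)
```

Here the summary is computed from the ten finite values only, so the maximum is no longer `Inf`.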

## Endnote

In this tutorial, you learned what the five number summary is and how to compute it in R using the `fivenum()` function. You also learned how each of the five numbers is calculated by hand.

To learn more about descriptive statistics using R, please refer to the following tutorials:

- Statistical functions in R
- Quantiles Using R
- Moments Using R
- Moments Coefficient of Skewness using R
- Moments Coefficient of Kurtosis using R
- Descriptive Statistics Using R

Hopefully you enjoyed this tutorial on how to compute the five number summary using R.