In this tutorial, we will discuss the `tapply()` function in R with some examples. The `tapply()` function is available in the `base` R package.

## The tapply() function in R

The `tapply()` function is very useful for aggregating data: it lets us create group summaries based on factor levels.

The general syntax of the `tapply()` function is:

`tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)`

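where:

- **X:** an atomic object, typically a vector.
- **INDEX:** a list of one or more factors, each of the same length as `X` (a single factor can also be passed directly).
- **FUN:** the function to be applied.
- **...:** optional arguments passed on to `FUN`.
- **simplify:** if `FALSE`, `tapply()` returns an array of mode list; if `TRUE` (the default) and `FUN` always returns a single value, the result is simplified to an array of that value's mode.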

The call `tapply(X, INDEX, FUN)` splits the data in `X` into subgroups based on the levels of `INDEX`, then applies the function `FUN` to each subgroup. In other words, `tapply()` applies `FUN` to `X` grouped by the factors in `INDEX`.
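You can reproduce the split-then-apply behaviour described above by hand with `split()` and `sapply()`. The sketch below is minimal and uses made-up toy values. The names `x` and `g` are hypothetical and serve only as an illustration.

```r
# Toy data (hypothetical): five values and a grouping factor
x <- c(10, 20, 30, 40, 50)
g <- factor(c("a", "b", "a", "b", "a"))

# tapply(): split x by the levels of g, then apply sum() to each subgroup
via_tapply <- tapply(x, g, sum)

# The same two steps written out explicitly
groups    <- split(x, g)        # list(a = c(10, 30, 50), b = c(20, 40))
via_split <- sapply(groups, sum)

via_tapply
#  a  b 
# 90 60 
```

Both approaches give the same group sums. The difference is the shape of the result: `tapply()` returns a one-dimensional array, while `sapply()` returns a named vector.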

## tapply() function on data frame

### Example 1: tapply() function on data frame

Let us create a sample data frame to see how `tapply()` works on a data frame.

```
Name <- c("john", "gloria", "rajan", "mary", "sonam")
Gender <- factor(c("M", "F", "M", "F", "F"))
Height <- c(165, 158, 160, 157, 155)
Weight <- c(72, 65, 69, 58, 49)
df <- data.frame(Name, Gender, Height, Weight)
df
```

```
    Name Gender Height Weight
1   john      M    165     72
2 gloria      F    158     65
3  rajan      M    160     69
4   mary      F    157     58
5  sonam      F    155     49
```

Suppose we want to calculate the average height or average weight of the respondents by gender. We can use the `tapply()` function to calculate average height by gender as follows:

```
tapply(df$Height, df$Gender, mean)
```

```
       F        M
156.6667 162.5000
```
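The same call also works for weight, the other variable mentioned above: swapping in `Weight` gives the average weight by gender. The short sketch below rebuilds the relevant columns of the sample data so it can run on its own.

```r
# Rebuild the relevant columns of the sample data from Example 1
Gender <- factor(c("M", "F", "M", "F", "F"))
Weight <- c(72, 65, 69, 58, 49)
df <- data.frame(Gender, Weight)

# Average weight by gender
avg_weight <- tapply(df$Weight, df$Gender, mean)
avg_weight
#        F        M 
# 57.33333 70.50000 
```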

To compute the standard deviation of weight by gender, use the `tapply()` function as follows:

```
result <- tapply(df$Weight,df$Gender,sd)
result
```

```
       F        M
8.020806 2.121320
```

```
class(result)
```

```
[1] "array"
```

### Example 2 : quantiles using tapply() function on data frame

Consider the built-in data frame `PlantGrowth`. Suppose we want to calculate quantiles of the `weight` variable grouped by the factor variable `group`.

To calculate the quantiles of `weight` by `group`, we can use the `tapply()` function as follows:

```
# compute the quantiles of weight by group
tapply(PlantGrowth$weight,PlantGrowth$group, quantile, probs = c(0.25, 0.50, 0.75))
```

```
$ctrl
   25%    50%    75%
4.5500 5.1550 5.2925

$trt1
   25%    50%    75%
4.2075 4.5500 4.8700

$trt2
   25%    50%    75%
5.2675 5.4350 5.7350
```
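Here `quantile()` returns three values per group rather than a single number. Because of this, `tapply()` cannot simplify the result into a numeric array and returns a list instead (the `$ctrl`, `$trt1`, `$trt2` components above). If a table is more convenient, one option is to stack the components into a matrix with `do.call(rbind, ...)`. This sketch assumes every group returns the same named quantiles, which holds here because `probs` is fixed.

```r
q <- tapply(PlantGrowth$weight, PlantGrowth$group,
            quantile, probs = c(0.25, 0.50, 0.75))

# Stack the per-group quantile vectors row by row:
# one row per group, one column per probability
q_mat <- do.call(rbind, q)
q_mat
```

The group names become row names and the quantile labels (`25%`, `50%`, `75%`) become column names.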

Note that, as explained in the syntax of `tapply()`, any additional arguments given through `...` are passed on to `FUN`. Here `probs = c(0.25, 0.50, 0.75)` is passed to the `quantile()` function.
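The same mechanism is handy for missing values, because arguments such as `na.rm = TRUE` are passed through `...` to `FUN`. The small sketch below uses hypothetical data; `score` and `grp` are made-up names.

```r
# Hypothetical data with one missing value in group "a"
score <- c(4.1, NA, 5.3, 4.8, 5.0, 6.2)
grp   <- factor(c("a", "a", "a", "b", "b", "b"))

# mean() returns NA for any group that contains an NA
with_na <- tapply(score, grp, mean)

# na.rm = TRUE is passed through `...` to mean(), so the NA is dropped
without_na <- tapply(score, grp, mean, na.rm = TRUE)
without_na   # a = 4.7, b = 5.333333
```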

### Example 3: tapply() Function with user-defined function

We can also pass a user-defined function to `tapply()` to summarise one variable by the levels of a factor variable.

Let us define a function for the standard error of the mean as follows:

```
# standard error of the mean: sample standard deviation / sqrt(n)
std.error <- function(x) {
  sd(x) / sqrt(length(x))
}
```

Suppose we need to calculate the standard error of the `weight` variable grouped by the factor variable `group` in the `PlantGrowth` data frame.

To calculate the standard errors of `weight` by `group`, we can use the `tapply()` function as follows:

```
# compute the standard error of weight for each group
result_1 <- tapply(PlantGrowth$weight, PlantGrowth$group, std.error)
result_1
```

```
     ctrl      trt1      trt2
0.1843897 0.2509823 0.1399540
```
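As a sanity check, you can reproduce the `ctrl` value by applying the standard-error formula to the control group alone:

```r
# Standard error of the mean, as defined above
std.error <- function(x) sd(x) / sqrt(length(x))

# Select only the control-group weights and compute their standard error
ctrl_weights <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]
se_ctrl <- std.error(ctrl_weights)
se_ctrl
```

The value should match the `ctrl` entry returned by `tapply()`.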

```
class(result_1)
```

```
[1] "array"
```

Note that by default `tapply()` returns an array (its class is `"array"`), so individual elements of the output can be accessed using square brackets `[ ]` with an index.

```
# gives the second element of result
result_1[2]
```

```
     trt1
0.2509823
```
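The result carries the group levels as names, so you can also select elements by name rather than by position. Selecting by name is less fragile if the level order changes. Here is a self-contained sketch:

```r
std.error <- function(x) sd(x) / sqrt(length(x))
result_1 <- tapply(PlantGrowth$weight, PlantGrowth$group, std.error)

# Select one group by its level name instead of its position
result_1["trt1"]

# Select several groups at once
result_1[c("ctrl", "trt2")]
```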

### Example 4: List output using tapply() with simplify = FALSE

In the example above, the argument `simplify` took its default value `TRUE`, so the result was simplified to an array. A list output can be obtained by passing the additional argument `simplify = FALSE`.

To calculate the standard errors of `weight` by `group` as a list, we can use the `tapply()` function as follows:

```
# compute the standard error of weight for each group, returned as a list
result_2 <- tapply(PlantGrowth$weight, PlantGrowth$group,
                   std.error, simplify = FALSE)
result_2
```

```
$ctrl
[1] 0.1843897

$trt1
[1] 0.2509823

$trt2
[1] 0.139954
```
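A quick way to confirm the difference between the two forms is to inspect them directly. The minimal check below redefines `std.error` so that it runs on its own.

```r
# Standard error of the mean, as defined in Example 3
std.error <- function(x) sd(x) / sqrt(length(x))

# Default (simplify = TRUE): one number per group, so a numeric array
res_array <- tapply(PlantGrowth$weight, PlantGrowth$group, std.error)

# simplify = FALSE: the same numbers, each wrapped in a list cell
res_list <- tapply(PlantGrowth$weight, PlantGrowth$group, std.error,
                   simplify = FALSE)

is.list(res_array)  # FALSE
is.list(res_list)   # TRUE
dim(res_list)       # 3, one cell per level of group
```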

Single square brackets with an index return a sub-list containing that component:

```
# extract the second component of list
result_2[2]
```

```
$trt1
[1] 0.2509823
```

Double square brackets with an index extract the content of the component itself (here, a single number):

```
# extract the element of second component of list
result_2[[2]]
```

```
[1] 0.2509823
```

### Example 5: tapply() Function with multiple factors

The `tapply()` function can also group by multiple factor variables. To do so, pass the factors to the `INDEX` argument as a list.

Consider the built-in data frame `warpbreaks`.

```
data("warpbreaks")
str(warpbreaks)
```

```
'data.frame': 54 obs. of  3 variables:
 $ breaks : num  26 30 54 25 70 52 51 26 67 18 ...
 $ wool   : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...
```

In the `warpbreaks` data frame, the factor variable `wool` has two levels (wool types `A` and `B`), and the factor variable `tension` has three levels (`L` for Low, `M` for Medium and `H` for High).
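Before averaging over the combinations, it helps to check how many observations fall into each `wool`/`tension` cell. Using `length` as `FUN` gives the cell counts:

```r
# Number of observations in each wool x tension combination
counts <- with(warpbreaks, tapply(breaks, list(wool, tension), length))
counts
#   L M H
# A 9 9 9
# B 9 9 9
```

The design is balanced, so each cell mean is based on nine observations.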

Let us calculate the mean number of `breaks` for each combination of `wool` and `tension` levels. To do this, we can use the `tapply()` function as follows:

```
# with() evaluates the expression inside warpbreaks, avoiding attach()
result_3 <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))
result_3
```

```
         L        M        H
A 44.55556 24.00000 24.55556
B 28.22222 28.77778 18.77778
```

For example, the mean number of breaks for wool type A at tension level L is 44.55556.
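If the elements of the `INDEX` list are named, `tapply()` uses those names to label the dimensions of the result. The matrix can then be indexed by level names. The sketch below assumes the labels `Wool` and `Tension`, which are chosen only for illustration.

```r
# Naming the elements of the INDEX list labels the dimensions of the result
res <- with(warpbreaks,
            tapply(breaks, list(Wool = wool, Tension = tension), mean))

names(dimnames(res))  # "Wool" "Tension"

# Index the matrix by row (wool) and column (tension) level
res["A", "L"]         # 44.55556
```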

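Note that the apply family (`apply()`, `tapply()`, `sapply()` and `lapply()`) offers a more concise alternative to explicit loops (`for`, `while`). They are not guaranteed to be faster than a well-written loop, because they still iterate internally. Their advantage is that they remove the bookkeeping of pre-allocating results and indexing groups by hand.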
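To make the comparison concrete, here is the average-height-by-gender computation from Example 1, written both as an explicit `for` loop and with `tapply()`. This is an illustration of the bookkeeping involved, not a benchmark.

```r
# Data from Example 1
Height <- c(165, 158, 160, 157, 155)
Gender <- factor(c("M", "F", "M", "F", "F"))

# Explicit loop: pre-allocate a named result, then fill one level at a time
loop_means <- numeric(nlevels(Gender))
names(loop_means) <- levels(Gender)
for (lev in levels(Gender)) {
  loop_means[lev] <- mean(Height[Gender == lev])
}

# tapply(): the same grouping and averaging in one call
tapply_means <- tapply(Height, Gender, mean)
```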

## Endnote

In this tutorial you learned about the `tapply()` function in R and how to use it to summarise vectors and data frame columns by one or more factors, with illustrations.


Hopefully you enjoyed this tutorial on the `tapply()` function in R and found it sufficient to understand how `tapply()` works.