# tapply function in R with examples

In this tutorial, we will discuss the tapply() function in R with some examples. The tapply() function is part of base R, so no additional package is needed.

## The tapply() function in R

The tapply() function is useful for aggregating data. It computes group summaries of a variable based on the levels of one or more factors.

The general syntax of the tapply() function is

tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

where

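• X: an atomic object, typically a vector
• INDEX: a factor, or a list of one or more factors, each of the same length as X
• FUN: the function to be applied
• ...: optional arguments passed to FUN
• simplify: if TRUE (the default) and FUN always returns a single value, tapply() returns an array of those values. If FALSE, tapply() always returns an array of mode list.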

The call tapply(X, INDEX, FUN) splits the data in X into subgroups based on the levels of INDEX, then applies the function FUN to each subgroup.

That is, tapply() applies FUN to X grouped by the factors in INDEX.
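The split-then-apply description above can be sketched on a small vector. The data and the names scores, grp, pieces, by_hand and by_tapply are made up for illustration; the point is only that splitting by a factor and applying a function to each piece gives the same numbers as a single tapply() call.

```r
# Hypothetical toy data: six scores in two groups
scores <- c(10, 20, 30, 40, 50, 60)
grp    <- factor(c("a", "b", "a", "b", "a", "b"))

# Step by step: split X by the factor, then apply FUN to each piece
pieces  <- split(scores, grp)      # list with components $a and $b
by_hand <- sapply(pieces, mean)    # one mean per group

# The same computation in one call
by_tapply <- tapply(scores, grp, mean)

by_hand
by_tapply
```

Both objects hold a mean of 30 for group a and 40 for group b.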

## tapply() function on data frame

### Example 1: tapply() function on data frame

Let us create a sample data frame to see how the tapply() function works on data frame columns.

Name <- c("john", "gloria", "rajan", "mary", "sonam")
Gender <- factor(c("M", "F", "M", "F", "F"))
Height <- c(165, 158, 160, 157, 155)
Weight <- c(72, 65, 69, 58, 49)
df <- data.frame(Name, Gender, Height, Weight)
df
    Name Gender Height Weight
1   john      M    165     72
2 gloria      F    158     65
3  rajan      M    160     69
4   mary      F    157     58
5  sonam      F    155     49

Suppose we want to calculate the average height or average weight by the gender of the respondent. We can use the tapply() function to calculate the average height by gender as follows:

tapply(df$Height, df$Gender, mean)
       F        M
156.6667 162.5000 
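The average weight by gender can be computed the same way. The sketch below rebuilds the data frame so that it runs on its own, uses with() to avoid repeating df$, and adds a hypothetical variant (df_na, not part of the original example) in which one weight is missing, to show an extra argument being forwarded to mean() through the ... argument.

```r
# Recreate the data frame from Example 1
df <- data.frame(
  Name   = c("john", "gloria", "rajan", "mary", "sonam"),
  Gender = factor(c("M", "F", "M", "F", "F")),
  Height = c(165, 158, 160, 157, 155),
  Weight = c(72, 65, 69, 58, 49)
)

# with() evaluates the call inside df, so df$ is not needed
mean_weight <- with(df, tapply(Weight, Gender, mean))
mean_weight

# Hypothetical variant: one weight is missing, so na.rm = TRUE is
# forwarded to mean() through the ... argument of tapply()
df_na <- df
df_na$Weight[2] <- NA
mean_weight_na <- with(df_na, tapply(Weight, Gender, mean, na.rm = TRUE))
mean_weight_na
```

Without na.rm = TRUE, the mean for the group containing the missing value would be NA.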

To compute the standard deviation of weight by gender, use the tapply() function as follows:

result <- tapply(df$Weight, df$Gender, sd)
result
       F        M
8.020806 2.121320 
class(result)
[1] "array"

### Example 2: quantiles using tapply() function on data frame

Consider the built-in data frame PlantGrowth. Suppose we want to calculate the quantiles of the weight variable grouped by the factor variable group.

To calculate the quantiles of weight by group, we can use the tapply() function as follows:

# compute the quantiles of weight by group
tapply(PlantGrowth$weight, PlantGrowth$group, quantile, probs = c(0.25, 0.50, 0.75))
$ctrl
   25%    50%    75%
4.5500 5.1550 5.2925

$trt1
   25%    50%    75%
4.2075 4.5500 4.8700

$trt2
   25%    50%    75%
5.2675 5.4350 5.7350

Note that, as explained in the syntax section, optional arguments for FUN can be passed through the ... argument of tapply(), such as probs = c(0.25, 0.50, 0.75) for the quantile() function. Because quantile() returns more than one value per group here, the result is a list with one component per group.

### Example 3: tapply() function with user-defined function

We can use a user-defined function in tapply() to summarize one variable by the levels of a factor variable.

Let us define a function for the standard error as follows:

std.error <- function(x) {
  sd(x) / sqrt(length(x))
}

Suppose we need to calculate the standard error of the weight variable grouped by the factor variable group from the PlantGrowth data frame.

To calculate the standard errors of weight by group, we can use the tapply() function as follows:

# compute the standard error of weight by group
result_1 <- tapply(PlantGrowth$weight, PlantGrowth$group, std.error)
result_1
     ctrl      trt1      trt2
0.1843897 0.2509823 0.1399540
class(result_1)
[1] "array"

Note that when FUN returns a single value for each group, the default output of tapply() is an array, so its class is "array". The elements of the output can be accessed using single square brackets [ ] with an index.

# gives the second element of result_1
result_1[2]
     trt1
0.2509823

### Example 4: Simplified result using tapply() function

In the examples above, the argument simplify was left at its default value, TRUE. To get list output instead, pass the additional argument simplify = FALSE.

To calculate the standard errors of weight by group as a list, we can use the tapply() function as follows:

# compute the standard error of weight by group, returned as a list
result_2 <- tapply(PlantGrowth$weight, PlantGrowth$group, std.error, simplify = FALSE)
result_2
$ctrl
[1] 0.1843897

$trt1
[1] 0.2509823

$trt2
[1] 0.139954
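One way to see the difference between the two return types is to compare them directly. The sketch below repeats the std.error helper from Example 3 so that it runs on its own; the names simplified and unsimplified are illustrative only.

```r
# Same helper as in Example 3, repeated so the block is self-contained
std.error <- function(x) sd(x) / sqrt(length(x))

simplified   <- tapply(PlantGrowth$weight, PlantGrowth$group, std.error)
unsimplified <- tapply(PlantGrowth$weight, PlantGrowth$group, std.error,
                       simplify = FALSE)

is.list(simplified)     # FALSE: a numeric array
is.list(unsimplified)   # TRUE: each group result is a list component

# unlist() flattens the list form back into a numeric vector
unlist(unsimplified)
```

The list form is mainly useful when later steps expect a list, or when FUN may return results of different lengths for different groups.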

A component of the list can be extracted using single square brackets with an index.

# extract the second component of the list
result_2[2]
$trt1
[1] 0.2509823

The element inside a component can be extracted using double square brackets with an index.

# extract the element of the second component of the list
result_2[[2]]
[1] 0.2509823

### Example 5: tapply() function with multiple factors

The tapply() function can also group by more than one factor variable. To do so, pass the factors to the INDEX argument as a list.

Consider the built-in data frame warpbreaks.

data("warpbreaks")
str(warpbreaks)
'data.frame': 54 obs. of  3 variables:
 $ breaks : num  26 30 54 25 70 52 51 26 67 18 ...
 $ wool   : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...

In the warpbreaks data frame, the factor variable wool has two levels (wool type A and wool type B) and the factor variable tension has three levels (L for low, M for medium and H for high).

Let us calculate the mean number of breaks for each combination of wool and tension. To do this, we can use the tapply() function as follows:

attach(warpbreaks)
result_3 <- tapply(breaks, list(wool, tension), mean)
result_3
         L        M        H
A 44.55556 24.00000 24.55556
B 28.22222 28.77778 18.77778

The mean number of breaks for wool type A at tension level L is 44.5555556.
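Individual cells of the two-way result can be looked up by level name. The sketch below is one possible way to do that; it uses with() rather than attach(), so the column names do not stay on the search path, and result_4 is a hypothetical extra object that shows how naming the list elements labels the dimensions.

```r
# with() gives access to the columns without attach()
result_3 <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))

# Rows are wool levels, columns are tension levels
result_3["A", "L"]   # mean breaks for wool A at low tension
result_3["B", ]      # all tension levels for wool B

# Naming the list elements labels the dimensions of the result
result_4 <- with(warpbreaks,
                 tapply(breaks, list(wool = wool, tension = tension), mean))
names(dimnames(result_4))
```

Labeling the dimensions makes the printed table easier to read when both factors have similar-looking levels.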

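Note that the apply functions (apply(), tapply(), sapply() and lapply()) are usually more concise and less error-prone than explicit loops (for loop, while loop). They are not guaranteed to run faster than a well-written loop, so the main benefit is shorter, clearer code.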
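For comparison, the same group summary can be written with an explicit for loop. This is only a sketch using the PlantGrowth data from the earlier examples; the names groups, loop_mean and tapply_mean are illustrative.

```r
# Group means with an explicit for loop
groups    <- levels(PlantGrowth$group)
loop_mean <- numeric(length(groups))
names(loop_mean) <- groups
for (g in groups) {
  loop_mean[g] <- mean(PlantGrowth$weight[PlantGrowth$group == g])
}

# The same result from a single tapply() call
tapply_mean <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

# Check that both approaches give the same numbers
all.equal(unname(loop_mean), as.vector(tapply_mean))
```

The loop needs its own setup for the result vector and its names, which tapply() handles in one line.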

## Endnote

In this tutorial, you learned about the tapply() function in R and how to use it on data frame columns with one or more grouping factors, using both built-in and user-defined functions.

We hope you enjoyed this tutorial and that it gave you enough to start using the tapply() function in R.