tapply function in R with examples

In this tutorial, we discuss the tapply() function in R with some examples. The tapply() function is part of base R, so no additional package is needed.

The tapply() function in R

The tapply() function is useful for aggregating data: it computes group summaries of a vector based on the levels of one or more factors.

The general syntax of the tapply() function is

tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

where

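  • X: an atomic object, typically a vector
  • INDEX: a factor, or a list of one or more factors, each of the same length as X
  • FUN: the function to be applied to each subgroup
  • ...: optional arguments passed on to FUN
  • simplify: if TRUE (the default) and FUN always returns a single value, tapply() returns an array of those values; if FALSE, tapply() always returns an array of mode list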

The call tapply(X, INDEX, FUN) splits X into subgroups based on the levels of INDEX and then applies FUN to each subgroup.

That is, tapply() applies FUN to X grouped by the factors in INDEX.
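
The split-then-apply idea described above can be sketched with base R's split() and sapply(). This is only an illustration of the concept, not a description of how tapply() is implemented internally; the vectors x and g are made-up toy data.

```r
# Hypothetical toy data: five values and a grouping factor
x <- c(10, 20, 30, 40, 50)
g <- factor(c("a", "b", "a", "b", "b"))

# Step 1: split x into subgroups by the levels of g
groups <- split(x, g)

# Step 2: apply the function to each subgroup
by_hand <- sapply(groups, mean)

# tapply() performs both steps in a single call
by_tapply <- tapply(x, g, mean)

by_hand
by_tapply
```

Both objects hold the same group means under the same level names; the tapply() result is a one-dimensional array, while the sapply() result is a plain named vector.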

tapply() function on data frame

Example 1: tapply() function on data frame

Let us create a sample data frame to see how the tapply() function works on data frame columns.

Name <- c("john", "gloria", "rajan", "mary", "sonam")
Gender <- factor(c("M", "F", "M", "F", "F"))
Height <- c(165, 158, 160, 157, 155)
Weight <- c(72, 65, 69, 58, 49)
df <- data.frame(Name, Gender, Height, Weight)
df
    Name Gender Height Weight
1   john      M    165     72
2 gloria      F    158     65
3  rajan      M    160     69
4   mary      F    157     58
5  sonam      F    155     49

Suppose we want to calculate the average height or average weight by the gender of the respondent. We can use the tapply() function to calculate the average height by gender as follows:

tapply(df$Height, df$Gender, mean)
       F        M 
156.6667 162.5000 

To compute the standard deviation of weight by gender, use the tapply() function as follows:

result <- tapply(df$Weight, df$Gender, sd)
result
       F        M 
8.020806 2.121320 
class(result)
[1] "array"

Example 2: Quantiles using tapply() function on data frame

Consider the built-in data frame PlantGrowth. Suppose we want to calculate the quantiles of the weight variable grouped by the factor variable group.

To calculate the quantiles of weight by group, we can use the tapply() function as follows:

# compute the quantiles of weight by group
tapply(PlantGrowth$weight, PlantGrowth$group, quantile, probs = c(0.25, 0.50, 0.75))
$ctrl
   25%    50%    75% 
4.5500 5.1550 5.2925 

$trt1
   25%    50%    75% 
4.2075 4.5500 4.8700 

$trt2
   25%    50%    75% 
5.2675 5.4350 5.7350 

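Note that, as explained in the syntax of the tapply() function, any additional arguments supplied through ... are passed on to FUN. Here, probs = c(0.25, 0.50, 0.75) is passed to the quantile() function. Because quantile() returns three values per group, the result is printed as a list with one component per group.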
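
Another common use of the ... argument is passing na.rm = TRUE to a summary function. The sketch below uses made-up vectors (score and grp are hypothetical names) to show the argument being forwarded to mean().

```r
# Hypothetical data with one missing value in group "x"
score <- c(4, NA, 6, 8, 10)
grp <- factor(c("x", "x", "x", "y", "y"))

# Without na.rm, the group that contains NA has an NA mean
with_na <- tapply(score, grp, mean)
with_na

# na.rm = TRUE is forwarded through ... to mean()
clean <- tapply(score, grp, mean, na.rm = TRUE)
clean
```

Any named argument that FUN accepts can be forwarded in the same way.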

Example 3: tapply() Function with user-defined function

We can use a user-defined function in tapply() function to compute the summary of one variable based on the levels of some factor variable.

Let us define a function for the standard error of the mean as follows:

std.error <- function(x) {
  sd(x) / sqrt(length(x))
}

Suppose we need to calculate the standard error of the weight variable grouped by the factor variable group from the PlantGrowth data frame.

To calculate the standard errors of weight by group, we can use the tapply() function as follows:

# compute the standard error of weight by group
result_1 <- tapply(PlantGrowth$weight, PlantGrowth$group, std.error)
result_1
     ctrl      trt1      trt2 
0.1843897 0.2509823 0.1399540 
class(result_1)
[1] "array"

Note that when FUN returns a single value per group, the default output of the tapply() function is an array (here a one-dimensional, named array). Its elements can be accessed using square brackets [ ] with an index.

# gives the second element of result_1
result_1[2]
     trt1 
0.2509823 
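
Because the array carries the factor levels as names, elements can also be selected by name rather than by position. The following is a small self-contained sketch that recreates the std.error() helper and result_1 from this example.

```r
# Recreate the helper and result from the example
std.error <- function(x) sd(x) / sqrt(length(x))
result_1 <- tapply(PlantGrowth$weight, PlantGrowth$group, std.error)

# Select an element by level name; the name is kept in the output
result_1["trt1"]

# List the available level names
names(result_1)

# Drop names and dimensions to get a plain numeric vector
as.vector(result_1)
```

Selecting by name tends to be more robust than selecting by position if the order of the factor levels changes.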

Example 4: The simplify argument of tapply() Function

In the examples above, the argument simplify was left at its default value TRUE. A list output can be obtained by setting simplify = FALSE.

To calculate the standard errors of weight by group and get a list output, we can use the tapply() function as follows:

# compute the standard error of weight by group, returned as a list
result_2 <- tapply(PlantGrowth$weight, PlantGrowth$group,
                   std.error, simplify = FALSE)
result_2
$ctrl
[1] 0.1843897

$trt1
[1] 0.2509823

$trt2
[1] 0.139954

A component of the list can be selected using single square brackets with an index. The result is still a list, of length one.

# extract the second component of list
result_2[2]
$trt1
[1] 0.2509823

The value stored in a component can be extracted using double square brackets with an index.

# extract the value of the second component of the list
result_2[[2]]
[1] 0.2509823
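
If a plain numeric vector is needed after all, the list output can be flattened with unlist(). The sketch below recreates std.error() and result_2 so that it runs on its own; se_vec is an illustrative name.

```r
# Recreate the helper and the list-mode result from the example
std.error <- function(x) sd(x) / sqrt(length(x))
result_2 <- tapply(PlantGrowth$weight, PlantGrowth$group,
                   std.error, simplify = FALSE)

# The result is a list (an array of mode list)
is.list(result_2)

# Flatten the list into a numeric vector
se_vec <- unlist(result_2)
se_vec
```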

Example 5: tapply() Function with multiple factors

The tapply() function can also group by several factor variables at once. To do so, pass the factors to the INDEX argument as a list.

Consider the built-in data frame warpbreaks.

data("warpbreaks")
str(warpbreaks)
'data.frame': 54 obs. of  3 variables:
 $ breaks : num  26 30 54 25 70 52 51 26 67 18 ...
 $ wool   : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
 $ tension: Factor w/ 3 levels "L","M","H": 1 1 1 1 1 1 1 1 1 2 ...

In the warpbreaks data frame, the factor variable wool has two levels (wool type A and wool type B) and the factor variable tension has three levels (L for low, M for medium and H for high).

Let us calculate the mean number of breaks for each combination of wool and tension. To calculate the mean number of breaks grouped by wool and tension, we can use the tapply() function as follows:

# with() evaluates the call inside warpbreaks, so attach() is not needed
result_3 <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))
result_3
         L        M        H
A 44.55556 24.00000 24.55556
B 28.22222 28.77778 18.77778

The mean number of breaks for wool type A at tension level L is 44.55556. Rows of the result correspond to the levels of wool and columns to the levels of tension.
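
Since the result is a two-dimensional array with the factor levels as row and column names, individual cells or whole rows can be picked out by name. This is a minimal sketch that rebuilds result_3 using explicit warpbreaks$ references; the names wool and tension given inside list() are optional labels added here for readability.

```r
# Rebuild the two-factor summary without relying on attach()
result_3 <- tapply(warpbreaks$breaks,
                   list(wool = warpbreaks$wool, tension = warpbreaks$tension),
                   mean)

# One cell: wool type A at tension level L
result_3["A", "L"]

# One row: wool type B across all tension levels
result_3["B", ]

# Dimensions: 2 wool levels by 3 tension levels
dim(result_3)
```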

Note that the apply family of functions (apply(), tapply(), sapply() and lapply()) usually gives shorter and more readable code than explicit loops (for loop, while loop), although it is not necessarily faster.
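
For comparison, here is a sketch of the same per-group computation written with an explicit for loop and with tapply(). The variable names are illustrative, and the example only checks that both approaches give the same answer; it makes no claim about speed.

```r
# Loop version: mean weight per group in PlantGrowth
lvls <- levels(PlantGrowth$group)
loop_means <- numeric(length(lvls))
names(loop_means) <- lvls
for (lv in lvls) {
  loop_means[lv] <- mean(PlantGrowth$weight[PlantGrowth$group == lv])
}

# tapply version: a single call
tapply_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

# Both approaches agree
all.equal(as.vector(loop_means), as.vector(tapply_means))
```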

Endnote

In this tutorial, you learned about the tapply() function in R and how to use it on data frame columns with one or more grouping factors, with built-in and user-defined functions, and with the simplify argument.

Hopefully you enjoyed this tutorial and found the content sufficient to understand the tapply() function in R.

Raju is a co-founder of VRCBuzz and is passionate about making every day the greatest day of life. A nerd at heart with a background in Statistics, he oversees day-to-day operations and focuses on strategic planning and growth of VRCBuzz products and services. Raju has more than 25 years of teaching experience. He gains energy by helping people reach their goals and motivating them to align with their passion. Raju holds a Ph.D. degree in Statistics and loves to spend his leisure time reading about and implementing AI and machine learning concepts using statistical models.
