In this tutorial, you will learn what a data frame in R is, how to create a data frame in R, and how to access variables and/or observations from a data frame.
What is a Data Frame in R?
In R, a data frame is the primary data structure for handling tabular data sets, much like a spreadsheet. Internally, a data frame is not an atomic data structure: it is a list of equal-length vectors, one per column, with the class "data.frame". Data frames are like matrices, except that different columns may hold different types. In other words, a data frame can store heterogeneous data types, whereas a matrix stores a single (homogeneous) type.
Each row of a data frame corresponds to one observational unit, and each column corresponds to one variable.
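To illustrate the difference between a matrix and a data frame, here is a minimal sketch. The vectors ids and scores are hypothetical, and it assumes R 4.0 or later, where character columns are no longer converted to factors by default.

```r
# Hypothetical example data: one character and one numeric vector
ids <- c("A", "B", "C")
scores <- c(90, 85, 77)

# A matrix holds a single type, so the numbers are coerced to character
m <- cbind(ids, scores)
typeof(m)         # "character"

# A data frame keeps each column's own type
df <- data.frame(ids, scores)
typeof(df)        # "list": internally one vector per column
class(df$ids)     # "character" (R >= 4.0)
class(df$scores)  # "numeric"
```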

How to create data frame in R?
In R, data frames are created using the data.frame() function. It combines a collection of vectors (or a list, or a matrix) into a data frame.
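Since data.frame() also accepts a matrix, here is a short sketch of that conversion. The matrix m and its values are made up for illustration; each matrix column becomes one variable named after the matrix's column names.

```r
# Illustrative 3 x 2 numeric matrix; values are filled column by column
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
            dimnames = list(NULL, c("x", "y")))

# Each matrix column becomes a data frame column
df_m <- data.frame(m)
str(df_m)  # 3 obs. of 2 variables: x = 1 2 3, y = 4 5 6
```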
Creating an empty data frame in R
Sometimes you need to initialize an empty data frame with only variable names and their storage types. The data.frame() function can also create such an empty data frame structure.
Suppose you want to create an empty data frame with five variables: Name, Gender, Age, Weight and Height. The following R code creates an empty data frame with these variable names.
data1 <- data.frame(
Name = character(),
Gender = character(),
Age = numeric(),
Weight = numeric(),
Height = numeric()
)
str(data1)
'data.frame': 0 obs. of 5 variables:
$ Name : chr
$ Gender: chr
$ Age : num
$ Weight: num
$ Height: num
Sometimes we need to create an empty data frame from the structure of an existing one. The following R code copies only the structure (column names and types) of data1 into a new, empty data frame data2.
data2 <- data1[FALSE, ]
str(data2)
'data.frame': 0 obs. of 5 variables:
$ Name : chr
$ Gender: chr
$ Age : num
$ Weight: num
$ Height: num
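One caveat with the data1[FALSE, ] idiom: when the template has a single column, R's default drop behaviour turns the result into a plain zero-length vector. A minimal sketch (the one-column template one_col is hypothetical) shows how drop = FALSE keeps the data frame:

```r
# Hypothetical single-column template
one_col <- data.frame(Name = character())

a <- one_col[FALSE, ]                # drops to a zero-length character vector
b <- one_col[FALSE, , drop = FALSE]  # stays a zero-row data frame

is.data.frame(a)  # FALSE
is.data.frame(b)  # TRUE
names(b)          # "Name"
```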
Creating a data frame using the data.frame() function
Suppose we have some data about the students as follows:
Name | Gender | Age | Weight |
---|---|---|---|
A | Male | 10 | 26 |
B | Female | 20 | 35 |
C | Female | 12 | 28 |
D | Male | 14 | 30 |
E | Male | 16 | 31 |
F | Female | 15 | 29 |
G | Male | 17 | 34 |
student <- data.frame(
name = c("A", "B", "C", "D", "E", "F", "G"),
gender = c('Male', 'Female', 'Female',
'Male', 'Male', 'Female', 'Male'),
age = c(10, 20, 12, 14, 16, 15, 17),
weight = c(26, 35, 28, 30, 31, 29, 34))
str(student)
'data.frame': 7 obs. of 4 variables:
$ name : chr "A" "B" "C" "D" ...
$ gender: chr "Male" "Female" "Female" "Male" ...
$ age : num 10 20 12 14 16 15 17
$ weight: num 26 35 28 30 31 29 34
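All columns of a data frame must have the same number of rows. As a small sketch (the column names x, y and group are illustrative), data.frame() recycles a length-one value to fill every row, but stops with an error when the lengths cannot be reconciled:

```r
# A single value is recycled to match the 4 rows of x
df1 <- data.frame(x = 1:4, group = "A")
df1$group  # "A" "A" "A" "A"

# Lengths 3 and 2 cannot be recycled evenly, so data.frame() raises an error;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
res  # a message about a differing number of rows
```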
Creating a data frame from vectors
A data frame can also be created from existing vectors. To construct a data frame from the above data, begin by creating four vectors, one for each column of the data.
name <- c("A", "B", "C", "D", "E", "F", "G")
gender <- c("M", "F", "F", "M", "M", "F", "M")
age <- c(10, 20, 12, 14, 16, 15, 17)
weight <- c(26, 35, 28, 30, 31, 29, 34)
Use the data.frame() function to combine all four vectors into a single data frame.
The result is assigned to an object called student, which stores the values of the four variables name, gender, age and weight as columns.
student <- data.frame(name, gender, age, weight)
class(student) # display class of a data frame
[1] "data.frame"
The names() function displays the name of each variable in a data frame.
The str() function displays the structure of a data frame: the type and the first few values of each variable.
# display the name of the variables from the data frame
names(student)
[1] "name" "gender" "age" "weight"
# Display the structure of a data frame
str(student)
'data.frame': 7 obs. of 4 variables:
$ name : chr "A" "B" "C" "D" ...
$ gender: chr "M" "F" "F" "M" ...
$ age : num 10 20 12 14 16 15 17
$ weight: num 26 35 28 30 31 29 34
Creating a data frame from a list
Create a list of students using the four vectors defined above, then convert the list into a data frame using the data.frame() function.
# Make a list from the four vectors
student.list <- list(
name = name,
gender = gender,
age = age,
weight = weight
)
class(student.list)
[1] "list"
# make a data frame from list
student <- data.frame(student.list)
# display the structure of a data frame
str(student)
'data.frame': 7 obs. of 4 variables:
$ name : chr "A" "B" "C" "D" ...
$ gender: chr "M" "F" "F" "M" ...
$ age : num 10 20 12 14 16 15 17
$ weight: num 26 35 28 30 31 29 34
# display dimension of a data frame
dim(student)
[1] 7 4
attributes(student)
$names
[1] "name" "gender" "age" "weight"
$class
[1] "data.frame"
$row.names
[1] 1 2 3 4 5 6 7
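The conversion also works in the other direction, which reflects the fact that a data frame is stored as a list of equal-length columns. A minimal sketch with a shortened, hypothetical two-column list (assuming R 4.0 or later, so character columns stay character):

```r
# Hypothetical two-column list
lst <- list(name = c("A", "B"), age = c(10, 20))
df <- data.frame(lst)

is.list(df)                  # TRUE: a data frame is a list of columns
length(df)                   # 2, one element per column
identical(as.list(df), lst)  # TRUE: converting back recovers the list
```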
Important functions for Data Frame
Some important functions for working with data frames are as follows:
Function | Description |
---|---|
str(dataframe) | Display the structure of a data frame |
class(dataframe) | Display the class of a data frame |
dim(dataframe) | Display the dimensions (rows, columns) of a data frame |
nrow(dataframe) | Display the number of rows in a data frame |
ncol(dataframe) | Display the number of columns in a data frame |
names(dataframe) | Display the names of the variables of a data frame |
colnames(dataframe) | Display the column names of a data frame |
rownames(dataframe) | Display the row names of a data frame |
dimnames(dataframe) | Display a list containing the row names and column names |
is.data.frame(dataframe) | Check whether the argument is a data frame |
as.data.frame(x) | Convert the argument x to a data frame |
attributes(dataframe) | Access the attributes of a data frame |
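Here is a quick sketch applying several of these functions to a small, made-up data frame df; the comments show what each call returns.

```r
# Small illustrative data frame
df <- data.frame(name = c("A", "B", "C"), age = c(10, 20, 12))

dim(df)            # 3 2
nrow(df)           # 3
ncol(df)           # 2
colnames(df)       # "name" "age"
rownames(df)       # "1" "2" "3"
is.data.frame(df)  # TRUE

# as.data.frame() on a matrix without column names creates V1, V2, ...
df2 <- as.data.frame(matrix(1:4, nrow = 2))
names(df2)         # "V1" "V2"
```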
Before using the data in a data frame, it is good practice to check a summary of its structure. To do this, use the str() function.
# display structure of a data frame
str(student)
'data.frame': 7 obs. of 4 variables:
$ name : chr "A" "B" "C" "D" ...
$ gender: chr "M" "F" "F" "M" ...
$ age : num 10 20 12 14 16 15 17
$ weight: num 26 35 28 30 31 29 34
When applied to a data frame, the str() function provides the following information:
- the number of observations
- the number of variables
- the name of each variable
- the type of data of each variable
- the first few values of each variable
attributes(student)
$names
[1] "name" "gender" "age" "weight"
$class
[1] "data.frame"
$row.names
[1] 1 2 3 4 5 6 7
The top and bottom rows of a data frame can be displayed using the head() and tail() functions, respectively. By default, head() and tail() display the top 6 and bottom 6 rows of a data frame.
# display top 6 rows of a data frame
head(student)
name gender age weight
1 A M 10 26
2 B F 20 35
3 C F 12 28
4 D M 14 30
5 E M 16 31
6 F F 15 29
# display top 3 rows of a data frame
head(student, 3)
name gender age weight
1 A M 10 26
2 B F 20 35
3 C F 12 28
# display bottom 6 rows of a data frame
tail(student)
name gender age weight
2 B F 20 35
3 C F 12 28
4 D M 14 30
5 E M 16 31
6 F F 15 29
7 G M 17 34
# display bottom 3 rows of a data frame
tail(student, 3)
name gender age weight
5 E M 16 31
6 F F 15 29
7 G M 17 34
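Both functions also accept a negative n, which drops rows instead of selecting them: head(x, -k) returns everything except the last k rows, and tail(x, -k) everything except the first k rows. A small sketch on an illustrative eight-row data frame:

```r
df <- data.frame(id = 1:8)

head(df, -2)$id  # 1 2 3 4 5 6  (all but the last 2 rows)
tail(df, -2)$id  # 3 4 5 6 7 8  (all but the first 2 rows)
```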
Accessing Elements of a data frame
Accessing Rows/Columns using index
Elements of a data frame can be accessed by specifying row number(s) and/or column number(s) inside square brackets. For a data frame df:
- df[i, ] returns the $i^{th}$ row of df
- df[, j] returns the $j^{th}$ column of df
- df[i, j] returns the $(i,j)^{th}$ element of df
Consider the data frame student defined above.
The R code below returns only the first row of student.
# returns 1st row of data frame
student[1, ]
name gender age weight
1 A M 10 26
The R code below returns only the third column of student.
# returns 3rd column of data frame
student[, 3]
[1] 10 20 12 14 16 15 17
The code above returns the third column of student, but as a vector rather than as a data frame, because R simplifies (drops) the result when a single column is selected. To keep the result as a data frame, use the drop = FALSE argument as follows:
# returns 3rd column of data frame
student[, 3, drop = FALSE]
age
1 10
2 20
3 12
4 14
5 16
6 15
7 17
The R code below returns the element in the second row and third column of student.
# returns value from 2nd row and 3rd column
# of data frame student
student[2, 3]
[1] 20
The R code below returns the elements in rows 1 to 2 of the third column of student.
# returns the elements from the first 2
# rows and 3rd column of student data frame
student[1:2, 3]
[1] 10 20
To extract non-adjacent rows or columns, use the c() (combine) function.
The R code below returns the first and third rows of student.
# returns the elements from first and
# third row of student data frame
student[c(1, 3), ]
name gender age weight
1 A M 10 26
3 C F 12 28
Negative indexing is used to omit specific row(s) and/or column(s). The R code below displays all rows of student except the second row.
# returns the elements from all
# rows except 2nd row
student[-2, ]
name gender age weight
1 A M 10 26
3 C F 12 28
4 D M 14 30
5 E M 16 31
6 F F 15 29
7 G M 17 34
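Negative indices work for columns in the same way, and rows and columns can be omitted in one call. The sketch below uses a small illustrative data frame; note that positive and negative indices cannot be mixed in a single index vector.

```r
df <- data.frame(name = c("A", "B"), gender = c("M", "F"),
                 age = c(10, 20), weight = c(26, 35))

df[, -c(1, 2)]  # drops the 1st and 2nd columns: keeps age and weight
df[-1, -4]      # drops the 1st row and the 4th column
```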
Accessing variables of data.frame
Variables (columns) of a data frame can also be accessed using their names.
The following subsections show four ways to retrieve variables from a data frame.
Using square bracket
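A single variable can be retrieved using square brackets with either the column index or the variable name. Note that student[, "age"] returns a vector, whereas student["age"] (without the comma) returns a one-column data frame.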
# retrieve column no. 3 (age) of data frame
student[, 3]
[1] 10 20 12 14 16 15 17
# retrieve age column of data frame
student[, "age"]
[1] 10 20 12 14 16 15 17
# retrieve "age" variable of data frame
student["age"]
age
1 10
2 20
3 12
4 14
5 16
6 15
7 17
Using the $ sign and the name of the variable
Any variable can be retrieved using the data frame name followed by the $ symbol and the variable name.
# retrieve "age" column of data frame
student$age
[1] 10 20 12 14 16 15 17
Using double square brackets
A variable can also be retrieved by putting its name in double square brackets.
# retrieve age column of data frame
student[["age"]]
[1] 10 20 12 14 16 15 17
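Double square brackets are especially handy when the column name is stored in a variable, which $ cannot handle. A minimal sketch on an illustrative data frame (the variable col is hypothetical):

```r
df <- data.frame(name = c("A", "B"), age = c(10, 20))

col <- "age"          # column name held in a variable
df[[col]]             # 10 20
df$col                # NULL: $ looks for a column literally named "col"

class(df["age"])      # "data.frame": single brackets keep a data frame
class(df[["age"]])    # "numeric": double brackets extract the column itself
```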
Using the c() function
More than one variable can be retrieved by combining the variable names with the c() function.
# retrieve name, age and weight column of data frame
student[, c("name", "age", "weight")]
name age weight
1 A 10 26
2 B 20 35
3 C 12 28
4 D 14 30
5 E 16 31
6 F 15 29
7 G 17 34
Accessing cases from data frame
Rows (observations) of a data frame can be accessed by specifying one or more row indices in square brackets.
Selecting a single row
A single row of a data frame can be accessed by putting its row index in square brackets, followed by a comma.
# select 3rd row from data frame
student[3, ]
name gender age weight
3 C F 12 28
Selecting adjacent rows/cases
# select adjacent rows from data frame
student[1:3, ]
name gender age weight
1 A M 10 26
2 B F 20 35
3 C F 12 28
Selecting non-adjacent rows
Non-adjacent rows of a data frame can be selected using the concatenation function c().
student[c(1, 3, 5), ]
name gender age weight
1 A M 10 26
3 C F 12 28
5 E M 16 31
Selecting random sample of rows
Use the sample(x, size) function to select row indices at random.
# select 3 rows randomly from the student data frame
k <- sample(nrow(student), 3)
student[k, ]
name gender age weight
2 B F 20 35
1 A M 10 26
7 G M 17 34
Note that the sample(x, size, replace = FALSE, prob = NULL) function draws a random sample of the specified size from x. Because the rows are drawn at random, your output will differ from the output shown above.
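To make the random selection reproducible, the random number generator can be seeded with set.seed() before calling sample(). A minimal sketch on an illustrative seven-row data frame (the seed value 42 is arbitrary):

```r
df <- data.frame(id = 1:7)

set.seed(42)               # fix the random number generator state
k1 <- sample(nrow(df), 3)

set.seed(42)               # same seed again...
k2 <- sample(nrow(df), 3)  # ...gives the same row indices

identical(k1, k2)          # TRUE
df[k1, , drop = FALSE]     # the same 3 rows every time the script runs
```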
Conditional selection from a data frame
Often we need to extract the data from a data frame that satisfies certain criteria.
For example, suppose we need the data from the student data frame for female students only. In such a situation, instead of row numbers, we can use a relational (logical) expression as the row index.
# display data for only female students
student[student$gender == "F", ]
name gender age weight
2 B F 20 35
3 C F 12 28
6 F F 15 29
Suppose we need to extract the data for female students with age > 12 from the student data frame.
# display data for female with age >12
student[student$gender == "F" & student$age > 12, ]
name gender age weight
2 B F 20 35
6 F F 15 29
Suppose we need to extract the student data for which age > 12 and age <= 15.
# display data for age greater than 12 and at most 15
student[student$age > 12 & student$age <= 15, ]
name gender age weight
4 D M 14 30
6 F F 15 29
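To match a column against several values at once, the relational expression can use the %in% operator instead of chaining several == comparisons with |. A short sketch on an illustrative data frame:

```r
df <- data.frame(name = c("A", "B", "C", "D"), age = c(10, 20, 12, 14))

# rows whose name is either "A" or "C" (rows 1 and 3)
df[df$name %in% c("A", "C"), ]
```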
Extracting subset from data frame
Subsets of a data frame can be extracted using the subset() function. It returns the rows of a data frame that meet the specified condition.
# Display all columns for Female candidate only
subset(student, gender == "F")
name gender age weight
2 B F 20 35
3 C F 12 28
6 F F 15 29
# Display all columns for age < 14
subset(student, age < 14)
name gender age weight
1 A M 10 26
3 C F 12 28
# Display all columns for age < 14 and gender = F
subset(student, age < 14 & gender == "F")
name gender age weight
3 C F 12 28
Specific variables can be selected or deselected using the select argument of the subset() function.
# display only age and weight column for Female candidate.
subset(student, gender == "F", select = c(age, weight))
age weight
2 20 35
3 12 28
6 15 29
# display all columns except Age for Male candidate.
subset(student, gender == "M", select = -age)
name gender weight
1 A M 26
4 D M 30
5 E M 31
7 G M 34
Note that the subset() function filters a data frame, keeping only the rows that meet the specified condition (and, optionally, only the columns given in select).
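One practical difference between logical indexing and subset() appears when the condition contains missing values (NA). A minimal sketch with a hypothetical data frame in which one age is unknown:

```r
# Hypothetical data with one missing age
df <- data.frame(name = c("A", "B", "C"), age = c(10, NA, 15))

r1 <- df[df$age > 12, ]         # NA in the condition yields an extra all-NA row
r2 <- df[which(df$age > 12), ]  # which() ignores NA, so only "C" is returned
r3 <- subset(df, age > 12)      # subset() treats NA as FALSE: only "C"

nrow(r1)  # 2
nrow(r2)  # 1
nrow(r3)  # 1
```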
Adding or removing columns and rows in a data frame
Adding column using simple assignment
Suppose we need to add height data to the existing student data frame.
# create a new vector height
height <- c(155, 153, 165, 162, 158, 156, 168)
We can add a height column to the student data frame using the $ symbol as follows:
## Add height column to data frame
student$height <- height
student
name gender age weight height
1 A M 10 26 155
2 B F 20 35 153
3 C F 12 28 165
4 D M 14 30 162
5 E M 16 31 158
6 F F 15 29 156
7 G M 17 34 168
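A new column can also be computed from existing ones, and a single value assigned with $ is recycled to every row. In this sketch the data frame is illustrative, the BMI formula assumes weight in kg and height in cm, and the column school is hypothetical.

```r
df <- data.frame(weight = c(26, 35), height = c(155, 153))

# New column computed from existing ones
# (assumes weight in kg and height in cm)
df$bmi <- df$weight / (df$height / 100)^2

# A single value is recycled to every row (hypothetical column "school")
df$school <- "X"
df
```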
Adding a column to a data frame using the cbind() function
A column can also be added to an existing data frame using the cbind() function as follows:
# create a new vector result
result <- c("Pass", "Fail", "Pass", "Fail", "Pass",
"Pass", "Pass")
student <- cbind(student, result)
student
name gender age weight height result
1 A M 10 26 155 Pass
2 B F 20 35 153 Fail
3 C F 12 28 165 Pass
4 D M 14 30 162 Fail
5 E M 16 31 158 Pass
6 F F 15 29 156 Pass
7 G M 17 34 168 Pass
Removing a column from a data frame
A column can be removed from a data frame by assigning NULL to that column.
student$result <- NULL
student
name gender age weight height
1 A M 10 26 155
2 B F 20 35 153
3 C F 12 28 165
4 D M 14 30 162
5 E M 16 31 158
6 F F 15 29 156
7 G M 17 34 168
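To remove several columns at once by name, one option is to keep every column whose name is not in a drop list. A minimal sketch on an illustrative data frame (the vector drop_cols is hypothetical):

```r
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# Keep every column whose name is not in the drop list;
# drop = FALSE keeps a data frame even if only one column is left
drop_cols <- c("a", "b")
df <- df[, !(names(df) %in% drop_cols), drop = FALSE]
names(df)  # "c"
```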
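Adding a row to a data frame using the rbind() function
Build the new row as a one-row data frame with the same column names, so that every column keeps its type; combining the values with c() would coerce them all to character.
new <- data.frame(name = "H", gender = "M", age = 23, weight = 40, height = 159)
student <- rbind(student, new)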
student
name gender age weight height
1 A M 10 26 155
2 B F 20 35 153
3 C F 12 28 165
4 D M 14 30 162
5 E M 16 31 158
6 F F 15 29 156
7 G M 17 34 168
8 H M 23 40 159
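The printed output hides column types, so it is worth checking them with class() after rbind(). This small sketch (with an illustrative data frame) contrasts a row supplied as a c() vector, which mixes text and numbers and is therefore coerced to character, with a row supplied as a one-row data frame:

```r
df <- data.frame(name = "A", age = 10)

bad  <- rbind(df, c("B", 20))                    # c() turns 20 into "20"
good <- rbind(df, data.frame(name = "B", age = 20))

class(bad$age)   # "character": the numeric column was coerced
class(good$age)  # "numeric": the column type is preserved
```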
Removing rows from a data frame
Rows can be removed from a data frame using negative row indices; use the c() function to drop several rows at once:
student <- student[-c(7, 8), ]
student
name gender age weight height
1 A M 10 26 155
2 B F 20 35 153
3 C F 12 28 165
4 D M 14 30 162
5 E M 16 31 158
6 F F 15 29 156
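After rows are removed, the remaining rows keep their original row names. If consecutive row names are preferred, they can be reset by assigning NULL. A minimal sketch on an illustrative data frame:

```r
df <- data.frame(id = c(10, 20, 30, 40))

df <- df[-c(1, 3), , drop = FALSE]  # drop rows 1 and 3
rownames(df)                         # "2" "4": original names are kept

rownames(df) <- NULL                 # reset to automatic row names
rownames(df)                         # "1" "2"
```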
Endnote
In this tutorial, you learned what a data frame in R is, how to create a data frame in R, and how to access the elements of a data frame using different methods.
To learn more about data structures in R, refer to the following tutorials:
- Data Types in R
- Data Structures in R
- Variables and constants in R
- Vectors in R
- Matrix in R
- Arrays in R
- Lists in R
- Factors in R
I hope you enjoyed learning about data frames in R and the various operations you can perform on them.