Built-in Character Functions in R

Earlier tutorials showed how to create character vectors in R, which hold character strings. R provides many built-in functions for manipulating strings and character vectors.

Function                       Description
nchar()                        Get the length of a string
toupper(x)                     Convert to upper case
tolower(x)                     Convert to lower case
casefold()                     Case folding
chartr()                       Character translation in a character vector
substr(x,start=n1,stop=n2)     Extract or replace substrings in a character vector
strsplit()                     Split the elements of a character vector
strrep()                       Repeat the character string
paste()                        Concatenate vectors after converting to character

Examples of Character Functions

Let us look at how to use each of the above built-in character functions in R with the help of examples.

String Length in R

The nchar() function counts the number of characters (including spaces) in a string, or in each element of a character vector.

# create a string
x <- "R Programming"
# count the number of characters
nchar(x)
[1] 13
# count the number of elements
length(x)
[1] 1
# create a character vector
y <-c("One","Two","Three","Four","Five")
# count no. of letters in each element of y
nchar(y)
[1] 3 3 5 4 4
# count the number of elements in y
length(y)
[1] 5

Note that the length() function gives the number of elements in a vector, whereas the nchar() function gives the number of characters in each element of a vector.
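The difference between the two functions is easiest to see on a vector that contains an empty string. The following is a minimal sketch; the vector fruits and the other variable names are hypothetical and used only for illustration.

```r
# Hypothetical vector used only for illustration
fruits <- c("apple", "", "kiwi fruit")

# nchar() works element by element; the empty string has 0 characters
n_chars <- nchar(fruits)
n_chars
# [1]  5  0 10

# length() counts elements, so the empty string still counts as one
n_elems <- length(fruits)
n_elems
# [1] 3

# nchar() can be used as a filter, e.g. to keep only non-empty strings
non_empty <- fruits[nchar(fruits) > 0]
non_empty
# [1] "apple"      "kiwi fruit"
```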

toupper() function in R

While programming, we often need to change the case of a string. The toupper() function converts the letters in a given string to uppercase.

x <- "r programming"
toupper(x)
[1] "R PROGRAMMING"

tolower() function in R

The tolower() function converts the letters in a given string to lowercase.

y <- 'R Programming'
tolower(y)
[1] "r programming"
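A common use of case conversion is comparing strings without regard to case. The sketch below assumes a hypothetical vector answers of user input and normalizes it with tolower() before comparing.

```r
# Hypothetical user input with inconsistent capitalization
answers <- c("Yes", "YES", "no", "yEs")

# Convert everything to lowercase, then compare against one spelling
is_yes <- tolower(answers) == "yes"
is_yes
# [1]  TRUE  TRUE FALSE  TRUE

# Count how many answers matched
n_yes <- sum(is_yes)
n_yes
# [1] 3
```

The same comparison could be written with toupper() and "YES"; what matters is that both sides use the same case.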

casefold() function in R

By default, the casefold() function converts all characters to lowercase. Setting the argument upper=TRUE converts all characters to uppercase instead.

casefold("R is The bEst ProGramminG LanGuagE")
[1] "r is the best programming language"
casefold("R is The bEst ProGramminG LanGuagE", upper=TRUE)
[1] "R IS THE BEST PROGRAMMING LANGUAGE"
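Since casefold() is a wrapper around tolower() and toupper(), its results should match those functions exactly. A quick check, using a hypothetical string s:

```r
# Hypothetical mixed-case string
s <- "R is The bEst"

# casefold() with the default upper = FALSE behaves like tolower()
same_lower <- identical(casefold(s), tolower(s))
same_lower
# [1] TRUE

# casefold() with upper = TRUE behaves like toupper()
same_upper <- identical(casefold(s, upper = TRUE), toupper(s))
same_upper
# [1] TRUE
```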

chartr() function in R

The chartr(old,new,x) function translates each character in old to the corresponding character in new throughout the character vector x.

Suppose we need to translate the letter "r" to "R" in the sentence "r Language".

x <- "r Language"
chartr("r","R",x)
[1] "R Language"

The chartr() function can also translate several characters at once.
Suppose x is "R Programming Language" and we need to translate every character in the range m to p (i.e., m, n, o, p) to the corresponding character in the range M to P (i.e., M, N, O, P), and g to G.

x <- "R Programming Language"
chartr("m-pg","M-PG", x)
[1] "R PrOGraMMiNG LaNGuaGe"
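Because chartr() maps characters one-to-one by position, it can also swap characters in a single pass. As an illustrative sketch (the DNA string below is a made-up example, not from the tutorial), each base is translated to its complement:

```r
# Hypothetical DNA sequence used only for illustration
dna <- "GATTACA"

# Position-wise mapping: A -> T, C -> G, G -> C, T -> A
complement <- chartr("ACGT", "TGCA", dna)
complement
# [1] "CTAATGT"
```

Each character is translated once, so the A's that become T's are not translated back to A.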

substr() function in R

The substr() function is used to extract or replace substrings in a character vector.

x <-  "R Programming"
substr(x,3,9) # extract 
[1] "Program"

In the above example, R extracts the substring of x from the 3rd character through the 9th character.

# Replace 3rd to 5th character by abc
substr(x,3,5)<-"abc" 
x
[1] "R abcgramming"

In the above example, R replaces the 3rd through 5th characters of x with the string abc.
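The substr() function is vectorized, so the same positions can be extracted from every element of a character vector. The sketch below assumes a hypothetical vector dates of fixed-width date strings.

```r
# Hypothetical fixed-width date strings in YYYY-MM-DD form
dates <- c("2024-01-15", "2023-12-31")

# Characters 1 to 4 of each element hold the year
years <- substr(dates, 1, 4)
years
# [1] "2024" "2023"

# Characters 6 to 7 of each element hold the month
months <- substr(dates, 6, 7)
months
# [1] "01" "12"
```

The results are still character strings; as.integer() can convert them if numbers are needed.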

strsplit() function in R

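The strsplit(x, " ") function splits each element of the character vector x at the spaces and returns a list containing one character vector of pieces for each element of x.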

x <-  c("R Programming","Python Programming")
strsplit(x," ")
[[1]]
[1] "R"           "Programming"

[[2]]
[1] "Python"      "Programming"
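Because strsplit() returns a list, a further step is usually needed to get at the pieces. The following sketch shows two common patterns; the variable names are assumptions made for illustration.

```r
x <- c("R Programming", "Python Programming")
parts <- strsplit(x, " ")

# Take the first piece of every list element
first_words <- sapply(parts, `[`, 1)
first_words
# [1] "R"      "Python"

# For a single string, [[1]] extracts the character vector directly;
# an empty split string breaks the input into individual characters
chars <- strsplit("abc", "")[[1]]
chars
# [1] "a" "b" "c"
```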

strrep() function in R

The strrep(x,times) function repeats each character string in a character vector a given number of times.

strrep("ABC",4)
[1] "ABCABCABCABC"

The above command creates a character string in which the string "ABC" is repeated four times.

strrep(c("X","Y","Z"),1:4)
[1] "X"    "YY"   "ZZZ"  "XXXX"

The above command creates a vector containing the elements "X", "YY", "ZZZ" and "XXXX". Since x has fewer elements than times, R applies the recycling rule and reuses "X" for the fourth element.

strrep("  ",1:3)
[1] "  "     "    "   "      "

The above command creates a vector of strings made up of spaces. The first element contains two spaces, the second element contains four spaces and the third element contains six spaces.
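One practical use of strrep() is drawing simple text bars, where each count becomes a run of symbols. This is only a sketch; the vectors labels and counts are hypothetical.

```r
# Hypothetical group labels and counts
labels <- c("A", "B", "C")
counts <- c(3, 1, 2)

# Repeat "*" once per unit in each count
bars <- strrep("*", counts)
bars
# [1] "***" "*"   "**"

# Combine each label with its bar
labelled <- paste(labels, bars)
labelled
# [1] "A ***" "B *"   "C **"
```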

paste() function in R

The basic syntax of the paste() function is

paste(..., sep=" ",collapse=NULL)

The paste() function is one of the most important functions for creating and building strings.

The paste() function takes one or more R objects and converts them to character vectors. It then concatenates them element by element, separated by the string given in sep, to create one or more character strings.

D <- c("R", "Python")
paste("Best programming language for data science is",D)
[1] "Best programming language for data science is R"     
[2] "Best programming language for data science is Python"
paste("Treatment",1:3,sep="-")
[1] "Treatment-1" "Treatment-2" "Treatment-3"

Note that if the objects passed to the paste() function have different lengths, R applies the recycling rule.

paste("Block",1:4,sep=" ")
[1] "Block 1" "Block 2" "Block 3" "Block 4"
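The collapse argument shown in the syntax above is not used in these examples. As a brief sketch, with a hypothetical vector langs, collapse joins all the resulting elements into a single string, and the related base function paste0() behaves like paste() with sep="".

```r
# Hypothetical vector used only for illustration
langs <- c("R", "Python", "Julia")

# collapse merges the elements into one string
one_string <- paste(langs, collapse = ", ")
one_string
# [1] "R, Python, Julia"

# paste0() concatenates with no separator
ids <- paste0("Block", 1:3)
ids
# [1] "Block1" "Block2" "Block3"
```

Without collapse, paste() returns one string per element; with it, the result always has length one.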

Endnote

In this tutorial, you learned about some important built-in character functions available in R, with examples.

Hope you enjoyed learning this tutorial on built-in character functions in R.