Binomial distribution probabilities using R

In this tutorial, you will learn how to use the dbinom(), pbinom(), qbinom() and rbinom() functions in R to compute individual probabilities, cumulative probabilities and quantiles of the binomial distribution, and to generate random samples from it.

Before we discuss the R functions for the binomial distribution, let us review what the binomial distribution is.

Binomial Distribution

The binomial distribution is typically used when a random experiment consists of a fixed number of trials, each with only two possible outcomes (such as success or failure, head or tail, profit or loss), and the probability of success is constant from one trial to the next. All the trials must be independent of each other.

Let $X\sim B(n,p)$. Then the probability mass function of the binomial random variable $X$ is

$$ \begin{aligned} P(X=x) & = \binom{n}{x} p^x q^{n-x},\\ & \quad x = 0,1,2, \cdots, n; \\ & \quad 0 \leq p \leq 1, q = 1-p \end{aligned} $$

where there are two parameters, namely, $n$ : number of trials (size) and $p$ : probability of success (prob).
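The formula above translates directly into R. The following is a minimal sketch: the helper name binom_pmf is made up for illustration and is not part of base R, and the values n = 10 and p = 0.45 are only illustrative. It uses choose() for the binomial coefficient and compares the result with the built-in dbinom() function.

```r
# Hypothetical helper (not part of base R): binomial PMF written from the formula
binom_pmf <- function(x, n, p) {
  q <- 1 - p
  choose(n, x) * p^x * q^(n - x)
}

n <- 10      # illustrative number of trials
p <- 0.45    # illustrative probability of success
x <- 0:n

# Compare the hand-written PMF with the built-in dbinom()
manual  <- binom_pmf(x, n, p)
builtin <- dbinom(x, size = n, prob = p)
all.equal(manual, builtin)   # agreement up to floating-point tolerance

# The probabilities over x = 0, 1, ..., n add up to 1
sum(manual)
```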

Read more about the theory and results of binomial distribution here.

Binomial probabilities using dbinom() function in R

For a discrete probability distribution, the density is the probability of getting exactly the value $x$, i.e., $P(X=x)$.

The syntax to compute the probability at $x$ for the binomial distribution in R is

dbinom(x,size,prob)

where

  • x : the value(s) of the variable (number of successes),
  • size : the number of trials, and
  • prob : the probability of success on each trial.

The dbinom() function returns the probability $P(X=x)$ for the given value(s) of x (no. of successes), size (no. of trials) and prob (probability of success).
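As a quick sketch of the syntax, dbinom() accepts a single value or a vector for x, and it has an optional log argument that returns log-probabilities. The values size = 10 and prob = 0.45 and the variable names below are chosen only for illustration.

```r
size <- 10     # illustrative number of trials
prob <- 0.45   # illustrative probability of success

# A single value: P(X = 4)
p4 <- dbinom(4, size = size, prob = prob)
p4

# A vector of values: P(X = 0), P(X = 1), P(X = 2)
p012 <- dbinom(0:2, size = size, prob = prob)
p012

# Log-probability, handy when the probabilities are very small
logp4 <- dbinom(4, size = size, prob = prob, log = TRUE)
exp(logp4)   # back-transforming gives P(X = 4) again
```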

Numerical Problem for Binomial Distribution

To understand the four functions dbinom(), pbinom(), qbinom() and rbinom(), let us take the following numerical problem.

Binomial Distribution Example

In a university, 45% of the students are female. A random sample of ten students is selected.

(a) Find the probability that exactly four female students are selected.
(b) Plot the graph of binomial probability distribution.
(c) What is the probability that 2 or fewer female students are selected?
(d) What is the probability that more than 8 female students are selected?
(e) What is the probability that 4 to 6 (inclusive) female students are selected?
(f) Plot the graph of cumulative binomial probabilities.
(g) What is the smallest value of $c$ such that $P(X\leq c) \geq 0.70$?
(h) Simulate 100 binomially distributed random values with $n=10$ and $p=0.45$.

Example 1: How to use dbinom() function in R?

To find the probability that exactly four female students are selected, we use the dbinom() function.

Let $X$ denote the number of female students among the $10$ randomly selected students, where the probability that a selected student is female (success) is $0.45$. Then $X\sim B(10, 0.45)$.

First let us define the given terms as

# assign no. of trials or size
size <- 10
# assign probability of success or prob
prob <- 0.45
# possible values of the random variable X
x <- 0:size

The probability mass function of $X$ is

$$ \begin{aligned} P(X=x) &= \binom{10}{x} (0.45)^x (1-0.45)^{10-x},\\ &\quad x=0,1,\cdots, 10 \end{aligned} $$

For part (a), we need to find the probability $P(X = 4)$.

First I will show how to calculate this probability by hand, then how to compute the same probability using the dbinom() function in R.

(a) The probability that the sample contains exactly four female students is

$$ \begin{aligned} P(X= 4) & =\binom{10}{4} (0.45)^{4} (1-0.45)^{10-4}\\ & = 0.2383666\\ \end{aligned} $$

The above probability can be calculated in R using dbinom(4,10,0.45).

# Compute binomial probability
result1 <- dbinom(4,size,prob)
result1
[1] 0.2383666

Example 2: Visualize Binomial probability distribution

Using the dbinom() function, we can compute all the binomial probabilities and put them in a table.

## Compute the binomial probabilities 
px<-dbinom(x,size,prob)
# make a table 
b_table<-cbind(x,px)
# specify the column names
colnames(b_table)<-c("x", "P(X=x)")
b_table
       x       P(X=x)
 [1,]  0 0.0025329516
 [2,]  1 0.0207241496
 [3,]  2 0.0763025509
 [4,]  3 0.1664782929
 [5,]  4 0.2383666466
 [6,]  5 0.2340327076
 [7,]  6 0.1595677552
 [8,]  7 0.0746031063
 [9,]  8 0.0228895894
[10,]  9 0.0041617435
[11,] 10 0.0003405063

Using the kable() function from the knitr package, we can render the table in formats such as LaTeX, HTML, Markdown and reStructuredText.

# to make table
library(knitr)
kable(b_table,align="c")
| x  |  P(X=x)   |
|:--:|:---------:|
| 0  | 0.0025330 |
| 1  | 0.0207241 |
| 2  | 0.0763026 |
| 3  | 0.1664783 |
| 4  | 0.2383666 |
| 5  | 0.2340327 |
| 6  | 0.1595678 |
| 7  | 0.0746031 |
| 8  | 0.0228896 |
| 9  | 0.0041617 |
| 10 | 0.0003405 |

(b) Visualizing Binomial Distribution with dbinom() function and plot() function in R:

The probability mass function of binomial distribution with given size and given prob can be visualized using dbinom() function in plot() function as follows:

## Plot the binomial probability dist
plot(x,px,type="h",xlim=c(0,11),ylim=c(0,max(px)),
     lwd=10, col="blue",ylab="P(X=x)")
title("PMF of Binomial (size=10,prob=0.45)")
Figure: PMF of Binomial Distribution

Binomial cumulative probability using pbinom() function in R

The syntax to compute the cumulative distribution function (CDF) of the binomial distribution in R is

pbinom(q,size,prob)

where

  • q : the value(s) of the variable,
  • size : the number of trials, and
  • prob : the probability of success on each trial.

The pbinom() function returns the cumulative probability $P(X\leq q)$ for the given value(s) of q (value of the variable x), size (no. of trials) and prob (probability of success).
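Since the cumulative probability is a running sum of the individual probabilities, the link between pbinom() and dbinom() can be sketched as follows. The variable names cdf_from_pmf and cdf_builtin are made up for this illustration, and size = 10, prob = 0.45 are illustrative values.

```r
size <- 10     # illustrative number of trials
prob <- 0.45   # illustrative probability of success
x <- 0:size

# Running sum of the individual probabilities P(X = 0), ..., P(X = x)
cdf_from_pmf <- cumsum(dbinom(x, size, prob))

# Built-in cumulative probabilities P(X <= x)
cdf_builtin <- pbinom(x, size, prob)

# The two agree up to floating-point tolerance
all.equal(cdf_from_pmf, cdf_builtin)
```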

Example 3: How to use pbinom() function in R?

In the above example, for part (c), we need to find the probability $P(X\leq 2)$.

First I will show how to calculate this probability by hand, then how to compute the same probability using the pbinom() and dbinom() functions in R.

(c) The probability that 2 or fewer female students are selected equals

$$ \begin{aligned} P(X \leq 2) &= P(X=0)+P(X=1)+P(X=2)\\ &= \binom{10}{0} (0.45)^{0} (1-0.45)^{10-0} +\binom{10}{1} (0.45)^{1} (1-0.45)^{10-1} \\ &\quad +\binom{10}{2} (0.45)^{2} (1-0.45)^{10-2}\\ &= 0.0025330+0.0207241+0.0763026\\ &= 0.0995597 \end{aligned} $$

## Compute cumulative binomial probability
result2 <- pbinom(2,size,prob)
result2
[1] 0.09955965

The above probability can also be calculated using the dbinom() function and the sum() function as follows:

sum(dbinom(0:2,size,prob))
[1] 0.09955965

Example 4: How to use pbinom() function in R?

In the above example, for part (d), we need to find the probability $P(X > 8)$.

Numerically the probability that the selected sample contains more than 8 females can be calculated as

$$ \begin{aligned} P(X > 8) & =P(X\geq 9)\\ &=\sum_{x=9}^{10} P(X=x)\\ & = P(X=9)+P(X=10)\\ &= 0.0041617+0.0003405\\ & = 0.0045022 \end{aligned} $$

To calculate the probability that a random variable $X$ is greater than a given number, you can use the argument lower.tail=FALSE in the pbinom() function.

The above probability can therefore be calculated as

$P(X > 8) =$ pbinom(8,size,prob,lower.tail=FALSE)

or by using complementary event as

$P(X > 8) = 1- P(X\leq 8)$= 1- pbinom(8,size,prob)

# compute cumulative binomial probabilities
# with lower.tail False
pbinom(8,size,prob,lower.tail=FALSE)
[1] 0.00450225
1-pbinom(8,size,prob)
[1] 0.00450225
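A common slip with lower.tail=FALSE is the choice of the first argument: the upper tail excludes the value q itself, so $P(X\geq 9)$ needs q = 8, not q = 9. The sketch below illustrates the difference; the names p_gt_8 and p_gt_9 are made up for this example.

```r
size <- 10
prob <- 0.45

# P(X > 8) = P(X >= 9): pass 8, because the upper tail excludes q itself
p_gt_8 <- pbinom(8, size, prob, lower.tail = FALSE)

# P(X > 9) = P(X = 10): a different, smaller probability
p_gt_9 <- pbinom(9, size, prob, lower.tail = FALSE)

# Cross-check both against sums of individual probabilities
all.equal(p_gt_8, sum(dbinom(9:10, size, prob)))
all.equal(p_gt_9, dbinom(10, size, prob))
```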

Example 5: How to use pbinom() function in R?

One can also use pbinom() function to calculate the probability that the random variable $X$ is between two values.

(e) The probability that the sample contains 4 to 6 (inclusive) female students is

$$ \begin{aligned} P(4 \leq X \leq 6) &= P(X=4)+P(X=5)+P(X=6)\\ &= \binom{10}{4} (0.45)^{4} (1-0.45)^{10-4} +\binom{10}{5} (0.45)^{5} (1-0.45)^{10-5} \\ &\quad +\binom{10}{6} (0.45)^{6} (1-0.45)^{10-6}\\ &= 0.2383666+0.2340327+0.1595678\\ &= 0.6319671 \end{aligned} $$

The same probability can also be written as

$$ \begin{aligned} P(4 \leq X \leq 6) &= P(X\leq 6) -P(X\leq 3)\\ &= 0.8980051 - 0.2660379 \end{aligned} $$

The above probability can be calculated using pbinom() function as follows:

result3 <- pbinom(6,size,prob)-pbinom(3,size,prob)
result3
[1] 0.6319671

The above probability can also be calculated using dbinom() function along with sum() function.

result4 <- sum(dbinom(4:6,size,prob))
result4
[1] 0.6319671

The call dbinom(4:6,size,prob) computes the binomial probabilities for $x=4$, $x=5$ and $x=6$. The sum() function then adds these probabilities, and the result is stored in result4.
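The interval calculation can be wrapped in a small function. This is only a sketch: binom_between is a hypothetical helper, not part of base R. The point to note is that the lower cumulative probability is taken at a - 1, so that the value a itself stays inside the interval.

```r
# Hypothetical helper (not part of base R): P(a <= X <= b) for X ~ B(size, prob)
binom_between <- function(a, b, size, prob) {
  # Subtract P(X <= a - 1) so that the value a itself stays included
  pbinom(b, size, prob) - pbinom(a - 1, size, prob)
}

# P(4 <= X <= 6) for X ~ B(10, 0.45)
p_4_to_6 <- binom_between(4, 6, size = 10, prob = 0.45)
p_4_to_6

# Same probability from the individual terms
sum(dbinom(4:6, size = 10, prob = 0.45))
```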

Example 6: Visualize the cumulative binomial probability distribution

# the values of x
x <- 0:size
## Compute the binomial probabilities
px <- dbinom(x,size,prob)
## Compute the cumulative binomial probabilities
Fx <- pbinom(x,size,prob)
## make a table 
b_table2 <- cbind(x,px,Fx)
## assign column names
colnames(b_table2) <- c("x", "P(X=x)","P(X<=x)")
# display result
b_table2
       x       P(X=x)     P(X<=x)
 [1,]  0 0.0025329516 0.002532952
 [2,]  1 0.0207241496 0.023257101
 [3,]  2 0.0763025509 0.099559652
 [4,]  3 0.1664782929 0.266037945
 [5,]  4 0.2383666466 0.504404592
 [6,]  5 0.2340327076 0.738437299
 [7,]  6 0.1595677552 0.898005054
 [8,]  7 0.0746031063 0.972608161
 [9,]  8 0.0228895894 0.995497750
[10,]  9 0.0041617435 0.999659494
[11,] 10 0.0003405063 1.000000000
kable(b_table2,align="c")
| x  |  P(X=x)   |  P(X<=x)  |
|:--:|:---------:|:---------:|
| 0  | 0.0025330 | 0.0025330 |
| 1  | 0.0207241 | 0.0232571 |
| 2  | 0.0763026 | 0.0995597 |
| 3  | 0.1664783 | 0.2660379 |
| 4  | 0.2383666 | 0.5044046 |
| 5  | 0.2340327 | 0.7384373 |
| 6  | 0.1595678 | 0.8980051 |
| 7  | 0.0746031 | 0.9726082 |
| 8  | 0.0228896 | 0.9954978 |
| 9  | 0.0041617 | 0.9996595 |
| 10 | 0.0003405 | 1.0000000 |

The cumulative probability distribution of binomial distribution with given size and given prob can be visualized using plot() function with argument type="s" (step function) as follows:

# Plot the cumulative binomial dist
plot(x,Fx,type="s",lwd=2,col="blue",
     ylab=expression(P(X<=x)),
main="Distribution Function of B(n=10,p=0.45)")
Figure: CDF of Binomial Distribution

Binomial Distribution Quantiles using qbinom() in R

The syntax to compute the quantiles of binomial distribution using R is

qbinom(p,size,prob)

where

  • p : the probability value(s), each between 0 and 1,
  • size : the number of trials, and
  • prob : the probability of success on each trial.

The function qbinom(p,size,prob) gives the $p^{th}$ quantile (that is, the $100p^{th}$ percentile) of the binomial distribution for the given values of p, size and prob.

The $p^{th}$ quantile is the smallest value $x$ of the binomial random variable $X$ such that $P(X\leq x) \geq p$.

In this sense qbinom() is the inverse of the pbinom() function, that is, the inverse cumulative distribution function of the binomial distribution.

Example 7: How to use qbinom() function in R?

In part (g), we need to find the smallest value of $c$ such that $P(X\leq c) \geq 0.70$. That is, we need to find the $70^{th}$ percentile of the given binomial distribution.

size <- 10
prob <- 0.45
# compute the quantile for binomial dist
qbinom(0.70,size,prob)
[1] 5

From the above table of binomial probabilities and cumulative probabilities, $P(X\leq 4) = 0.5044 < 0.70$ while $P(X\leq 5) = 0.7384 \geq 0.70$, so the $70^{th}$ percentile is 5.
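The definition of the quantile can also be checked directly in code. The sketch below searches for the smallest x whose cumulative probability reaches p and compares it with qbinom(); the names q_manual and q_builtin are made up for this illustration.

```r
size <- 10
prob <- 0.45
p <- 0.70
x <- 0:size

# Smallest x whose cumulative probability P(X <= x) reaches p
q_manual <- min(x[pbinom(x, size, prob) >= p])

# Built-in quantile function
q_builtin <- qbinom(p, size, prob)

c(q_manual, q_builtin)

# Because X is discrete, the round trip does not return p exactly:
# it returns the first cumulative probability that is at least p
pbinom(q_builtin, size, prob)
```

Because the distribution is discrete, pbinom(qbinom(p, ...), ...) is in general greater than or equal to p rather than equal to it.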

Visualize the quantiles of Binomial Distribution

The quantiles of Binomial distribution with given p, size and prob can be visualized using plot() function as follows:

p <- seq(0,1,by=0.02)
qx <- qbinom(p,size=size,prob=prob)
# Plot the quantiles of Binomial dist
plot(p,qx,type="s",lwd=2,col="darkred",
     ylab="quantiles",
main="Quantiles of B(size=10,prob=0.45)")
Figure: Quantiles of Binomial Distribution

Simulating Binomial random variable using rbinom() function in R

The general R function to generate random numbers from the binomial distribution is rbinom(n,size,prob),

where

  • n is the number of random values to generate (the sample size),
  • size is the number of trials, and
  • prob is the probability of success on each trial.

The function rbinom(n,size,prob) generates n random numbers from the binomial distribution with size trials and probability of success prob.
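One way to sanity-check simulated values is to compare the sample mean and variance with the theoretical mean $np$ and variance $np(1-p)$ of the binomial distribution. The sketch below assumes a large sample of 10000 values and an arbitrary seed, both chosen only for this illustration; the name x_big is also made up.

```r
set.seed(123)   # arbitrary seed, used only to make this illustration repeatable
size <- 10
prob <- 0.45

# Draw a large sample so that the sample moments settle near the theory
x_big <- rbinom(10000, size, prob)

# Theoretical mean n*p versus the sample mean
c(theory = size * prob, sample = mean(x_big))

# Theoretical variance n*p*(1-p) versus the sample variance
c(theory = size * prob * (1 - prob), sample = var(x_big))
```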

Example 8: How to use rbinom() function in R?

In part (h), we need to generate 100 random numbers from binomial distribution with number of trials (size) =10 and probability of success (prob) =0.45.

We can use rbinom() function to generate random numbers from binomial distribution.

## initialize sample size to generate
n <- 100
# Simulate 100 values From Binomial dist
x_sim <- rbinom(n,size,prob)
# print values at console
x_sim  
  [1] 4 6 4 6 7 2 5 6 5 4 7 4 5 5 3 7 3 2 4 7 6 5 5 8 5 5 5 5 4 3 7 7 5 6 2 4 6
 [38] 3 4 3 3 4 4 4 3 3 3 4 3 6 2 4 6 3 5 3 3 6 6 4 5 2 4 4 6 4 6 6 6 4 6 5 5 0
 [75] 4 3 4 5 4 3 3 5 4 6 3 4 8 6 6 3 3 5 4 5 4 3 6 2 4 5

The frequency table for Binomial simulated data x_sim can be obtained using table() command.

## Print the frequency table
table(x_sim)
x_sim
 0  2  3  4  5  6  7  8 
 1  6 20 26 20 19  6  2 
plot(table(x_sim),xlab="x",ylab="frequency",
     lwd=10,col="magenta",
     main="Simulated data from B(10,0.45) dist")
Figure: Random Sample from Binomial

If you run the same function again, R will generate a different set of random numbers from $B(10,0.45)$.

# Simulate 100 values From Binomial dist
x_sim_2 <- rbinom(n,size,prob)
# print values at console
x_sim_2
  [1] 5 6 2 6 2 4 3 4 4 4 3 5 5 4 6 7 4 3 5 5 4 3 3 4 6 4 4 7 2 4 4 5 2 6 3 1 6
 [38] 4 2 5 4 4 3 6 3 1 5 4 3 6 4 5 2 5 7 3 6 2 5 8 6 2 6 4 4 4 6 8 3 2 5 6 5 6
 [75] 7 5 5 4 3 3 4 3 5 4 4 6 3 5 3 3 5 5 5 6 4 7 7 4 3 3

The frequency table of the simulated data x_sim_2 is as follows:

## Print the frequency table
table(x_sim_2)
x_sim_2
 1  2  3  4  5  6  7  8 
 2  9 19 26 20 16  6  2 
plot(table(x_sim_2),xlab="x",ylab="frequency",
     lwd=10,col="magenta",
     main="Simulated data from B(10,0.45) dist")
Figure: Random Sample from Binomial 2
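The simulated frequencies can also be set side by side with the exact probabilities from dbinom(). The following is a sketch with an arbitrary seed and the made-up name rel_freq; tabulating over factor levels 0 to size makes values that were never drawn show up with frequency 0.

```r
set.seed(2024)   # arbitrary seed, used only to make this illustration repeatable
size <- 10
prob <- 0.45
n <- 100

x_sim <- rbinom(n, size, prob)

# Tabulate over all values 0..size so that unobserved values show as 0
rel_freq <- table(factor(x_sim, levels = 0:size)) / n

# Side-by-side comparison with the exact probabilities
round(cbind(x = 0:size,
            simulated = as.numeric(rel_freq),
            exact = dbinom(0:size, size, prob)), 4)
```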

To reproduce the same set of random numbers in a simulation, one can use the set.seed() function.

# set seed for reproducibility
set.seed(1457)
# Simulate 100 values From Binomial dist
x_sim_3 <- rbinom(n,size,prob)
# print values at console
x_sim_3
  [1] 5 5 5 3 5 3 5 4 6 3 5 3 3 1 4 4 4 5 2 5 2 4 4 5 4 3 5 7 3 4 7 3 3 4 3 5 6
 [38] 3 5 4 6 5 3 3 3 3 5 7 3 4 3 4 7 4 7 3 4 1 7 5 4 4 4 4 4 6 7 6 5 4 6 2 3 5
 [75] 3 6 3 8 4 2 4 1 4 3 5 5 5 6 2 5 5 5 5 3 3 5 3 6 4 5

The frequency table of x_sim_3 is as follows:

# frequency table using table command
table(x_sim_3)
x_sim_3
 1  2  3  4  5  6  7  8 
 3  5 25 24 26  9  7  1 
plot(table(x_sim_3),xlab="x",ylab="frequency",
     lwd=10,col="darkred",
     main="Simulated data from B(10,0.45) dist")
Figure: Random Sample from Binomial 3

set.seed(1457)
# Simulate 100 values From Binomial dist
x_sim_4 <- rbinom(n,size,prob)
# print values at console
x_sim_4
  [1] 5 5 5 3 5 3 5 4 6 3 5 3 3 1 4 4 4 5 2 5 2 4 4 5 4 3 5 7 3 4 7 3 3 4 3 5 6
 [38] 3 5 4 6 5 3 3 3 3 5 7 3 4 3 4 7 4 7 3 4 1 7 5 4 4 4 4 4 6 7 6 5 4 6 2 3 5
 [75] 3 6 3 8 4 2 4 1 4 3 5 5 5 6 2 5 5 5 5 3 3 5 3 6 4 5

The frequency table of x_sim_4 is as follows:

# frequency table using table 
table(x_sim_4)
x_sim_4
 1  2  3  4  5  6  7  8 
 3  5 25 24 26  9  7  1 
plot(table(x_sim_4),xlab="x",ylab="frequency",
     lwd=10,col="darkred",
     main="Simulated data from B(10,0.45) dist")
Figure: Random Sample from Binomial 4

Since we called set.seed(1457) before both simulations, x_sim_3 and x_sim_4 are identical.
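Rather than comparing the printed values by eye, the equality can be checked with identical(). This is a small sketch with the made-up names a and b.

```r
size <- 10
prob <- 0.45

# First run with the seed
set.seed(1457)
a <- rbinom(100, size, prob)

# Second run after resetting the same seed
set.seed(1457)
b <- rbinom(100, size, prob)

# Same seed and same call give exactly the same sample
identical(a, b)
```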

To learn more about other discrete and continuous probability distributions using R, go through the following tutorials:

Poisson distribution in R
Geometric distribution in R
Negative Binomial distribution in R
Hypergeometric distribution in R

Continuous Distributions Using R

Uniform distribution in R
Exponential distribution in R
Normal distribution in R
Log-Normal distribution in R
Beta distribution in R
Gamma distribution in R
Cauchy distribution in R
Laplace distribution in R
Logistic distribution in R
Weibull distribution in R

Endnote

In this tutorial, you learned how to compute the probabilities, cumulative probabilities and quantiles of the binomial distribution in R. You also learned how to simulate binomial random values in R.

To learn more about R code for discrete and continuous probability distributions, please refer to the following tutorials:

Probability Distributions using R

Let me know in the comments below if you have any questions about the binomial distribution in R, or any thoughts on this article.
