# Geometric Distribution Probabilities Using R

In this tutorial, you will learn how to use the dgeom(), pgeom(), qgeom() and rgeom() functions in R to compute individual probabilities, cumulative probabilities and quantiles, and to generate random samples, for the Geometric distribution.

Before discussing the R functions for the Geometric distribution, let us briefly review the distribution itself.

## Geometric Distribution

The Geometric distribution models either the number of failures before the first success or the number of trials (attempts) needed to get the first success in a sequence of mutually independent Bernoulli trials, each with probability of success $p$. This tutorial, like the dgeom(), pgeom(), qgeom() and rgeom() functions in R, uses the first form: $X$ counts the failures before the first success.

Let $X\sim G(p)$. Then the probability distribution of $X$ is

$$ \begin{aligned} P(X=x)= \left\{ \begin{array}{ll} pq^x, & x=0,1,2,\cdots; \\ & 0 < p < 1,\; q=1-p; \\ 0, & \text{otherwise.} \end{array} \right. \end{aligned} $$

where $p$ is the parameter of Geometric distribution.
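
As a brief check, assuming only the probability distribution above, the probabilities sum to one by the geometric series (valid because $0 < q < 1$):

$$ \begin{aligned} \sum_{x=0}^{\infty} P(X=x) &= p\sum_{x=0}^{\infty} q^x\\ &= \frac{p}{1-q}\\ &= 1. \end{aligned} $$

If one instead counts the number of trials $Y=X+1$ needed to get the first success, the same reasoning gives $P(Y=y)=pq^{y-1}$ for $y=1,2,\cdots$.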

## Geometric probabilities using dgeom() function in R

For a discrete probability distribution, the density is the probability of observing exactly the value $x$, i.e., $P(X=x)$.

The syntax to compute the probability at $x$ for Geometric distribution using R is

dgeom(x,prob)

where

• x : the value(s) of the variable and,
• prob : the probability of success in each trial.

The dgeom() function gives the probability for given value(s) x and prob.
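
As a minimal sketch of how dgeom() relates to the formula $pq^x$, the following compares the function with a direct evaluation. The values p = 0.35, x = 0:5 and y = 3, and the names manual and builtin, are assumptions chosen only for this illustration.

```r
# Illustrative values (assumed for this sketch)
p <- 0.35
q <- 1 - p
x <- 0:5

# Direct evaluation of the probability mass function p * q^x
manual <- p * q^x

# The same probabilities from dgeom(), which is vectorized over x
builtin <- dgeom(x, prob = p)

# TRUE if the two agree up to floating-point tolerance
all.equal(manual, builtin)

# Probability that the first success occurs on trial y (y = 1, 2, ...):
# shift by one, because dgeom() counts failures before the first success
y <- 3
dgeom(y - 1, prob = p)
```

The last line shows how to handle the "number of trials" form of the distribution: subtract one from the trial count before calling dgeom().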

## Numerical Problem for Geometric Distribution

To understand the four functions dgeom(), pgeom(), qgeom() and rgeom(), let us take the following numerical problem.

### Geometric Distribution Example

Suppose a production line has a 35% defective rate. Let $X$ denote the number of non-defective products before the first defective product.

(a) Find the probability that there will be exactly 3 non-defective products before the first defective product.
(b) Plot the graph of the Geometric probability distribution.
(c) Find the probability that there will be at most 3 non-defective products before the first defective product.
(d) Find the probability that there will be at least 3 non-defective products before the first defective product.
(e) What is the probability that there will be 3 to 5 (inclusive) non-defective products before the first defective product?
(f) Plot the graph of the cumulative Geometric probabilities.
(g) What is the smallest value of $c$ such that $P(X\leq c) \geq 0.60$?
(h) Simulate 100 Geometric distributed random variables with $prob = 0.35$.

Let $X$ denote the number of non-defective products before the first defective product. Because $X$ counts the outcomes observed before the first defective product, a defective product plays the role of a success and a non-defective product that of a failure. Then $p=P(\text{success})=0.35$ and $X\sim G(0.35)$.

### Example 1: How to use dgeom() function in R?

To find the probability of exactly three non-defective products before the first defective product, we use the dgeom() function.

First let us define the given terms as

# probability of success/defective
prob <- 0.35

The probability mass function of $X$ is

$$ \begin{aligned} P(X=x) &= 0.35(0.65)^x,\\ & \quad x=0,1,2,\cdots \end{aligned} $$

For part (a), we need to find the probability $P(X = 3)$.

First I will show you how to calculate this probability using manual calculation, then I will show you how to compute the same probability using dgeom() function in R.

(a) The probability that there will be 3 non-defective products before the first defective product is

$$ \begin{aligned} P(X = 3) & =0.35(0.65)^3\\ & = 0.0961188 \end{aligned} $$

The above probability can be calculated using dgeom(3,0.35) function in R.

# Compute Geometric probability
result1 <- dgeom(3,prob)
result1
[1] 0.09611875

### Example 2 Visualize Geometric probability distribution

Using dgeom() function we can compute Geometric distribution probabilities and make a table of it.

# x is the possible values of random variable x
x <- 0:12
## Compute the Geometric probabilities
px<-dgeom(x,prob)
# make a table
b_table <- cbind(x,px)
# specify the column names
colnames(b_table) <- c("x", "P(X=x)")
b_table
       x      P(X=x)
[1,]  0 0.350000000
[2,]  1 0.227500000
[3,]  2 0.147875000
[4,]  3 0.096118750
[5,]  4 0.062477188
[6,]  5 0.040610172
[7,]  6 0.026396612
[8,]  7 0.017157798
[9,]  8 0.011152568
[10,]  9 0.007249169
[11,] 10 0.004711960
[12,] 11 0.003062774
[13,] 12 0.001990803

Using the kable() function from the knitr package, we can render the table in LaTeX, HTML, Markdown or reStructuredText.

# to make table
library(knitr)
kable(b_table)
| x | P(X=x) |
|---:|---:|
| 0 | 0.3500000 |
| 1 | 0.2275000 |
| 2 | 0.1478750 |
| 3 | 0.0961188 |
| 4 | 0.0624772 |
| 5 | 0.0406102 |
| 6 | 0.0263966 |
| 7 | 0.0171578 |
| 8 | 0.0111526 |
| 9 | 0.0072492 |
| 10 | 0.0047120 |
| 11 | 0.0030628 |
| 12 | 0.0019908 |

(b) Visualizing Geometric Distribution with dgeom() function and plot() function in R:

The probability mass function of Geometric distribution with given prob can be visualized using dgeom() function in plot() function as follows:

## Plot the Geometric probability dist
plot(x,px,type="h",xlim=c(0,12),ylim=c(0,max(px)),
lwd=10, col="blue",ylab="P(X=x)")
title("PMF of Geometric (prob= 0.35)")
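
The plot stops at x = 12 even though $X$ can take any non-negative integer value. A rough check of how much probability is left out, assuming prob = 0.35 and the same range 0:12 as in the plot (the name tail_mass is introduced only for this sketch), might look like this:

```r
prob <- 0.35
x <- 0:12
px <- dgeom(x, prob)

# Probability mass not shown in the plot: P(X > 12)
tail_mass <- 1 - sum(px)
tail_mass

# The same quantity from the closed form q^13, with q = 1 - prob
(1 - prob)^13
```

Both values are about 0.0037, so truncating the plot at 12 hides less than half a percent of the total probability.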

## Geometric cumulative probability using pgeom() function in R

The syntax to compute the cumulative distribution function (CDF), $P(X\leq q)$, of the Geometric distribution in R is

pgeom(q,prob)

where

• q : the value(s) of the variable,
• prob : the probability of success in each trial.

The pgeom() function returns the cumulative probability $P(X\leq q)$ for the given value(s) of q and prob.
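
For this form of the distribution the CDF has the closed form $P(X\leq x) = 1-q^{x+1}$, because $P(X > x)$ is the probability that the first $x+1$ trials are all failures. A small sketch comparing the closed form with pgeom(), using prob = 0.35 and x = 0:5 as assumed illustrative values:

```r
prob <- 0.35
q <- 1 - prob
x <- 0:5

# Closed form: 1 - q^(x + 1)
closed_form <- 1 - q^(x + 1)

# Cumulative probabilities from pgeom()
from_pgeom <- pgeom(x, prob)

# TRUE if the two agree up to floating-point tolerance
all.equal(closed_form, from_pgeom)
```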

### Example 3: How to use pgeom() function in R?

In the above example, for part (c), we need to find the probability $P(X\leq 3)$.

First I will show you how to calculate this probability using manual calculation, then I will show you how to compute the same probability using pgeom() and dgeom() function in R.

(c) The probability that there will be at most 3 non-defective products before first defective is

$$ \begin{aligned} P(X\leq 3) &= P(X=0)+ P(X=1)+P(X=2)+P(X=3)\\ &= 0.35+ 0.35(0.65)^1\\ & \quad +0.35(0.65)^2+0.35(0.65)^3\\ &= 0.35+0.2275\\ & \quad +0.147875+0.0961188\\ &= 0.8214937 \end{aligned} $$

## Compute cumulative Geometric probability
result2 <- pgeom(3,prob)
result2
[1] 0.8214937

Above probability can also be calculated using dgeom() function and the sum() function as follows:

sum(dgeom(0:3,prob))
[1] 0.8214937

### Example 4: How to use pgeom() function in R?

In the above example, for part (d), we need to find the probability $P(X \geq 3)$.

The probability that there will be at least 3 non-defective products before the first defective product can be calculated by hand as

$$ \begin{aligned} P(X \geq 3) & =1-P(X\leq 2)\\ &=1-\sum_{x=0}^{2} P(X=x)\\ & = 1- \big(P(X=0)+P(X=1)+P(X=2)\big)\\ &= 1- \big(0.35+0.2275\\ &\quad +0.147875\big)\\ & = 0.274625 \end{aligned} $$

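To calculate the probability that a random variable $X$ is strictly greater than a given number $q$, that is $P(X > q)$, use the argument lower.tail=FALSE in the pgeom() function. Because $X$ takes only integer values, $P(X \geq 3) = P(X > 2)$, so the value passed to pgeom() is 2, not 3.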

Above probability can be calculated easily using pgeom() function with argument lower.tail=FALSE as

$P(X \geq 3) =$ pgeom(2,prob,lower.tail=FALSE)

or by using complementary event as

$P(X \geq 3) = 1- P(X\leq 2)$= 1- pgeom(2,prob)

# compute cumulative Geometric probabilities
# with lower.tail False
pgeom(2,prob,lower.tail=FALSE)
[1] 0.274625
1-pgeom(2,prob)
[1] 0.274625
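
A common slip is to pass 3 instead of 2 when using lower.tail=FALSE. The following sketch contrasts the two calls with prob = 0.35; the variable names are chosen here for illustration only.

```r
prob <- 0.35

# P(X >= 3) = P(X > 2): at least 3 non-defective products
p_at_least_3 <- pgeom(2, prob, lower.tail = FALSE)

# P(X > 3) = P(X >= 4): strictly more than 3, a different event
p_more_than_3 <- pgeom(3, prob, lower.tail = FALSE)

c(p_at_least_3, p_more_than_3)
```

The first value equals $0.65^3$ and the second equals $0.65^4$, since $P(X > x) = q^{x+1}$.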

### Example 5: How to use pgeom() function in R?

One can also use pgeom() function to calculate the probability that the random variable $X$ is between two values.

(e) The probability that there will be 3 to 5 (inclusive) non-defective products before the first defective product is

$$ \begin{aligned} P(3 \leq X \leq 5) &= P(X=3)+P(X=4)+P(X=5)\\ &= 0.35(0.65)^3+0.35(0.65)^4 + 0.35(0.65)^5\\ &= 0.0961188+0.0624772+0.0406102\\ &= 0.1992061 \end{aligned} $$

The same probability can also be written as a difference of two cumulative probabilities:

$$ \begin{aligned} P(3 \leq X \leq 5) &= P(X\leq 5) -P(X\leq 2)\\ &= 0.9245811 - 0.725375\\ &=0.1992061 \end{aligned} $$

The above probability can be calculated using pgeom() function as follows:

result3 <- pgeom(5,prob)-pgeom(2,prob)
result3
[1] 0.1992061

The above probability can also be calculated using dgeom() function along with sum() function.

result4 <- sum(dgeom(3:5,prob))
result4
[1] 0.1992061

The command computes the Geometric probabilities for $x=3$, $x=4$ and $x=5$, adds them using the sum() function, and stores the result in result4.
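
The difference-of-CDFs pattern can be wrapped in a small helper. The function p_between() below is a hypothetical name introduced for this sketch, not part of base R, and it assumes integer bounds with a <= b.

```r
# Hypothetical helper (not part of base R): P(a <= X <= b) for integers a <= b
p_between <- function(a, b, prob) {
  # Subtract P(X <= a - 1) so that the value a itself is included
  pgeom(b, prob) - pgeom(a - 1, prob)
}

# Part (e): 3 to 5 (inclusive) non-defective products
p_between(3, 5, prob = 0.35)
```

Subtracting pgeom(a - 1, prob) rather than pgeom(a, prob) is what keeps the lower endpoint in the interval.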

### Example 6: Visualize the cumulative Geometric probability distribution

# the value of x
x <- 0:12
## Compute the Geometric probabilities
px <- dgeom(x,prob)
## Compute the cumulative Geometric probabilities
Fx <- pgeom(x,prob)
## make a table
b_table2 <- cbind(x,px,Fx)
## assign column names
colnames(b_table2) <- c("x", "P(X=x)","P(X<=x)")
# display result
b_table2
       x      P(X=x)   P(X<=x)
[1,]  0 0.350000000 0.3500000
[2,]  1 0.227500000 0.5775000
[3,]  2 0.147875000 0.7253750
[4,]  3 0.096118750 0.8214937
[5,]  4 0.062477188 0.8839709
[6,]  5 0.040610172 0.9245811
[7,]  6 0.026396612 0.9509777
[8,]  7 0.017157798 0.9681355
[9,]  8 0.011152568 0.9792881
[10,]  9 0.007249169 0.9865373
[11,] 10 0.004711960 0.9912492
[12,] 11 0.003062774 0.9943120
[13,] 12 0.001990803 0.9963028
kable(b_table2)
| x | P(X=x) | P(X<=x) |
|---:|---:|---:|
| 0 | 0.3500000 | 0.3500000 |
| 1 | 0.2275000 | 0.5775000 |
| 2 | 0.1478750 | 0.7253750 |
| 3 | 0.0961188 | 0.8214937 |
| 4 | 0.0624772 | 0.8839709 |
| 5 | 0.0406102 | 0.9245811 |
| 6 | 0.0263966 | 0.9509777 |
| 7 | 0.0171578 | 0.9681355 |
| 8 | 0.0111526 | 0.9792881 |
| 9 | 0.0072492 | 0.9865373 |
| 10 | 0.0047120 | 0.9912492 |
| 11 | 0.0030628 | 0.9943120 |
| 12 | 0.0019908 | 0.9963028 |

The cumulative probability distribution of Geometric distribution with given prob can be visualized using plot() function with argument type="s" (step function) as follows:

# Plot the cumulative Geometric dist
plot(x,Fx,type="s",lwd=2,col="blue",
ylab=expression(P(X<=x)),
main="Distribution Function of G(0.35)")

## Geometric Distribution Quantiles using qgeom() in R

The syntax to compute the quantiles of Geometric distribution using R is

qgeom(p,prob)

where

• p : the value(s) of the probabilities,
• prob : the probability of success in each trial.

The function qgeom(p,prob) gives the $100p^{th}$ percentile of the Geometric distribution for the given values of p and prob.

The $p^{th}$ quantile is the smallest value $x$ of the Geometric random variable $X$ such that $P(X\leq x) \geq p$.

In this sense qgeom() is the inverse of the pgeom() function, that is, the inverse cumulative distribution function of the Geometric distribution.

### Example 7: How to use qgeom() function in R?

In part (g), we need to find the smallest value of $c$ such that $P(X\leq c) \geq 0.60$. That is, we need to find the $60^{th}$ percentile of the given Geometric distribution.

prob <- 0.35
# compute the quantile for Geometric dist
qgeom(0.60,prob)
[1] 2

From the above table of Geometric probabilities and cumulative probabilities, $P(X\leq 1)=0.5775 < 0.60$ while $P(X\leq 2)=0.725375 \geq 0.60$, so the $60^{th}$ percentile is 2.
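
The definition of the quantile can be checked directly by searching for the smallest value whose cumulative probability reaches the target. This is only a sketch: the search range 0:20 and the names target, candidates and c_manual are assumptions made for the illustration.

```r
prob <- 0.35
target <- 0.60

# Candidate values of c; 0:20 is an assumed range that is wide enough here
candidates <- 0:20

# Smallest c whose cumulative probability reaches the target
c_manual <- min(candidates[pgeom(candidates, prob) >= target])

# Compare the manual search with qgeom()
c_manual
qgeom(target, prob)
```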

### Visualize the quantiles of Geometric Distribution

The quantiles of the Geometric distribution for given p and prob can be visualized using the plot() function as follows:

p <- seq(0,1,by=0.02)
qx <- qgeom(p,prob=prob)
# Plot the quantiles of Geometric dist
plot(p,qx,type="s",lwd=2,col="darkred",
ylab="quantiles",
main="Quantiles of Geo(0.35)")
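
Note that the grid p above includes the endpoint 1. A short sketch of how qgeom() behaves at the boundaries, using prob = 0.35 and an alternative grid that stops at 0.98 (an assumption made here purely for illustration):

```r
prob <- 0.35

# p = 0 gives the smallest possible value of X
qgeom(0, prob)

# p = 1 gives Inf, since no finite c has P(X <= c) = 1
qgeom(1, prob)

# Restricting the grid to p < 1 keeps every quantile finite
p <- seq(0, 0.98, by = 0.02)
all(is.finite(qgeom(p, prob)))
```

The infinite quantile at p = 1 cannot be drawn at a finite height, so the plotted curve effectively ends just below p = 1.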

## Simulating Geometric random variable using rgeom() function in R

The general R function to generate random numbers from Geometric distribution is

rgeom(n,prob)

where

• n : the sample size,
• prob : the probability of success in each trial.

The function rgeom(n,prob) generates n random numbers from Geometric distribution with the probability of success prob.
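
One way to sanity-check simulated values is to compare them with known properties of the distribution, such as the mean $(1-p)/p$ of the number of failures before the first success. The sketch below assumes an arbitrary seed of 123 and a large sample size of 100000, neither of which comes from the example problem.

```r
set.seed(123)          # arbitrary seed, assumed here for reproducibility
prob <- 0.35
n_large <- 100000      # a large sample, chosen for this sketch

x_large <- rgeom(n_large, prob)

# Sample mean versus theoretical mean (1 - prob) / prob
mean(x_large)
(1 - prob) / prob

# Observed share of zeros versus P(X = 0) = prob
mean(x_large == 0)
dgeom(0, prob)
```

With a sample this large, the simulated and theoretical values should agree closely, though not exactly.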

### Example 8: How to use rgeom() function in R?

In part (h), we need to generate 100 random numbers from Geometric distribution with probability of success $0.35$.

We can use rgeom() function to generate random numbers from Geometric distribution.

## initialize sample size to generate
n <- 100
# Simulate 100 values From Geometric dist
x_sim <- rgeom(n,prob)
# print values at console
x_sim  
  [1] 2 4 0 0 1 8 5 1 1 0 2 7 0 1 2 2 1 5 5 0 0 4 0 1 1 2 1 1 0 2 0 1 4 1 0 0 5
[38] 4 0 2 2 0 6 0 0 1 1 1 2 1 1 1 4 0 5 1 2 1 3 2 2 0 3 5 1 0 0 1 2 2 3 2 2 0
[75] 0 0 0 3 0 0 0 4 1 0 2 0 0 3 0 1 0 0 0 0 0 0 3 4 2 0

The frequency table for Geometric simulated data x_sim can be obtained using table() command.

## Print the frequency table
table(x_sim)
x_sim
0  1  2  3  4  5  6  7  8
37 23 18  6  7  6  1  1  1 
## Plot the simulated data
plot(table(x_sim),xlab="x",ylab="frequency",
lwd=10,col="gray",
main="Simulated data from Geo(0.35) dist")

If you run the same command again, R will generate a different set of random numbers from $Geo(0.35)$.

# Simulate 100 values From Geometric dist
x_sim_2 <- rgeom(n,prob)
# print values at console
x_sim_2
  [1] 0 1 0 1 5 0 5 1 3 0 0 0 0 0 1 5 0 0 1 2 1 5 0 0 0 0 0 1 0 2 1 0 1 0 0 0 0
[38] 4 1 2 3 5 2 0 0 0 0 0 2 0 1 2 0 3 5 4 0 0 0 3 0 6 0 6 2 2 2 4 1 1 2 0 5 0
[75] 4 3 0 0 1 7 3 4 1 0 3 4 1 0 1 1 0 1 2 2 0 1 2 0 3 0

The frequency table of the simulated data from the Geometric distribution is as follows:

# frequency table of simulated data
table(x_sim_2)
x_sim_2
0  1  2  3  4  5  6  7
43 20 13  8  6  7  2  1 
plot(table(x_sim_2),xlab="x",ylab="frequency",
lwd=10,col="pink",
main="Simulated data from Geo(0.35) dist")

To reproduce the same set of random numbers in a simulation, use the set.seed() function before generating them.

# set seed for reproducibility
set.seed(1457)
# Simulate 100 values From Geometric dist
x_sim_3 <- rgeom(n,prob)
# print values at console
x_sim_3
  [1] 1 0 0 0 0 0 0 2 3 5 4 3 1 5 0 0 1 1 2 1 4 3 0 0 0 1 4 0 0 4 2 0 0 0 1 1 6
[38] 0 0 4 1 1 1 0 0 1 2 2 5 0 0 1 1 2 5 2 2 0 2 3 1 1 0 0 0 0 0 0 4 0 1 1 0 1
[75] 1 0 1 4 0 0 0 0 0 0 0 1 0 3 5 2 1 2 1 4 5 1 6 0 0 2

The frequency table of x_sim_3 is as follows:

table(x_sim_3)
x_sim_3
0  1  2  3  4  5  6
42 25 12  5  8  6  2 
plot(table(x_sim_3),xlab="x",ylab="frequency",
lwd=10,col="purple",
main="Simulated data from Geo(0.35) dist")
set.seed(1457)
# Simulate 100 values From Geometric dist
x_sim_4 <- rgeom(n,prob)
# print values at console
x_sim_4
  [1] 1 0 0 0 0 0 0 2 3 5 4 3 1 5 0 0 1 1 2 1 4 3 0 0 0 1 4 0 0 4 2 0 0 0 1 1 6
[38] 0 0 4 1 1 1 0 0 1 2 2 5 0 0 1 1 2 5 2 2 0 2 3 1 1 0 0 0 0 0 0 4 0 1 1 0 1
[75] 1 0 1 4 0 0 0 0 0 0 0 1 0 3 5 2 1 2 1 4 5 1 6 0 0 2

The frequency table of x_sim_4 is as follows:

table(x_sim_4)
x_sim_4
0  1  2  3  4  5  6
42 25 12  5  8  6  2 
plot(table(x_sim_4),xlab="x",ylab="frequency",
lwd=10,col="purple",
main="Simulated data from Geo(0.35) dist")

Since we called set.seed(1457) before both simulations, x_sim_3 and x_sim_4 are identical.
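
Rather than comparing the printed values by eye, the equality can be confirmed with identical(). This is a minimal sketch; the names first_draw, second_draw and third_draw are introduced here for illustration.

```r
set.seed(1457)
first_draw <- rgeom(100, 0.35)

set.seed(1457)
second_draw <- rgeom(100, 0.35)

# TRUE: the same seed gives the same sequence
identical(first_draw, second_draw)

# Without resetting the seed, a further draw will generally differ
third_draw <- rgeom(100, 0.35)
identical(first_draw, third_draw)
```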

To learn more about other discrete and continuous probability distributions using R, go through the following tutorials:

Discrete Distributions Using R

Continuous Distributions Using R

## Endnote

In this tutorial, you learned how to compute the probabilities, cumulative probabilities and quantiles of the Geometric distribution in R. You also learned how to simulate Geometric random variables using R.