# Negative Binomial distribution probabilities using R

In this tutorial, you will learn how to use the dnbinom(), pnbinom(), qnbinom() and rnbinom() functions in R to compute individual probabilities, cumulative probabilities and quantiles, and to generate random samples, for the Negative Binomial distribution.

Before we discuss the R functions for the Negative Binomial distribution, let us briefly review the distribution itself.

## Negative Binomial Distribution

The Negative Binomial distribution describes the number of failures that occur before a specified number of successes is reached in a sequence of independent trials, each with the same probability of success. For example, it can model the number of defective items found before a fixed number of good items has been collected.

Let $X\sim NB(r,p)$. Then the probability distribution of $X$ is

 \begin{aligned} P(X=x)&= \binom{x+r-1}{x} p^{r} q^{x},\\ & \quad x=0,1,2,\ldots; r=1,2,\ldots\\ & \quad 0 < p, q < 1, p+q=1. \end{aligned}

where $X$ is the number of failures before the $r$-th success, and the parameters of the Negative Binomial distribution are $r$, the target number of successes, and $p$, the probability of success in each trial.
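
As a quick sanity check of the formula above, the following minimal sketch codes the probability mass function by hand and compares it with R's built-in dnbinom(). The helper name `nb_pmf` and the values `r = 3`, `p = 0.4` are illustrative assumptions, not part of the tire example used later in this tutorial.

```r
# Hand-coded negative binomial PMF: P(X = x) = choose(x + r - 1, x) * p^r * q^x
# (nb_pmf is a hypothetical helper name used only for this illustration)
nb_pmf <- function(x, r, p) {
  q <- 1 - p
  choose(x + r - 1, x) * p^r * q^x
}

# Illustrative parameter values (assumed, not taken from the tire example)
r <- 3
p <- 0.4
x <- 0:10

# Compare the hand-coded formula with dnbinom(); this should return TRUE
all.equal(nb_pmf(x, r, p), dnbinom(x, size = r, prob = p))
```

The comparison uses all.equal() rather than `==` because the two computations may differ by floating-point rounding.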

Read more about the theory and results of Negative Binomial distribution here.

## Negative Binomial probabilities using dnbinom() function in R

For a discrete probability distribution, the density is the probability of getting exactly the value $x$, i.e., $P(X=x)$.

The syntax to compute the probability at $x$ for Negative Binomial distribution using R is

dnbinom(x,size,prob)

where

• x : the value(s) of the variable (number of failures),
• size : target number of successes,
• prob : probability of success in each trial.

The dnbinom() function gives the probability for given value(s) x , size and prob.

## Numerical Problem for Negative Binomial Distribution

To understand the four functions dnbinom(), pnbinom(), qnbinom() and rnbinom(), let us take the following numerical problem.

### Negative Binomial Distribution Example

A large lot of tires contains 5% defectives. 4 tires are to be chosen for a car.

(a) Find the probability that you find exactly 2 defective tires before 4 good ones.
(b) Plot the graph of the Negative Binomial probability distribution.
(c) Find the probability that you find at most 1 defective tire before 4 good ones.
(d) Find the probability that you find at least 2 defective tires before 4 good ones.
(e) Find the probability that you find 1 to 3 (inclusive) defective tires before 4 good ones.
(f) Plot the graph of the cumulative Negative Binomial probabilities.
(g) What is the smallest value of $c$ such that $P(X\leq c) \geq 0.99$?
(h) Simulate 1000 Negative Binomial distributed random variables with $size = 4$ and $prob = 0.95$.

Let $X$ denote the number of defective tires selected before 4 good tires are found. Since the lot contains 5% defectives, the probability that a selected tire is good (i.e., non-defective) is $p= 1-0.05 = 0.95$. Because the lot is large, the selections can be treated as independent trials.

The random variable $X$ follows a negative binomial distribution with $size=4$ and $prob=0.95$. That is $X\sim NB(4,0.95)$.

### Example 1: How to use dnbinom() function in R?

To find the probability of exactly 2 defective tires before 4 good ones, i.e., $P(X=2)$, we use the dnbinom() function.

Recall that $X$ denotes the number of defective tires found before 4 good ones, so $X\sim NB(4, 0.95)$.

First let us define the given terms as

# no. of successes
size <- 4
# Probability of success
prob <- 0.95

The probability mass function of $X$ is

 \begin{aligned} P(X=x)&= \binom{x+4-1}{x} (0.95)^{4} (0.05)^{x},\\ &\quad \quad x=0,1,2,\ldots \end{aligned}

For part (a), we need to find the probability $P(X = 2)$.

First I will show you how to calculate this probability using manual calculation, then I will show you how to compute the same probability using dnbinom() function in R.

(a) The probability of exactly 2 defective tires before 4 good ones is

 \begin{aligned} P(X=2)&= \binom{2+3}{2} (0.95)^{4} (0.05)^{2}\\ &= \binom{5}{2} (0.95)^{4} (0.05)^{2}\\ &= 10*(0.8145062)*(0.0025)\\ &= 0.0203627 \end{aligned}

The above probability can be calculated using dnbinom(2,4,0.95) in R.

# Compute Negative Binomial probability
result1 <- dnbinom(2,size,prob)
result1
[1] 0.02036266
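
The same probability can be cross-checked from first principles. The event "exactly 2 defective tires before the 4th good one" means that the first 5 tires inspected contain exactly 3 good ones and the 6th tire is good. The sketch below is an illustration under the assumption of independent tires; the variable names `k`, `via_binomial` and `via_nbinom` are hypothetical.

```r
size <- 4      # target number of good tires (successes)
prob <- 0.95   # probability that a tire is good
k <- 2         # number of defective tires (failures)

# Exactly (size - 1) good tires among the first (size + k - 1) tires,
# followed by one more good tire
via_binomial <- dbinom(size - 1, size + k - 1, prob) * prob

# Same probability straight from the negative binomial PMF
via_nbinom <- dnbinom(k, size, prob)

# The two routes should agree up to floating-point rounding
all.equal(via_binomial, via_nbinom)
```

This makes explicit that dnbinom() counts failures before the target number of successes, not the total number of trials.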

### Example 2: Visualize Negative Binomial probability distribution

Using dnbinom() function we can compute Negative Binomial distribution probabilities and make a table of it.

# x is the possible values of random variable x
x <- 0:7
## Compute the Negative Binomial probabilities
px<-dnbinom(x,size,prob)
# make a table
nb_table <- cbind(x,px)
# specify the column names
colnames(nb_table) <- c("x", "P(X=x)")
nb_table
     x       P(X=x)
[1,] 0 8.145062e-01
[2,] 1 1.629013e-01
[3,] 2 2.036266e-02
[4,] 3 2.036266e-03
[5,] 4 1.781732e-04
[6,] 5 1.425386e-05
[7,] 6 1.069039e-06
[8,] 7 7.635996e-08

Using the kable() function from the knitr package, we can create tables in LaTeX, HTML, Markdown and reStructuredText.

# to make table
library(knitr)
kable(nb_table)
|  x|    P(X=x)|
|--:|---------:|
|  0| 0.8145062|
|  1| 0.1629013|
|  2| 0.0203627|
|  3| 0.0020363|
|  4| 0.0001782|
|  5| 0.0000143|
|  6| 0.0000011|
|  7| 0.0000001|

(b) Visualizing Negative Binomial Distribution with dnbinom() function and plot() function in R:

The probability mass function of Negative Binomial distribution with given size and prob can be visualized using dnbinom() function in plot() function as follows:

## Plot the Negative Binomial probability dist
plot(x,px,type="h",xlim=c(0,8),ylim=c(0,max(px)),
lwd=10, col="blue",ylab="P(X=x)")
title("PMF of Negative Binomial (size = 4, prob= 0.95)")

## Negative Binomial cumulative probability using pnbinom() function in R

The syntax to compute the cumulative probability distribution function (CDF) for Negative Binomial distribution using R is

pnbinom(q,size, prob)

where

• q : the value(s) of the variable,
• size : target number of successes,
• prob : probability of success in each trial.

This function is very useful for calculating the cumulative Negative Binomial probabilities for given value(s) of q (value of the variable x), size and prob.
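
Since the cumulative probability is just a running total of the individual probabilities, pnbinom() can be checked against cumsum() applied to dnbinom(). The following is a minimal sketch using the parameter values of the tire example; the names `cdf_by_sum` and `cdf_direct` are hypothetical.

```r
size <- 4
prob <- 0.95
q <- 0:7

# Running sum of the individual probabilities P(X = 0), ..., P(X = 7)
cdf_by_sum <- cumsum(dnbinom(q, size, prob))

# Cumulative probabilities computed directly
cdf_direct <- pnbinom(q, size, prob)

# Both vectors should agree up to floating-point rounding
all.equal(cdf_by_sum, cdf_direct)
```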

### Example 3: How to use pnbinom() function in R?

In the above example, for part (c), we need to find the probability $P(X\leq 1)$.

First I will show you how to calculate this probability using manual calculation, then I will show you how to compute the same probability using pnbinom() and dnbinom() function in R.

(c) The probability of at most 1 defective tire before 4 good ones is

 \begin{aligned} P(X\leq 1)&=\sum_{x=0}^{1}P(X=x)\\ &= P(X=0)+P(X=1)\\ &= \binom{0+3}{0} (0.95)^{4} (0.05)^{0}+\binom{1+3}{1} (0.95)^{4} (0.05)^{1}\\ &= 1*(0.95)^{4}*(0.05)^{0}+4*(0.95)^{4}*(0.05)^{1}\\ &= 0.8145062+0.1629013\\ &= 0.9774075 \end{aligned}

## Compute cumulative Negative Binomial probability
result2 <- pnbinom(1,size,prob)
result2
[1] 0.9774075

The above probability can also be calculated using the dnbinom() function and the sum() function as follows:

sum(dnbinom(0:1,size,prob))
[1] 0.9774075

### Example 4: How to use pnbinom() function in R?

In the above example, for part (d), we need to find the probability $P(X \geq 2)$.

Numerically the probability that at least 2 defective tires before 4 good ones can be calculated as

 \begin{aligned} P(X \geq 2) & =1-P(X\leq 1)\\ &=1-\sum_{x=0}^{1} P(X=x)\\ & = 1- \big(P(X=0)+P(X=1)\big)\\ &= 1- \big(0.8145062+0.1629013\big)\\ & = 0.0225925 \end{aligned}

To calculate the probability that a random variable $X$ is greater than a given number, you can use the option lower.tail=FALSE in the pnbinom() function. With this option, pnbinom(q,size,prob,lower.tail=FALSE) returns $P(X > q)$. Because $X$ takes only integer values, $P(X \geq 2) = P(X > 1)$, so we use q = 1.

The above probability can be calculated easily using the pnbinom() function with argument lower.tail=FALSE as

$P(X \geq 2) =$ pnbinom(1,size,prob,lower.tail=FALSE)

or by using the complementary event as

$P(X \geq 2) = 1- P(X\leq 1) =$ 1-pnbinom(1,size,prob)

# compute cumulative Negative Binomial probabilities
# with lower.tail False
pnbinom(1,size,prob,lower.tail=FALSE)
[1] 0.0225925
1-pnbinom(1,size,prob)
[1] 0.0225925
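
The off-by-one step in "at least" probabilities is easy to get wrong, so it can help to wrap it in a small function. The sketch below is only an illustration; `p_at_least` is a hypothetical helper name, not a function provided by R.

```r
# P(X >= k) for X ~ NB(size, prob); p_at_least is a hypothetical helper name
p_at_least <- function(k, size, prob) {
  # lower.tail = FALSE returns P(X > q), so pass q = k - 1
  pnbinom(k - 1, size, prob, lower.tail = FALSE)
}

p_at_least(2, size = 4, prob = 0.95)  # P(X >= 2), as in part (d)
p_at_least(0, size = 4, prob = 0.95)  # P(X >= 0), which must equal 1
```

The second call is a simple consistency check: every outcome satisfies $X \geq 0$, so the probability is 1.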

### Example 5: How to use pnbinom() function in R?

One can also use pnbinom() function to calculate the probability that the random variable $X$ is between two values.

(e) The probability that the sample contains 1 to 3 (inclusive) defective tires before 4 good ones is

 \begin{aligned} P(1 \leq X \leq 3) &= P(X=1)+P(X=2)+P(X=3)\\ &=\binom{1+3}{1} (0.95)^{4} (0.05)^{1}+\binom{2+3}{2} (0.95)^{4} (0.05)^{2}\\ &\quad +\binom{3+3}{3} (0.95)^{4} (0.05)^{3}\\ &= 0.1629013+0.0203627+0.0020363\\ &= 0.1853002 \end{aligned}

The above probability can also be written as

 \begin{aligned} P(1 \leq X \leq 3) &= P(X\leq 3) -P(X\leq 0)\\ &= 0.9998064 - 0.8145062\\ &= 0.1853002 \end{aligned}

The above probability can be calculated using pnbinom() function as follows:

result3 <- pnbinom(3,size,prob)-pnbinom(0,size,prob)
result3
[1] 0.1853002

The above probability can also be calculated using dnbinom() function along with sum() function.

result4 <- sum(dnbinom(1:3,size,prob))
result4
[1] 0.1853002

This command computes the Negative Binomial probabilities for $x=1$, $x=2$ and $x=3$, adds them using the sum() function, and stores the result in result4.
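
The interval calculation generalizes to any pair of integer bounds: $P(a \leq X \leq b) = P(X \leq b) - P(X \leq a-1)$. The sketch below wraps this in a function for illustration; `p_between` is a hypothetical helper name, and integer bounds with $a \leq b$ are assumed.

```r
# P(a <= X <= b) for X ~ NB(size, prob); p_between is a hypothetical helper name
p_between <- function(a, b, size, prob) {
  # subtract P(X <= a - 1) so that the value a itself stays in the interval
  pnbinom(b, size, prob) - pnbinom(a - 1, size, prob)
}

p_between(1, 3, size = 4, prob = 0.95)     # part (e) via the CDF
sum(dnbinom(1:3, size = 4, prob = 0.95))   # same value by direct summation
```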

### Example 6: Visualize the cumulative Negative Binomial probability distribution

# the value of x
x <- 0:7
## Compute the Negative Binomial probabilities
px <- dnbinom(x,size,prob)
## Compute the cumulative Negative Binomial probabilities
Fx <- pnbinom(x,size,prob)
## make a table
nb_table2 <- cbind(x,px,Fx)
## assign column names
colnames(nb_table2) <- c("x", "P(X=x)","P(X<=x)")
# display result
nb_table2
     x       P(X=x)   P(X<=x)
[1,] 0 8.145062e-01 0.8145062
[2,] 1 1.629013e-01 0.9774075
[3,] 2 2.036266e-02 0.9977702
[4,] 3 2.036266e-03 0.9998064
[5,] 4 1.781732e-04 0.9999846
[6,] 5 1.425386e-05 0.9999988
[7,] 6 1.069039e-06 0.9999999
[8,] 7 7.635996e-08 1.0000000
kable(nb_table2)
|  x|    P(X=x)|   P(X<=x)|
|--:|---------:|---------:|
|  0| 0.8145062| 0.8145062|
|  1| 0.1629013| 0.9774075|
|  2| 0.0203627| 0.9977702|
|  3| 0.0020363| 0.9998064|
|  4| 0.0001782| 0.9999846|
|  5| 0.0000143| 0.9999988|
|  6| 0.0000011| 0.9999999|
|  7| 0.0000001| 1.0000000|

The cumulative probability distribution of Negative Binomial distribution with given x, size and prob can be visualized using plot() function with argument type="s" (step function) as follows:

# Plot the cumulative Negative Binomial dist
plot(x,Fx,type="s",lwd=2,col="blue",
ylab=expression(P(X<=x)),
main="Distribution Function of NB(size= 4,prob = 0.95)")

## Negative Binomial Distribution Quantiles using qnbinom() in R

The syntax to compute the quantiles of Negative Binomial distribution using R is

qnbinom(p,size,prob)

where

• p : the value(s) of the probabilities,
• size : target number of successes,
• prob : probability of success in each trial.

The function qnbinom(p,size,prob) gives the $100p^{th}$ percentile of the Negative Binomial distribution for given values of p, size and prob.

The $p^{th}$ quantile is the smallest value $x$ of the Negative Binomial random variable $X$ such that $P(X\leq x) \geq p$.

In this sense qnbinom() is the inverse of the pnbinom() function, that is, the inverse cumulative probability distribution function for the Negative Binomial distribution.

### Example 7: How to use qnbinom() function in R?

In part (g), we need to find the smallest value of $c$ such that $P(X\leq c) \geq 0.99$. That is, we need to find the $99^{th}$ percentile of the given Negative Binomial distribution.

size <- 4
prob <- 0.95
# compute the quantile for Negative Binomial dist
qnbinom(0.99,size, prob)
[1] 2

From the above table of Negative Binomial probabilities and cumulative probabilities, $P(X\leq 1) = 0.9774075 < 0.99$ while $P(X\leq 2) = 0.9977702 \geq 0.99$, so the $99^{th}$ percentile is 2.
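
The definition of the quantile as "the smallest value whose cumulative probability reaches the target" can be reproduced by a direct search over the cumulative probabilities. This is a minimal sketch for illustration only; the names `target` and `c_manual`, and the search range 0 to 20, are assumptions chosen for this example.

```r
size <- 4
prob <- 0.95
target <- 0.99

# Candidate values of X and their cumulative probabilities
x <- 0:20
Fx <- pnbinom(x, size, prob)

# Smallest x whose cumulative probability reaches the target
c_manual <- x[which(Fx >= target)[1]]

c_manual
qnbinom(target, size, prob)  # should agree with c_manual
```

The search range must be wide enough for the cumulative probability to reach the target; qnbinom() handles this automatically.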

### Visualize the quantiles of Negative Binomial Distribution

The quantiles of Negative Binomial distribution with given p, size and prob can be visualized using plot() function as follows:

p <- seq(0,1,by=0.02)
qx <- qnbinom(p,size=size,prob=prob)
# Plot the quantiles of Negative Binomial dist
plot(p,qx,type="s",lwd=2,col="darkred",
ylab="quantiles",
main="Quantiles of NB(size=4,prob=0.95)")

## Simulating Negative Binomial random variable using rnbinom() function in R

The general R function to generate random numbers from Negative Binomial distribution is

rnbinom(n,size,prob)

where,

• n : the sample size,
• size : target number of successes,
• prob : probability of success in each trial.

The function rnbinom(n,size,prob) generates n random numbers from Negative Binomial distribution with given size and prob.

### Example 8: How to use rnbinom() function in R?

In part (h), we need to generate 1000 random numbers from Negative Binomial distribution with given $size = 4$ and $prob=0.95$.

We can use rnbinom() function to generate random numbers from Negative Binomial distribution.

## initialize sample size to generate
n <- 1000
# Simulate 1000 values From Negative Binomial dist
x_sim <- rnbinom(n,size,prob)

To get the frequency table of simulated negative binomial random variables, we can use table() function in R.

## Print the frequency table
table(x_sim)
x_sim
0   1   2   3   4
799 174  24   2   1 
## Plot the simulated data
plot(table(x_sim),xlab="x",ylab="frequency",
lwd=10,col="red",
main="Simulated data from NB(4,0.95) dist")

If you run the same function again, R will generate another set of random numbers from $NB(4,0.95)$.

# Simulate 1000 values From Negative Binomial dist
x_sim_2 <- rnbinom(n,size,prob)

The frequency table of simulated data from Negative Binomial distribution is as follows:

## Print the frequency table
table(x_sim_2)
x_sim_2
0   1   2   3
816 162  20   2 
## Plot the simulated data
plot(table(x_sim_2),xlab="x",ylab="frequency",
lwd=10,col="red",
main="Simulated data from NB(4,0.95) dist")

To reproduce the same set of random numbers in a simulation, call the set.seed() function before generating them.

# set seed for reproducibility
set.seed(1457)
# Simulate 1000 values From Negative Binomial dist
x_sim_3 <- rnbinom(n,size,prob)

The frequency table of x_sim_3 is as follows:

## Print the frequency table
table(x_sim_3)
x_sim_3
0   1   2   3   4
805 177  15   2   1 
plot(table(x_sim_3),xlab="x",ylab="frequency",
lwd=10,col="magenta",
main="Simulated data from NB(4,0.95) dist")
set.seed(1457)
# Simulate 1000 values From Negative Binomial dist
x_sim_4 <- rnbinom(n,size,prob)

The frequency table of x_sim_4 is as follows:

## Print the frequency table
table(x_sim_4)
x_sim_4
0   1   2   3   4
805 177  15   2   1 
plot(table(x_sim_4),xlab="x",ylab="frequency",
lwd=10,col="magenta",
main="Simulated data from NB(4,0.95) dist")

Since we called set.seed(1457) before both simulations, x_sim_3 and x_sim_4 are identical.
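
One way to judge whether a simulated sample looks reasonable is to compare its relative frequencies with the theoretical probabilities from dnbinom(), and its sample mean with the theoretical mean $size(1-prob)/prob$. The sketch below is an illustration under the same parameter values; the names `values`, `empirical` and `theoretical` are hypothetical, and the exact numbers depend on the seed and R version.

```r
set.seed(1457)   # any fixed seed works; 1457 is the seed used in this tutorial
size <- 4
prob <- 0.95
n <- 1000

x_sim <- rnbinom(n, size, prob)

# Relative frequency of each value 0, ..., max(x_sim) in the sample
# (factor() with explicit levels keeps values that never occurred, as zero)
values <- 0:max(x_sim)
empirical <- as.numeric(table(factor(x_sim, levels = values))) / n
theoretical <- dnbinom(values, size, prob)

# Side-by-side comparison of observed and expected proportions
round(cbind(x = values, empirical, theoretical), 4)

# Sample mean versus the theoretical mean size * (1 - prob) / prob
c(sample_mean = mean(x_sim), theoretical_mean = size * (1 - prob) / prob)
```

With 1000 draws the two columns should be close but not identical; larger samples bring them closer together.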

To learn more about other discrete and continuous probability distributions using R, go through the following tutorials:

Discrete Distributions Using R

Continuous Distributions Using R

## Endnote

In this tutorial, you learned how to compute the probabilities, cumulative probabilities and quantiles of the Negative Binomial distribution in R. You also learned how to simulate random samples from a Negative Binomial distribution.