## Hypergeometric distribution probabilities using R

In this tutorial, you will learn how to use the `dhyper()`, `phyper()`, `qhyper()` and `rhyper()` functions in the R programming language to compute individual probabilities, cumulative probabilities and quantiles, and to generate random samples, for the Hypergeometric distribution.

Before we discuss the R functions for the Hypergeometric distribution, let us see what the Hypergeometric distribution is.

## Hypergeometric Distribution

A hypergeometric experiment is an experiment which satisfies each of the following conditions:

- The population or set to be sampled consists of $m+n$ objects, or elements (a finite population).
- Each object can be characterized as a "success" or a "failure": there are $m$ successes and $n$ failures in the population.
- A sample of $k$ objects is drawn in such a way that each subset of size $k$ is equally likely to be chosen (i.e., sampling without replacement).
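To make the sampling scheme concrete, the experiment can be simulated directly with base R's `sample()`, drawing $k$ objects without replacement from a population coded as 1 (success) and 0 (failure). This is a minimal sketch: `draw_successes()` is a hypothetical helper name, and the values $m=5$, $n=11$, $k=4$ and the seed are illustrative choices. The empirical relative frequencies should be close to, but not exactly equal to, the exact probabilities from `dhyper()`.

```
# Hypothetical helper: draw k objects without replacement from a population
# of m successes (coded 1) and n failures (coded 0), and count the successes
draw_successes <- function(m, n, k) {
  population <- c(rep(1, m), rep(0, n))
  sum(sample(population, size = k, replace = FALSE))
}

set.seed(123)  # arbitrary seed, only to make the illustration reproducible
draws <- replicate(10000, draw_successes(m = 5, n = 11, k = 4))

# Empirical relative frequencies of 0, 1, ..., 4 successes
emp <- as.numeric(table(factor(draws, levels = 0:4))) / length(draws)
# Exact hypergeometric probabilities for comparison
exact <- dhyper(0:4, m = 5, n = 11, k = 4)
round(rbind(empirical = emp, exact = exact), 3)
```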

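Let $X$ denote the number of successes in the sample, and write $X\sim H(m,n,k)$. Then the probability distribution of $X$ is

$$ \begin{aligned} P(X=x) &= \frac{\binom{m}{x}\binom{n}{k-x}}{\binom{m+n}{k}},\\ & \quad x=0,1,2,\cdots,k, \end{aligned} $$

with the convention that $\binom{a}{b}=0$ whenever $b>a$.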
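The formula translates directly into R with the base function `choose()`. The sketch below uses a hypothetical helper `hyper_pmf()` (not part of R) and checks it against the built-in `dhyper()`. Because `choose(a, b)` returns 0 when $b>a$ or $b<0$, values of $x$ outside the support automatically get probability 0. The parameter values are illustrative.

```
# Hypothetical helper: hypergeometric PMF written with choose().
# choose(a, b) is 0 when b > a or b < 0, so x outside the support gives 0.
hyper_pmf <- function(x, m, n, k) {
  choose(m, x) * choose(n, k - x) / choose(m + n, k)
}

x <- 0:4
hyper_pmf(x, m = 5, n = 11, k = 4)
# Compare with R's built-in dhyper()
all.equal(hyper_pmf(x, 5, 11, 4), dhyper(x, 5, 11, 4))
# A case with k > n: at least k - n = 2 successes must be drawn,
# so P(X = 0) and P(X = 1) are 0
hyper_pmf(0:4, m = 5, n = 2, k = 4)
```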


## Hypergeometric probabilities using `dhyper()` function in R

For a discrete probability distribution, the density is the probability of getting exactly the value $x$ (i.e., $P(X=x)$).

The syntax to compute the probability at $x$ for the Hypergeometric distribution using R is

`dhyper(x,m,n,k)`

where

- `x`: the value(s) of the variable,
- `m`: the number of successes in the population,
- `n`: the number of failures in the population,
- `k`: the sample size drawn from the population.

The `dhyper()` function returns the probability $P(X=x)$ for the given value(s) of `x` and the parameters `m`, `n` and `k`.
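Since `dhyper()` is vectorised over `x`, one call can return the whole probability mass function, and summing over the full support $0,1,\dots,k$ gives a quick sanity check. `dhyper()` also has a `log` argument (default `FALSE`) that returns log-probabilities. A minimal sketch with illustrative parameter values:

```
# PMF of H(m = 5, n = 11, k = 4) for every possible value of x in one call
probs <- dhyper(0:4, m = 5, n = 11, k = 4)
probs
# The probabilities over the full support sum to 1
sum(probs)
# Values outside the support have probability 0
dhyper(5, m = 5, n = 11, k = 4)
# log = TRUE returns log-probabilities; exp() recovers the same values
all.equal(exp(dhyper(0:4, 5, 11, 4, log = TRUE)), probs)
```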

## Numerical Problem for Hypergeometric Distribution

To understand the four functions `dhyper()`, `phyper()`, `qhyper()` and `rhyper()`, let us take the following numerical problem.

### Hypergeometric Distribution Example

A company produces and ships 16 personal computers knowing that 5 of them have defective wiring. The company that purchased the computers is going to thoroughly test four of the computers. The purchasing company can detect the defective wiring.

(a) Find the probability that none of the tested computers is defective.

(b) Plot the graph of Hypergeometric probability distribution.

(c) What is the probability that the purchasing company will find at most one defective computer?

(d) What is the probability that the purchasing company will find at least 2 defective computers?

(e) What is the probability that the purchasing company will find 2 to 4 (inclusive) defective computers?

(f) Plot the graph of cumulative Hypergeometric probabilities.

(g) What is the value of $c$, if $P(X\leq c) \geq 0.90$?

(h) Simulate 100 values of the Hypergeometric random variable for the given problem.

### Example 1: How to use `dhyper()` function in R?

To find the probability that no defective computer is found, we need to use the `dhyper()` function.

Let $X$ denote the number of defective PCs in the sample, and treat a defective PC as a success. Then the random variable $X$ has a hypergeometric distribution with population size $m+n = 16$, number of successes in the population $m = 5$ (hence $n=11$) and sample size $k = 4$, i.e., $X\sim H(m = 5, n= 11, k = 4)$.

First let us define the given terms as

```
## number of successes (defective PCs) in the population
m <- 5
## number of failures (non-defective PCs) in the population
n <- 11
## sample size (number of PCs tested)
k <- 4
```

The probability mass function of $X$ is

$$ \begin{aligned} P(X=x) &= \frac{\binom{5}{x}\binom{11}{4-x}}{\binom{16}{4}},\\ & \quad x=0,1,2,\cdots,4 \end{aligned} $$

For part (a), we need to find the probability $P(X = 0)$.

First I will show you how to calculate this probability manually; then I will show you how to compute the same probability using the `dhyper()` function in R.

(a) The probability that no defective computer is found is

$$ \begin{aligned} P(X = 0) & =\frac{\binom{5}{0}\binom{11}{4-0}}{\binom{16}{4}} \\ &= \frac{1\times 330}{1820}\\ & = 0.1813187 \end{aligned} $$

The above probability can be calculated in R using `dhyper(0,5,11,4)`, written here with the variables `m`, `n` and `k` defined earlier:

```
# Compute Hypergeometric probability
result1 <- dhyper(0,m,n,k)
result1
```

`[1] 0.1813187`

### Example 2: Visualize the Hypergeometric probability distribution

Using the `dhyper()` function, we can compute the Hypergeometric probabilities for every possible value of $x$ and arrange them in a table.

```
# assign values 0 to 4 to x
x <- 0:4
## Compute the Hypergeometric probabilities
px<-dhyper(x,m,n,k)
# make a table
H_table <- cbind(x,px)
# specify the column names
colnames(H_table) <- c("x", "P(X=x)")
H_table
```

```
x P(X=x)
[1,] 0 0.181318681
[2,] 1 0.453296703
[3,] 2 0.302197802
[4,] 3 0.060439560
[5,] 4 0.002747253
```

Using the `kable()` function from the knitr package, we can render the table in LaTeX, HTML, Markdown or reStructuredText format.

```
# to make table
library(knitr)
kable(H_table)
```

| x | P(X=x) |
|---|---|
| 0 | 0.1813187 |
| 1 | 0.4532967 |
| 2 | 0.3021978 |
| 3 | 0.0604396 |
| 4 | 0.0027473 |

(b) Visualizing the Hypergeometric distribution with the `dhyper()` and `plot()` functions in R:

The probability mass function of the Hypergeometric distribution with given `m`, `n` and `k` can be visualized by passing the output of `dhyper()` to the `plot()` function as follows:

```
# assign values 0 to 4 to x
x <- 0:4
## Compute the Hypergeometric probabilities
px <- dhyper(x,m,n,k)
## Plot the Hypergeometric probability distribution
plot(x,px,type="h",xlim=c(0,5),ylim=c(0,max(px)),
lwd=10, col="darkred",ylab="P(X=x)")
title("PMF of Hypergeometric (m,n,k)")
```

## Hypergeometric cumulative probability using `phyper()` function in R

The syntax to compute the cumulative distribution function (CDF) of the Hypergeometric distribution using R is

`phyper(q,m,n,k)`

where

- `q`: the value(s) of the variable,
- `m`: the number of successes in the population,
- `n`: the number of failures in the population,
- `k`: the sample size drawn from the population.

This function returns the cumulative Hypergeometric probability $P(X\leq q)$ for the given value(s) of `q` (the value of the variable $x$) and the parameters `m`, `n` and `k`.
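Since the CDF of a discrete distribution is the running total of its PMF, `phyper()` should agree with the cumulative sum of `dhyper()` over the support. A minimal sketch of that check, assuming the example's parameters $m=5$, $n=11$, $k=4$:

```
m <- 5; n <- 11; k <- 4
x <- 0:k
# CDF values P(X <= x) from phyper()
Fx_phyper <- phyper(x, m, n, k)
# The same values obtained by accumulating the PMF with cumsum()
Fx_cumsum <- cumsum(dhyper(x, m, n, k))
all.equal(Fx_phyper, Fx_cumsum)
```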

### Example 3: How to use `phyper()` function in R?

In the above example, for part (c), we need to find the probability $P(X\leq 1)$.

First I will show you how to calculate this probability manually; then I will show you how to compute the same probability using the `phyper()` and `dhyper()` functions in R.

(c) The probability of finding at most 1 defective computer is

$$ \begin{aligned} P(X\leq 1) &= P(X=0)+ P(X=1)\\ &= \frac{\binom{5}{0}\binom{11}{4-0}}{\binom{16}{4}}+\frac{\binom{5}{1}\binom{11}{4-1}}{\binom{16}{4}}\\ &= 0.1813187+0.4532967\\ &= 0.6346154 \end{aligned} $$

```
## Compute cumulative Hypergeometric probability
result2 <- phyper(1,m,n,k)
result2
```

`[1] 0.6346154`

The above probability can also be calculated using the `dhyper()` and `sum()` functions as follows:

`sum(dhyper(0:1,m,n,k))`

`[1] 0.6346154`

### Example 4: How to use `phyper()` function in R?

In the above example, for part (d), we need to find the probability $P(X \geq 2)$.

Numerically, the probability of finding at least 2 defective computers can be calculated as

$$ \begin{aligned} P(X \geq 2) & =1-P(X\leq 1)\\ & = 1- (P(X=0)+P(X=1))\\ &= 1- \big(0.1813187+0.4532967\big)\\ & = 0.3653846 \end{aligned} $$

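To calculate the probability that a random variable $X$ is strictly greater than a given number $q$, i.e., $P(X > q)$, you can use the argument `lower.tail=FALSE` in the `phyper()` function. Since $X$ takes only integer values, $P(X \geq 2) = P(X > 1)$, so we pass `q = 1`.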

The above probability can be calculated easily using the `phyper()` function with the argument `lower.tail=FALSE` as

$P(X \geq 2) =$ `phyper(1,m,n,k,lower.tail=FALSE)`

or by using the complementary event as

$P(X \geq 2) = 1- P(X\leq 1) =$ `1-phyper(1,m,n,k)`

```
# compute cumulative Hypergeometric probabilities
# with lower.tail False
phyper(1,m,n,k,lower.tail=FALSE)
```

`[1] 0.3653846`

`1-phyper(1,m,n,k)`

`[1] 0.3653846`
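Because `lower.tail=FALSE` gives a strict upper tail, a quick way to confirm which event it covers is to compare it with a sum of PMF values, and to see that shifting `q` by one changes the event. A minimal sketch, assuming the example's parameters:

```
m <- 5; n <- 11; k <- 4
# Upper tail P(X > 1), which equals P(X >= 2) for an integer-valued X
upper <- phyper(1, m, n, k, lower.tail = FALSE)
# The same probability from the PMF: P(X = 2) + P(X = 3) + P(X = 4)
from_pmf <- sum(dhyper(2:k, m, n, k))
all.equal(upper, from_pmf)
# Passing q = 2 instead gives P(X > 2) = P(X >= 3), a different event
phyper(2, m, n, k, lower.tail = FALSE)
```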

### Example 5: How to use `phyper()` function in R?

One can also use the `phyper()` function to calculate the probability that the random variable $X$ lies between two values.

(e) The probability that 2 to 4 (inclusive) of the tested computers are defective is

$$ \begin{aligned} P(2 \leq X \leq 4) &= P(X=2)+P(X=3)+P(X=4)\\ &=\frac{\binom{5}{2}\binom{11}{4-2}}{\binom{16}{4}}+\frac{\binom{5}{3}\binom{11}{4-3}}{\binom{16}{4}}\\ &\quad +\frac{\binom{5}{4}\binom{11}{4-4}}{\binom{16}{4}}\\ &= 0.3021978+0.0604396+0.0027473\\ &= 0.3653846 \end{aligned} $$

The above probability can also be written as

$$ \begin{aligned} P(2 \leq X \leq 4) &= P(X\leq 4) -P(X\leq 1)\\ &= 1 - 0.6346154 \quad (\text{since } P(X\leq 4) = 1 \text{ when } k=4)\\ &=0.3653846 \end{aligned} $$

The above probability can be calculated using the `phyper()` function as follows:

```
result3 <- phyper(4,m,n,k)-phyper(1,m,n,k)
result3
```

`[1] 0.3653846`

The above probability can also be calculated using the `dhyper()` function along with the `sum()` function.

```
result4 <- sum(dhyper(2:4,m,n,k))
result4
```

`[1] 0.3653846`

The command computes the Hypergeometric probabilities for $x=2$, $x=3$ and $x=4$, adds them using the `sum()` function, and stores the result in `result4`.
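The same idea works for any interval. The sketch below wraps it in a hypothetical helper `phyper_between()` (not an R function) that uses $P(a \leq X \leq b) = P(X\leq b) - P(X\leq a-1)$, which holds because $X$ takes only integer values; it assumes integers $a \leq b$. For $a = 0$, `phyper(-1, ...)` is 0, so the lower end of the support needs no special case.

```
# Hypothetical helper: P(a <= X <= b) for X ~ H(m, n, k), integers a <= b.
# Because X is integer-valued, P(a <= X <= b) = P(X <= b) - P(X <= a - 1).
phyper_between <- function(a, b, m, n, k) {
  phyper(b, m, n, k) - phyper(a - 1, m, n, k)
}

phyper_between(2, 4, m = 5, n = 11, k = 4)  # part (e)
phyper_between(0, 1, m = 5, n = 11, k = 4)  # part (c), i.e. P(X <= 1)
```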

### Example 6: Visualize the cumulative Hypergeometric probability distribution

```
# assign values 0 to 4 to x
x <- 0:4
## Compute the Hypergeometric probabilities
px <- dhyper(x,m,n,k)
## Compute the cumulative Hypergeometric probabilities
Fx <- phyper(x,m,n,k)
## make a table
H_table2 <- cbind(x,px,Fx)
## assign column names
colnames(H_table2) <- c("x", "P(X=x)","P(X<=x)")
# display result
H_table2
```

```
x P(X=x) P(X<=x)
[1,] 0 0.181318681 0.1813187
[2,] 1 0.453296703 0.6346154
[3,] 2 0.302197802 0.9368132
[4,] 3 0.060439560 0.9972527
[5,] 4 0.002747253 1.0000000
```

`kable(H_table2)`

| x | P(X=x) | P(X<=x) |
|---|---|---|
| 0 | 0.1813187 | 0.1813187 |
| 1 | 0.4532967 | 0.6346154 |
| 2 | 0.3021978 | 0.9368132 |
| 3 | 0.0604396 | 0.9972527 |
| 4 | 0.0027473 | 1.0000000 |

The cumulative probability distribution of the Hypergeometric distribution with given `m`, `n` and `k` can be visualized using the `plot()` function with the argument `type="s"` (step function) as follows:

```
# define values of X
x <- 0:4
# Plot the cumulative Hypergeometric dist
plot(x,Fx,type="s",lwd=2,col="darkred",
ylab=expression(P(X<=x)),
main="Distribution Function of H(m,n,k)")
```

## Hypergeometric Distribution Quantiles using `qhyper()` in R

The syntax to compute the quantiles of the Hypergeometric distribution using R is

`qhyper(p,m,n,k)`

where

- `p`: the value(s) of the probabilities,
- `m`: the number of successes in the population,
- `n`: the number of failures in the population,
- `k`: the sample size drawn from the population.

The function `qhyper(p,m,n,k)` gives the $100p^{th}$ percentile (the $p^{th}$ quantile) of the Hypergeometric distribution for the given values of `p`, `m`, `n` and `k`.

The $p^{th}$ quantile is the smallest value $x$ of the Hypergeometric random variable $X$ such that $P(X\leq x) \geq p$.

It is the inverse of the `phyper()` function, that is, the inverse cumulative distribution function of the Hypergeometric distribution.
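Because $X$ is discrete, this "inverse" is really a search for the first support point whose CDF reaches $p$. The sketch below implements that definition literally with a hypothetical helper `hyper_quantile()` (not an R function) and compares it with `qhyper()`; the support bounds $\max(0, k-n)$ to $\min(k, m)$ and the probabilities in `p` are illustrative assumptions, and values of `p` exactly at a CDF jump could differ because of floating-point rounding.

```
# Hypothetical helper: the p-th quantile as the smallest x in the support
# with P(X <= x) >= p, found by scanning the CDF (a generalised inverse)
hyper_quantile <- function(p, m, n, k) {
  support <- max(0, k - n):min(k, m)
  Fx <- phyper(support, m, n, k)
  sapply(p, function(pp) support[which(Fx >= pp)[1]])
}

p <- c(0.10, 0.50, 0.90, 0.99)
hyper_quantile(p, m = 5, n = 11, k = 4)
# Built-in quantile function for comparison
qhyper(p, m = 5, n = 11, k = 4)
```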

### Example 7: How to use `qhyper()` function in R?

In part (g), we need to find the smallest value of $c$ such that $P(X\leq c) \geq 0.90$. That is, we need to find the $90^{th}$ percentile (the 0.90 quantile) of the given Hypergeometric distribution.

```
# compute the quantile for Hypergeometric dist
qhyper(0.90,m,n,k)
```

`[1] 2`

From the table of Hypergeometric probabilities and cumulative probabilities above, $P(X\leq 1) = 0.6346154 < 0.90$ while $P(X\leq 2) = 0.9368132 \geq 0.90$, so the $90^{th}$ percentile is 2.

### Visualize the quantiles of Hypergeometric Distribution

The quantiles of the Hypergeometric distribution with given `p`, `m`, `n` and `k` can be visualized using the `plot()` function as follows:

```
p <- seq(0,1,by=0.02)
qx <- qhyper(p,m,n,k)
# Plot the quantiles of Hypergeometric dist
plot(p,qx,type="s",lwd=2,col="darkred",
ylab="quantiles",
main="Quantiles of H(m=5,n=11,k=4)")
```

## Simulating Hypergeometric random variable using `rhyper()` function in R

The general R function to generate random numbers from the Hypergeometric distribution is `rhyper(nn,m,n,k)`,

where

- `nn`: the number of observations,
- `m`: the number of successes in the population,
- `n`: the number of failures in the population,
- `k`: the sample size drawn from the population.

The function `rhyper(nn,m,n,k)` generates `nn` random numbers from the Hypergeometric distribution with parameters `m`, `n` and `k`.

### Example 8: How to use `rhyper()` function in R?

In part (h), we need to generate 100 random numbers from Hypergeometric distribution with $m = 5$, $n = 11$ and $k= 4$.

We can use the `rhyper()` function to generate random numbers from the Hypergeometric distribution.

```
## initialize number of observations to generate
nn <- 100
# Simulate 100 values From Hypergeometric dist
x_sim <- rhyper(nn,m,n,k)
# print values at console
x_sim
```

```
[1] 1 2 1 2 3 0 1 2 1 1 3 1 2 1 0 2 1 0 1 3 2 2 2 3 2 2 1 1 1 0 3 2 2 2 0 1 2
[38] 1 1 1 0 1 1 1 0 0 1 1 1 2 0 1 2 0 1 1 0 2 2 1 2 0 1 1 2 1 2 2 2 1 2 1 2 0
[75] 1 1 1 1 1 0 1 2 1 2 0 1 3 2 2 0 0 2 1 2 1 1 2 0 1 1
```

To get the frequency table of the simulated hypergeometric random variables, we can use the `table()` function in R.

```
## Print the frequency table
table(x_sim)
```

```
x_sim
0 1 2 3
18 44 32 6
```

```
## Plot the simulated data
plot(table(x_sim),xlab="x",ylab="frequency",
lwd=10,col="red",
main="Simulated data from H(5,11,4) dist")
```
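To judge whether a simulated frequency table looks reasonable, one option is to compare it with the expected counts $nn \times P(X=x)$. A minimal sketch; the seed and the name `x_check` are arbitrary choices for illustration, so its observed counts will differ from the table above. Wrapping the draws in `factor(..., levels = 0:k)` keeps values that never occur (such as $x=4$) in the table with a count of 0.

```
m <- 5; n <- 11; k <- 4; nn <- 100
set.seed(2024)  # arbitrary seed, only so this sketch is reproducible
x_check <- rhyper(nn, m, n, k)
# Observed counts, keeping every value 0, ..., k even if it never occurs
observed <- table(factor(x_check, levels = 0:k))
# Expected counts under H(m, n, k): nn * P(X = x)
expected <- nn * dhyper(0:k, m, n, k)
cbind(observed = as.vector(observed), expected = round(expected, 2))
```

With these parameters the expected counts are about 18.13, 45.33, 30.22, 6.04 and 0.27, in line with the 18, 44, 32, 6 and 0 observed in the first simulation above.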

If you run the same function again, R will generate a different set of random numbers from $H(m=5, n = 11, k =4)$.

```
# Simulate 100 values From Hypergeometric dist
x_sim_2 <- rhyper(nn,m,n,k)
# print values at console
x_sim_2
```

```
[1] 1 1 1 3 1 2 2 1 1 0 2 1 0 3 2 0 1 3 1 1 2 1 1 1 1 3 0 0 0 2 1 2 2 2 1 2 2
[38] 2 3 1 1 1 0 1 2 1 1 0 1 2 2 1 1 1 0 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 2 2
[75] 2 1 1 1 2 1 2 1 2 1 1 1 1 1 2 2 1 1 3 1 3 1 1 2 0 1
```

The frequency table of the simulated data from the Hypergeometric distribution is as follows:

```
## Print the frequency table
table(x_sim_2)
```

```
x_sim_2
0 1 2 3
10 57 26 7
```

```
## Plot the simulated data
plot(table(x_sim_2),xlab="x",ylab="frequency",
lwd=10,col="red",
main="Simulated data from H(5,11,4) dist")
```

To reproduce the same set of random numbers in a simulation, one can use the `set.seed()` function.

```
# set seed for reproducibility
set.seed(1457)
# Simulate 100 values From Hypergeometric dist
x_sim_3 <- rhyper(nn,m,n,k)
# print values at console
x_sim_3
```

```
[1] 2 1 1 1 1 0 2 1 2 1 2 1 1 0 1 1 1 1 0 2 0 1 1 2 1 1 2 2 1 1 2 0 0 1 0 2 2
[38] 1 1 1 2 1 1 0 1 1 1 2 1 1 1 1 3 1 3 0 1 0 2 2 1 1 1 1 1 2 3 2 2 1 2 0 1 1
[75] 1 2 1 3 1 0 1 0 1 0 2 1 2 2 0 2 1 1 2 1 0 1 1 2 1 1
```

The frequency table of `x_sim_3` is as follows:

```
## Print the frequency table
table(x_sim_3)
```

```
x_sim_3
0 1 2 3
16 54 26 4
```

```
## Plot the simulated data
plot(table(x_sim_3),xlab="x",ylab="frequency",
lwd=10,col="darkred",
main="Simulated data from H(5,11,4) dist")
```

```
# set the same seed again
set.seed(1457)
# Simulate 100 values From Hypergeometric dist
x_sim_4 <- rhyper(nn,m,n,k)
# print values at console
x_sim_4
```

```
[1] 2 1 1 1 1 0 2 1 2 1 2 1 1 0 1 1 1 1 0 2 0 1 1 2 1 1 2 2 1 1 2 0 0 1 0 2 2
[38] 1 1 1 2 1 1 0 1 1 1 2 1 1 1 1 3 1 3 0 1 0 2 2 1 1 1 1 1 2 3 2 2 1 2 0 1 1
[75] 1 2 1 3 1 0 1 0 1 0 2 1 2 2 0 2 1 1 2 1 0 1 1 2 1 1
```

The frequency table of `x_sim_4` is as follows:

```
## Print the frequency table
table(x_sim_4)
```

```
x_sim_4
0 1 2 3
16 54 26 4
```

```
## Plot the simulated data
plot(table(x_sim_4),xlab="x",ylab="frequency",
lwd=10,col="darkred",
main="Simulated data from H(5,11,4) dist")
```

Since we called `set.seed(1457)` before both simulations, `x_sim_3` and `x_sim_4` are identical.
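Rather than comparing the two printed vectors by eye, the equality can be checked programmatically with base R's `identical()`. A minimal sketch, reusing the seed 1457 from above with the example's parameters:

```
m <- 5; n <- 11; k <- 4; nn <- 100
set.seed(1457)
first <- rhyper(nn, m, n, k)
set.seed(1457)  # resetting the same seed replays the same random stream
second <- rhyper(nn, m, n, k)
identical(first, second)  # TRUE
# Without resetting, the next call continues the stream and will
# generally produce different values
third <- rhyper(nn, m, n, k)
identical(first, third)
```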

To learn more about other discrete and continuous probability distributions using R, go through the following tutorials:

**Discrete Distributions Using R**

- Binomial distribution in R
- Poisson distribution in R
- Geometric distribution in R
- Negative Binomial distribution in R

**Continuous Distributions Using R**

- Uniform distribution in R
- Exponential distribution in R
- Normal distribution in R
- Log-Normal distribution in R
- Beta distribution in R
- Gamma distribution in R
- Cauchy distribution in R
- Laplace distribution in R
- Logistic distribution in R
- Weibull distribution in R

## Endnote

In this tutorial, you learned how to compute the probabilities, cumulative probabilities and quantiles of the Hypergeometric distribution in R. You also learned how to simulate random values from a Hypergeometric distribution using R.

To learn more about R code for discrete and continuous probability distributions, please refer to the following tutorials:

Probability Distributions using R

Let me know in the comments below if you have any questions about the Hypergeometric distribution using R, or share your thoughts on this article.