# Chi-square test of goodness of fit with examples

## Chi-square test of goodness of fit

The chi-square test of goodness of fit is a non-parametric test.

One of the principal uses of the $\chi^2$-distribution is to test how well an observed distribution fits a theoretical one. That is, the chi-square test of goodness of fit lets us compare the observed distribution of counts across classes with an expected distribution.

In hypothesis testing it is usually assumed that the random variable follows a particular distribution such as the Binomial, Poisson, or Normal.

To check whether that assumption is reasonable, the chi-square test of goodness of fit is performed, and a decision is made about whether the data follow the assumed distribution.

## Assumptions

• Observations are independent.
• Each expected frequency must be at least 5.

The chi-square goodness of fit test is not applicable if any expected frequency is too small (less than 5).

## Step by Step procedure for Chi-square test of goodness of fit

The step by step procedure for chi-square goodness of fit test is as follows:

#### Step 1 : Set up the null and alternative hypothesis

The null hypothesis for test of goodness of fit is

$H_0:$ There is no significant difference between the observed and expected values. (That is, data fits well to the assumed distribution (like Binomial, Poisson, Normal, etc.))

The alternative hypothesis is

$H_1:$ There is a significant difference between the observed and expected values. (That is, data does not fit well to the assumed distribution)

#### Step 2 : Define the test statistic

The test statistic for testing the above hypothesis is

 $$\begin{equation*} \chi^2= \sum \frac{(O_{i} -E_{i})^2}{E_{i}} \sim \chi^2_{(n-k-1)}\\ \end{equation*}$$

where

• $O_i=f_i$ is the observed frequency for the $i^{th}$ category or $i^{th}$ value of $x$,
• $E_i$ is the expected frequency according to the assumed distribution,
• $n$ is the number of categories after pooling,
• $k$ is the number of parameters estimated.
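As a minimal sketch of the statistic defined above, the helper below sums $(O_i-E_i)^2/E_i$ over the categories. The function name `chi_square_statistic` is illustrative and does not come from any library; the sample counts are the die-roll frequencies used in Example 1 of this tutorial.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [5, 7, 17, 16, 8, 7]   # die-roll counts from Example 1
expected = [10.0] * 6             # 60 rolls, probability 1/6 per face
statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))
```

The function only computes the statistic; the degrees of freedom still have to be worked out from the number of categories and estimated parameters.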

#### Step 3 : Specify the level of significance.

Specify the level of significance $\alpha$ ($0 < \alpha < 1$)

#### Step 4 : Calculate the test statistic

Estimate the parameter(s) (if any).

• Assuming that the null hypothesis is true, estimate the parameter(s) of the distribution.

• Using the estimated parameter(s) and the assumed distribution determine the probabilities, say, $P(X=x)$ for each value of $x$.

• Compute the expected frequencies on multiplying the probabilities by total number of observations ($N$). That is $E_i=N*P(X=x_i)$.

• If any expected frequency is less than five, pool it with an adjacent category (or value of $x$) until every expected frequency is at least five. The number of categories remaining after pooling is $n$.
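The last two bullets can be sketched in code. This is one possible pooling rule, assuming the small expected counts sit in the right tail (typical for count data); the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
def expected_frequencies(probabilities, total):
    """E_i = N * P(X = x_i) for each category."""
    return [total * p for p in probabilities]


def pool_right_tail(observed, expected, min_expected=5.0):
    """Merge a category into its left neighbour when its expected count is too small.

    Works from right to left. Assumes the small expected counts are in the
    right tail; a small first category is left untouched.
    """
    obs, exp = list(observed), list(expected)
    i = len(exp) - 1
    while i > 0:
        if exp[i] < min_expected:
            exp[i - 1] += exp.pop(i)
            obs[i - 1] += obs.pop(i)
        i -= 1
    return obs, exp


# Hypothetical counts and probabilities, for illustration only.
observed = [30, 40, 20, 7, 3]
probabilities = [0.28, 0.42, 0.21, 0.065, 0.025]
expected = expected_frequencies(probabilities, total=sum(observed))
pooled_obs, pooled_exp = pool_right_tail(observed, expected)
print(pooled_obs)
print([round(e, 2) for e in pooled_exp])
```

Here the last category has an expected count of 2.5, so it is merged into its neighbour and four categories remain.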

Compute the test statistic

 $$\begin{equation*} \chi^2_{obs}= \sum \frac{(O_{i} -E_{i})^2}{E_{i}} \end{equation*}$$

#### Step 5 Critical value of Chi-square

The degrees of freedom for the chi-square test of goodness of fit are $n-k-1$.

The table value of $\chi^2$ for $n-k-1$ degrees of freedom at the $\alpha$ level of significance is $\chi^2_t=\chi^2_{n-k-1,\alpha}$.

If $\chi^2_{obs}\leq \chi^2_t$, then fail to reject the null hypothesis $H_0$ at the $\alpha$ level of significance, i.e., the data fit the assumed distribution well; otherwise reject $H_0$ at the $\alpha$ level of significance.

OR

#### Step 6 Decision (p-value approach)

The p-value of the test is

 $$p = P(\chi^2_{n-k-1}\geq\chi^2_{obs})$$

If the $p$-value of the test is less than $\alpha$, then reject the null hypothesis $H_0$ at the $\alpha$ level of significance; otherwise fail to reject $H_0$ at the $\alpha$ level of significance.
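Both decision rules can be sketched with the standard library alone. The sketch below assumes integer degrees of freedom and uses the closed-form series for the chi-square upper tail; the critical value is found by bisection. The names `chi2_sf`, `chi2_critical` and `decide` are illustrative, and the inputs ($\chi^2_{obs}=13.2$, 5 degrees of freedom, $\alpha=0.05$) are taken from Example 1.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-z) * sum_{j < df/2} z^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= z / j
            total += term
        return math.exp(-z) * total
    # Odd df: erfc(sqrt(z)) + exp(-z) * sum_{j < (df-1)/2} z^(j+1/2) / Gamma(j+3/2)
    total = math.erfc(math.sqrt(z))
    term = math.sqrt(z) / math.gamma(1.5)
    for j in range(df // 2):
        total += math.exp(-z) * term
        term *= z / (j + 1.5)
    return total


def chi2_critical(alpha, df, tol=1e-8):
    """Find x such that chi2_sf(x, df) = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def decide(statistic, df, alpha):
    """Return (p_value, critical_value, reject_H0)."""
    p_value = chi2_sf(statistic, df)
    return p_value, chi2_critical(alpha, df), p_value < alpha


p_value, critical, reject = decide(13.2, df=5, alpha=0.05)
print(round(p_value, 4), round(critical, 4), reject)
```

The two approaches always agree: $\chi^2_{obs}$ exceeds the critical value exactly when the $p$-value is below $\alpha$.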

## Chi-square test of goodness of fit Example 1

To test whether a die is fair, 60 rolls were made, and the corresponding outcomes were as follows:

| Face Value | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed freq. | 5 | 7 | 17 | 16 | 8 | 7 |

#### Solution

The observed data is

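| Face Value | Obs. Freq. $(O_i)$ | $p_i$ | Exp. Freq. $(E_i)$ |
|---|---|---|---|
| 1 | 5 | 0.1666667 | 10 |
| 2 | 7 | 0.1666667 | 10 |
| 3 | 17 | 0.1666667 | 10 |
| 4 | 16 | 0.1666667 | 10 |
| 5 | 8 | 0.1666667 | 10 |
| 6 | 7 | 0.1666667 | 10 |
| Total | 60 | 1.0000000 | 60 |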

#### Step 1 Set up the null and alternative hypothesis

The null and alternative hypotheses are as follows:

$H_0:p_{1}=p_{2} =\cdots = p_{6} =\frac{1}{6}$

$H_1:$ At least one of the proportions is different from $\frac{1}{6}$.

#### Step 2 Test statistic

The test statistic for testing above hypothesis is

 $$\begin{equation*} \chi^2= \sum \frac{(O_{i} -E_{i})^2}{E_{i}} \sim \chi^2_{n-k-1}\\ \end{equation*}$$

#### Step 3 Specify the level of significance

Let the level of significance be $\alpha = 0.05$.

#### Step 4 Calculate the Test Statistic

As there is no parameter to be estimated, $k=0$.

Under the null hypothesis $p_i=\frac{1}{6}$ for all $i$.

The expected frequencies can be calculated as

 \begin{aligned} E_{i} &=N*P(X=x_i)\\ &=N*p_i \end{aligned}

For example, $E_{1}$ is given by

 $$\begin{eqnarray*} E_{1} & = &N*p_1\\ &=& 60*0.1667\\ &=&10. \end{eqnarray*}$$

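| Face Value | Obs. Freq. $(O_i)$ | $p_i$ | Exp. Freq. $(E_i)$ | $(O_i-E_i)^2/E_i$ |
|---|---|---|---|---|
| 1 | 5 | 0.1666667 | 10 | 2.5 |
| 2 | 7 | 0.1666667 | 10 | 0.9 |
| 3 | 17 | 0.1666667 | 10 | 4.9 |
| 4 | 16 | 0.1666667 | 10 | 3.6 |
| 5 | 8 | 0.1666667 | 10 | 0.4 |
| 6 | 7 | 0.1666667 | 10 | 0.9 |
| Total | 60 | 1.0000000 | 60 | 13.2 |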

The test statistic is

 $$\begin{eqnarray*} \chi^2_{obs}&=& \sum \frac{(O_{i} -E_{i})^2}{E_{i}}\\ &=&\frac{(5-10)^2}{10}+\frac{(7-10)^2}{10}+ \frac{(17-10)^2}{10}\\ & & +\frac{(16-10)^2}{10}+\frac{(8-10)^2}{10}+ \frac{(7-10)^2}{10}\\ &=& 2.5 + 0.9+ 4.9+ 3.6+ 0.4 + 0.9\\ &=& 13.2. \end{eqnarray*}$$

#### Step 5 Critical value of Chi-square

The degrees of freedom for the chi-square test of goodness of fit are $df=n-k-1=6 - 0 -1 = 5$.

The table value of $\chi^2$ for $n-k-1$ degrees of freedom at the $\alpha$ level of significance is $\chi^2_t=\chi^2_{n-k-1,\alpha}=\chi^2_{5,0.05}=11.0705$.

The test statistic $\chi^2_{obs} =13.2$ exceeds the critical value $11.0705$, so it falls $\textit{inside}$ the critical region and we $\textit{reject}$ the null hypothesis.

OR

#### Step 6 Decision ($p$-value approach)

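The $p$-value is $P(\chi^2_{5}>13.2) =0.02157$.

As the $p$-value $0.02157$ is $\textit{less than}$ the significance level $\alpha = 0.05$, we $\textit{reject}$ the null hypothesis. The data do not support the claim that the die is fair.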
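The arithmetic of this example can be re-checked with a short script. It is only a sketch: the $p$-value uses a closed form that holds for 5 degrees of freedom specifically, and the critical value $11.0705$ is the table value quoted above rather than something the script computes.

```python
import math

observed = [5, 7, 17, 16, 8, 7]
n_rolls = sum(observed)
expected = [n_rolls / 6] * 6                 # fair die: N * 1/6 per face

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 0 - 1                   # n - k - 1 with k = 0

# Closed-form upper tail of the chi-square distribution, valid for df = 5 only.
z = statistic / 2
p_value = math.erfc(math.sqrt(z)) + math.exp(-z) * (
    math.sqrt(z) / math.gamma(1.5) + z ** 1.5 / math.gamma(2.5)
)

print(f"chi2 = {statistic:.1f}, df = {df}, p = {p_value:.5f}")
print("reject H0" if statistic > 11.0705 else "fail to reject H0")
```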

## Chi-square test of goodness of fit Example 2

A drop-in auto repair shop staffs the same number of mechanics on every weekday (weekends are not counted here). One of the mechanics thinks this is a bad idea because he suspects the number of customers is not evenly distributed across these days. For a sample of 289 customers, the counts by weekday are given in the table.

Number of Customers by Day (n = 289)

| Day | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| Count | 51 | 68 | 57 | 67 | 46 |

Test the claim that the number of customers is not evenly distributed across the five weekdays. Test this claim at the 0.05 significance level.

#### Solution

The observed data is

| Day | Obs. Freq. $(O_i)$ | Prop. $p_i$ |
|---|---|---|
| Monday | 51 | 0.2 |
| Tuesday | 68 | 0.2 |
| Wednesday | 57 | 0.2 |
| Thursday | 67 | 0.2 |
| Friday | 46 | 0.2 |

#### Step 1 Set up the null and alternative hypothesis

The null and alternative hypotheses are as follows:

$H_0:p_{Mon}=p_{Tue} =p_{Wed} =p_{Thu} = p_{Fri}=1/5$

$H_1:$ At least one of the proportions is different from $\frac{1}{5}$ (that is, the number of customers is not evenly distributed across the five weekdays).

#### Step 2 Test statistic

The test statistic for testing above hypothesis is

 $$\begin{equation*} \chi^2= \sum \frac{(O_{i} -E_{i})^2}{E_{i}} \sim \chi^2_{n-k-1}\\ \end{equation*}$$

#### Step 3 Specify the level of significance

Let the level of significance be $\alpha = 0.05$.

#### Step 4 Calculate the Test Statistic

As there is no parameter to be estimated, $k=0$.

The test statistic for testing the above hypothesis is

$$\begin{aligned} \chi^2& = \sum \frac{(O_{i} -E_{i})^2}{E_{i}} \sim \chi^2_{(n-k-1)} \end{aligned}$$

The expected frequencies can be calculated as

 \begin{aligned} E_{i} =N*p_i \end{aligned}

For example, $E_{1}$ is given by

 \begin{aligned} E_{1} & = N*p_1\\ &= 289*0.2\\ &=57.8. \end{aligned}

$E_{2}$ is given by

 \begin{aligned} E_{2} & = N*p_2\\ &= 289*0.2\\ &=57.8. \end{aligned}

$E_{3}$ is given by

 \begin{aligned} E_{3} & = N*p_3\\ &= 289*0.2\\ &=57.8. \end{aligned}

$E_{4}$ is given by

 \begin{aligned} E_{4} & = N*p_4\\ &= 289*0.2\\ &=57.8. \end{aligned}

$E_{5}$ is given by

 \begin{aligned} E_{5} & = N*p_5\\ &= 289*0.2\\ &=57.8. \end{aligned}

| Day | Obs. Freq. $(O_i)$ | Prop. $p_i$ | Exp. Freq. $(E_i)$ | $(O_i-E_i)^2/E_i$ |
|---|---|---|---|---|
| Monday | 51 | 0.2 | 57.8 | 0.8 |
| Tuesday | 68 | 0.2 | 57.8 | 1.8 |
| Wednesday | 57 | 0.2 | 57.8 | 0.011 |
| Thursday | 67 | 0.2 | 57.8 | 1.464 |
| Friday | 46 | 0.2 | 57.8 | 2.409 |

The test statistic is

$$\begin{aligned} \chi^2_{obs}&= \sum \frac{(O_{i} -E_{i})^2}{E_{i}}\\ &=\frac{(51-57.8)^2}{57.8}+\frac{(68-57.8)^2}{57.8}+\frac{(57-57.8)^2}{57.8}\\ &\quad +\frac{(67-57.8)^2}{57.8}+ \frac{(46-57.8)^2}{57.8}\\ &= 0.8 +1.8 +0.011+ 1.464+2.409\\ &= 6.484. \end{aligned}$$

#### Step 5 Critical value of Chi-square

The degrees of freedom for the chi-square test of goodness of fit are $df=n-k-1=5 - 0 -1 = 4$.

The table value of $\chi^2$ for $n-k-1$ degrees of freedom at the $\alpha$ level of significance is $\chi^2_t=\chi^2_{n-k-1,\alpha}=\chi^2_{4,0.05}=9.4877$.

The test statistic $\chi^2_{obs} =6.484$ is smaller than the critical value $9.4877$, so it falls $\textit{outside}$ the critical region and we $\textit{fail to reject}$ the null hypothesis.

OR

#### Step 6 Decision ($p$-value approach)

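The $p$-value is $P(\chi^2_{4}>6.484) =0.1658$.

As the $p$-value $0.1658$ is $\textit{greater than}$ the significance level $\alpha = 0.05$, we $\textit{fail to reject}$ the null hypothesis.

We conclude that there is not enough evidence to support the claim that the number of customers is not evenly distributed across the five weekdays.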
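A short script can confirm the numbers in this example. It is a sketch under one simplification: for 4 degrees of freedom the chi-square upper tail has the exact closed form $e^{-z}(1+z)$ with $z=\chi^2_{obs}/2$, so no statistical library is needed. The variable names are illustrative.

```python
import math

observed = {"Monday": 51, "Tuesday": 68, "Wednesday": 57, "Thursday": 67, "Friday": 46}
total = sum(observed.values())
expected = total / len(observed)          # equal share per weekday under H0

statistic = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 0 - 1                # n - k - 1 with k = 0

z = statistic / 2
p_value = math.exp(-z) * (1 + z)          # exact upper tail for df = 4 only

alpha = 0.05
print(f"E = {expected:.1f}, chi2 = {statistic:.3f}, df = {df}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```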

## Chi-square test of goodness of fit Example 3

A doctor believes the number of births by season is uniformly distributed. To test this claim, you randomly select 2246 births and record the season in which each takes place. The results are shown below. At $\alpha = 0.01$, can you reject the claim that the distribution is uniform?

| Season | Spring | Summer | Winter | Fall |
|---|---|---|---|---|
| Births | 564 | 602 | 555 | 525 |

#### Solution

Assuming the uniform distribution the proportions are $p_{Spring}=p_{Summer}=p_{Winter}=p_{Fall}=1/4 = 0.25$.

The observed data is

| Season | Obs. Freq. $(O_i)$ | Prop. $p_i$ |
|---|---|---|
| Spring | 564 | 0.25 |
| Summer | 602 | 0.25 |
| Winter | 555 | 0.25 |
| Fall | 525 | 0.25 |

#### Step 1 Set up the null and alternative hypothesis

The null and alternative hypotheses are as follows:

$H_0:p_{Spring}=p_{Summer}=p_{Winter}=p_{Fall}=1/4 = 0.25$

(i.e., The number of births by season is uniformly distributed.)

$H_1:$ The number of births by season is not uniformly distributed.

#### Step 2 Test statistic

The test statistic for testing above hypothesis is

 $$\begin{equation*} \chi^2= \sum \frac{(O_{i} -E_{i})^2}{E_{i}} \sim \chi^2_{n-k-1}\\ \end{equation*}$$

#### Step 3 Specify the level of significance

Let the level of significance be $\alpha = 0.01$.

#### Step 4 Calculate the Test Statistic

As there is no parameter to be estimated, $k=0$.

The test statistic for testing the above hypothesis is

$$\begin{aligned} \chi^2& = \sum \frac{(O_{i} -E_{i})^2}{E_{i}} \sim \chi^2_{(n-k-1)} \end{aligned}$$

The expected frequencies can be calculated as

 \begin{aligned} E_{i} =N*p_i \end{aligned}

For example, $E_{1}$ is given by

 \begin{aligned} E_{1} & = N*p_1\\ &= 2246*0.25\\ &=561.5. \end{aligned}

$E_{2}$ is given by

 \begin{aligned} E_{2} & = N*p_2\\ &= 2246*0.25\\ &=561.5. \end{aligned}

$E_{3}$ is given by

 \begin{aligned} E_{3} & = N*p_3\\ &= 2246*0.25\\ &=561.5. \end{aligned}

$E_{4}$ is given by

 \begin{aligned} E_{4} & = N*p_4\\ &= 2246*0.25\\ &=561.5. \end{aligned}

| Season | Obs. Freq. $(O_i)$ | Prop. $p_i$ | Exp. Freq. $(E_i)$ | $(O_i-E_i)^2/E_i$ |
|---|---|---|---|---|
| Spring | 564 | 0.25 | 561.5 | 0.011 |
| Summer | 602 | 0.25 | 561.5 | 2.921 |
| Winter | 555 | 0.25 | 561.5 | 0.075 |
| Fall | 525 | 0.25 | 561.5 | 2.373 |

The test statistic is

$$\begin{aligned} \chi^2_{obs}&= \sum \frac{(O_{i} -E_{i})^2}{E_{i}}\\ &=\frac{(564-561.5)^2}{561.5}+\frac{(602-561.5)^2}{561.5}\\ &\quad +\frac{(555-561.5)^2}{561.5}+ \frac{(525-561.5)^2}{561.5}\\ &= 0.011 +2.921 +0.075+ 2.373\\ &= 5.38. \end{aligned}$$

#### Step 5 Critical value of Chi-square

The degrees of freedom for the chi-square test of goodness of fit is $df=n-k-1=4 - 0 -1 = 3$.

The table value of $\chi^2$ for $n-k-1$ degrees of freedom and at $\alpha$ level of significance is $\chi^2_t=\chi^2_{n-k-1,\alpha}=\chi^2_{3,0.01}=11.3449$.

The test statistic $\chi^2_{obs} =5.38$ is smaller than the critical value $11.3449$, so it falls $\textit{outside}$ the critical region and we $\textit{fail to reject}$ the null hypothesis.

OR

#### Step 6 Decision ($p$-value approach)

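The $p$-value is $P(\chi^2_{3}>5.38) =0.14599$.

As the $p$-value $0.14599$ is $\textit{greater than}$ the significance level $\alpha = 0.01$, we $\textit{fail to reject}$ the null hypothesis.

We conclude that, at the $\alpha = 0.01$ level of significance, there is not enough evidence to reject the claim that the number of births by season is uniformly distributed.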
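The figures in this example can be reproduced with the sketch below. For 3 degrees of freedom the chi-square upper tail can be written with the complementary error function, which Python's `math` module provides; that closed form is specific to this case, and the variable names are illustrative.

```python
import math

births = {"Spring": 564, "Summer": 602, "Winter": 555, "Fall": 525}
total = sum(births.values())
expected = total / len(births)            # equal share per season under H0

statistic = sum((o - expected) ** 2 / expected for o in births.values())
df = len(births) - 0 - 1                  # n - k - 1 with k = 0

# Closed-form upper tail of the chi-square distribution, valid for df = 3 only.
z = statistic / 2
p_value = math.erfc(math.sqrt(z)) + math.exp(-z) * math.sqrt(z) / math.gamma(1.5)

alpha = 0.01
print(f"E = {expected:.1f}, chi2 = {statistic:.2f}, df = {df}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```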

## Chi-square test of goodness of fit Example 4

Three similar coins are tossed 100 times and the number of heads is recorded as follows:

| Number of heads | Frequency |
|---|---|
| 0 | 32 |
| 1 | 42 |
| 2 | 20 |
| 3 | 6 |

Fit a binomial distribution to the above data and test its goodness of fit.

#### Step 1 : Set up the null and alternative hypothesis

The null hypothesis for test of goodness of fit is

$H_0:$ The number of heads follows a Binomial distribution.

The alternative hypothesis is

$H_1:$ The number of heads does not follow a Binomial distribution.

#### Step 2 : Define the test statistic

We use the chi-square test of goodness of fit. The test statistic for testing the above hypothesis is

 $$\begin{equation*} \chi^2= \sum \frac{(O_{i} -E_{i})^2}{E_{i}} \sim \chi^2_{(n-k-1)}\\ \end{equation*}$$

where

• $O_i=f_i$ observed frequency for $i^{th}$ category or $i^{th}$ value of $x$,
• $E_i$ expected frequencies according to the assumed distribution,
• $n$ is the number of categories after pooling,
• $k$ is the number of parameters estimated.

#### Step 3 : Specify the level of significance.

The level of significance is $\alpha=0.05$.

#### Step 4 : Calculate the test statistic

| $x$ | Freq. $(f)$ | $f*x$ |
|---|---|---|
| 0 | 32 | 0 |
| 1 | 42 | 42 |
| 2 | 20 | 40 |
| 3 | 6 | 18 |
| Total | 100 | 100 |

First estimate the parameter of the Binomial distribution. Three coins are tossed each time, so the number of trials is $m=3$ (written $m$ here to avoid confusion with $n$, the number of categories). The population mean of a Binomial distribution with $m$ trials and probability of success (head) $p$ is $E(X)=m\times p$. The sample mean of the observed data is $\overline{X}=\frac{1}{N}\sum f_ix_i$.

The sample mean is

$$\begin{aligned} \overline{X}&=\frac{1}{\sum f_i}\sum f_ix_i\\ &=\frac{100}{100}\\ &=1 \end{aligned}$$

By the method of moments, the parameter $p$ can be estimated as

$$\begin{aligned} \hat{p} &= \frac{\overline{X}}{m}\\ &=\frac{1}{3}\\ &=0.3333 \end{aligned}$$

The probability mass function of the Binomial distribution is given by

$$\begin{aligned} P(X=x)&= \binom{m}{x}p^xq^{m-x},\\ &\quad x=0,1,2,\cdots, m;\\ &\quad 0 < p < 1, q=1-p \end{aligned}$$

We have the estimated value of $p$ as $\hat{p}=1/3$.

The expected probabilities for different values of $x$ can be obtained by substituting $m=3$ and $\hat{p}=1/3$ in the above probability mass function.

For example, the expected probability for $x=0$ is

$$\begin{aligned} P(X=0)&= \binom{3}{0}\left(\frac{1}{3}\right)^{0}\left(\frac{2}{3}\right)^{3-0}\\ &=0.2963 \end{aligned}$$

Similarly the expected probabilities for other values of $x$ can be calculated using the probability mass function.

The expected frequency for $x=0$ can be calculated as

$$\begin{aligned} E_{1} &=N*P(X=0)\\ &=100* 0.2963\\ &=29.63. \end{aligned}$$

Similarly the other expected frequencies can be calculated using the formula $E_i= N*P(X=x_i)$.

The expected probabilities and expected frequencies are as follows:

| $x$ | Freq. $(O_i)$ | $P(X=x)$ | Exp. Freq. $(E_i)$ |
|---|---|---|---|
| 0 | 32 | 0.2963 | 29.63 |
| 1 | 42 | 0.4444 | 44.44 |
| 2 | 20 | 0.2222 | 22.22 |
| 3 | 6 | 0.0370 | 3.70 |

The expected frequency for $x=3$ is less than 5, so the categories $x=2$ and $x=3$ are pooled. After pooling there are $n=3$ categories:

| $x$ | Freq. $(O_i)$ | Exp. Freq. $(E_i)$ | $(O_i-E_i)^2/E_i$ |
|---|---|---|---|
| 0 | 32 | 29.63 | 0.190 |
| 1 | 42 | 44.44 | 0.134 |
| 2 or 3 | 26 | 25.92 | 0.000 |

Compute the test statistic as

$$\begin{aligned} \chi^2_{obs}&= \sum \frac{(O_{i} -E_{i})^2}{E_{i}}\\ &=\frac{(32-29.63)^2}{29.63}+\frac{(42-44.44)^2}{44.44}+\frac{(26-25.92)^2}{25.92}\\ &= 0.190 +0.134 +0.000\\ &= 0.324. \end{aligned}$$

#### Step 5 Critical value of Chi-square

One parameter ($p$) was estimated, so $k=1$. The degrees of freedom for the chi-square test of goodness of fit are $df=n-k-1=3 - 1 -1 = 1$.

The table value of $\chi^2$ for $n-k-1$ degrees of freedom at the $\alpha$ level of significance is $\chi^2_t=\chi^2_{n-k-1,\alpha}=\chi^2_{1,0.05}=3.8415$.

The test statistic $\chi^2_{obs} =0.324$ is smaller than the critical value $3.8415$, so it falls $\textit{outside}$ the critical region and we $\textit{fail to reject}$ the null hypothesis.

OR

#### Step 6 Decision (p-value approach)

The p-value is $P(\chi^2_{1} > 0.324) =0.569$.

As the p-value $0.569$ is $\textit{greater than}$ the significance level of $\alpha = 0.05$, we $\textit{fail to reject}$ the null hypothesis.

#### Interpretation

We conclude that the given data are consistent with the Binomial distribution.
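The fitting steps can also be sketched in code. The sketch assumes one Bernoulli trial per coin, so three trials per toss, estimates $p$ by the method of moments, and pools the last category while its expected count is below 5. The $p$-value uses `math.erfc`, a closed form that holds only when one degree of freedom remains. Variable names are illustrative.

```python
import math

counts = {0: 32, 1: 42, 2: 20, 3: 6}      # number of heads -> frequency
trials = 3                                # assumption: one trial per coin
N = sum(counts.values())

# Method of moments: sample mean = trials * p
p_hat = sum(x * f for x, f in counts.items()) / (N * trials)

observed = [counts[x] for x in range(trials + 1)]
expected = [
    N * math.comb(trials, x) * p_hat ** x * (1 - p_hat) ** (trials - x)
    for x in range(trials + 1)
]

# Pool the last category into its neighbour while its expected count is below 5.
while len(expected) > 1 and expected[-1] < 5:
    tail_o, tail_e = observed.pop(), expected.pop()
    observed[-1] += tail_o
    expected[-1] += tail_e

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(expected) - 1 - 1                # n - k - 1 with k = 1 (p estimated)

p_value = math.erfc(math.sqrt(statistic / 2))   # exact upper tail for df = 1 only

print(observed, [round(e, 2) for e in expected])
print(f"chi2 = {statistic:.3f}, df = {df}, p = {p_value:.3f}")
```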

## Chi-square test of goodness of fit Example 5

The following table contains data on number of complaints received per day at a major retail bank's branches:

| No. of Complaints | Frequency |
|---|---|
| 0 | 270 |
| 1 | 140 |
| 2 | 65 |
| 3 | 14 |
| 4+ | 5 |

Fit a Poisson distribution and test to see if it is consistent with the data.

#### Step 1 : Set up the null and alternative hypothesis

The null hypothesis for test of goodness of fit is

$H_0:$ The number of complaints follows a Poisson distribution.

The alternative hypothesis is

$H_1:$ The number of complaints does not follow a Poisson distribution.

#### Step 2 : Define the test statistic

We use the chi-square test of goodness of fit. The test statistic for testing the above hypothesis is

 $$\begin{equation*} \chi^2= \sum \frac{(O_{i} -E_{i})^2}{E_{i}} \sim \chi^2_{(n-k-1)}\\ \end{equation*}$$

where

• $O_i=f_i$ observed frequency for $i^{th}$ category or $i^{th}$ value of $x$,
• $E_i$ expected frequencies according to the assumed distribution,
• $n$ is the number of categories after pooling,
• $k$ is the number of parameters estimated.

#### Step 3 : Specify the level of significance.

The level of significance is $\alpha=0.05$.

#### Step 4 : Calculate the test statistic

| $x$ | Freq. $(f)$ | $f*x$ |
|---|---|---|
| 0 | 270 | 0 |
| 1 | 140 | 140 |
| 2 | 65 | 130 |
| 3 | 14 | 42 |
| 4 | 5 | 20 |
| Total | 494 | 332 |

First estimate the parameter of Poisson distribution. The population mean of Poisson distribution with parameter $\lambda$ is $E(X)=\lambda$. The sample mean of observed data is $\overline{X}=\frac{1}{N}\sum f_ix_i$. By the method of moments, the parameter $\lambda$ can be estimated as

 \begin{aligned} \hat{\lambda} &= \overline{X}\\ &=\frac{1}{\sum f_i}\sum f_ix_i\\ &=\frac{332}{494}\\ &=0.6721 \end{aligned}
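The method-of-moments estimate is just the weighted mean of the frequency table, which can be checked in a couple of lines. As in the table above, this sketch treats the open-ended class "4+" as exactly 4; the variable names are illustrative.

```python
# Complaints per day -> number of days; the "4+" class is treated as 4.
frequencies = {0: 270, 1: 140, 2: 65, 3: 14, 4: 5}

N = sum(frequencies.values())
weighted_sum = sum(x * f for x, f in frequencies.items())
lam_hat = weighted_sum / N                # sample mean = estimate of lambda

print(N, weighted_sum, round(lam_hat, 4))
```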

The probability mass function of Poisson distribution with parameter $\lambda$ is given by

 \begin{aligned} P(X=x)&= \frac{e^{-\lambda}\lambda^x}{x!},\\ &\quad x=0,1,2,\cdots; \lambda > 0 \end{aligned}

The estimated value of $\lambda$ is $\hat{\lambda}=0.6721$.

The expected probabilities for different values of $x$ can be obtained on substituting $\hat{\lambda}=0.6721$ in above probability mass function.

For example, the expected probability for $x=0$ is

$$\begin{aligned} P(X=0)&= \frac{e^{-0.6721}0.6721^0}{0!}\\ &=0.5106 \end{aligned}$$

Similarly the expected probabilities for other values of $x$ can be calculated using the probability mass function.

The expected frequency for $x=0$ can be calculated as

 \begin{aligned} E_{1} &=N*P(X=0)\\ &=494* 0.5106\\ &=252.2364. \end{aligned}

Similarly the other expected frequencies can be calculated using the formula $E_i= N*P(X=x_i)$.

The expected probabilities and expected frequencies are as follows (the open-ended class "4+" is treated as $x=4$):

| $x$ | Freq. $(O_i)$ | $P(X=x)$ | Exp. Freq. $(E_i)$ |
|---|---|---|---|
| 0 | 270 | 0.5106 | 252.2364 |
| 1 | 140 | 0.3432 | 169.5408 |
| 2 | 65 | 0.1153 | 56.9582 |
| 3 | 14 | 0.0258 | 12.7452 |
| 4 | 5 | 0.0043 | 2.1242 |
| Total | 494 | 0.9992 | 493.6048 |

The expected frequency for $x=4$ is less than 5, so the categories $x=3$ and $x=4$ are pooled. After pooling there are $n=4$ categories:

| $x$ | Freq. $(O_i)$ | Exp. Freq. $(E_i)$ | $(O_i-E_i)^2/E_i$ |
|---|---|---|---|
| 0 | 270 | 252.2364 | 1.251 |
| 1 | 140 | 169.5408 | 5.147 |
| 2 | 65 | 56.9582 | 1.135 |
| 3 or more | 19 | 14.8694 | 1.147 |
| Total | 494 | 493.6048 | 8.680 |

Compute the test statistic as

$$\begin{aligned} \chi^2_{obs}&= \sum \frac{(O_{i} -E_{i})^2}{E_{i}}\\ &=\frac{(270-252.2364)^2}{252.2364}+\frac{(140-169.5408)^2}{169.5408}\\ &\quad +\frac{(65-56.9582)^2}{56.9582}+ \frac{(19-14.8694)^2}{14.8694}\\ &= 1.251 +5.147 +1.135+ 1.147\\ &= 8.68. \end{aligned}$$

#### Step 5 Critical value of Chi-square

One parameter ($\lambda$) was estimated, so $k=1$. The degrees of freedom for the chi-square test of goodness of fit are $df=n-k-1=4 - 1 -1 = 2$.

The table value of $\chi^2$ for $n-k-1$ degrees of freedom at the $\alpha$ level of significance is $\chi^2_t=\chi^2_{n-k-1,\alpha}=\chi^2_{2,0.05}=5.9915$.

The test statistic $\chi^2_{obs} =8.68$ exceeds the critical value $5.9915$, so it falls $\textit{inside}$ the critical region and we $\textit{reject}$ the null hypothesis.

OR

#### Step 6 Decision (p-value approach)

The p-value is $P(\chi^2_{2} > 8.68) =0.0130$.

As the p-value $0.0130$ is $\textit{less than}$ the significance level of $\alpha = 0.05$, we $\textit{reject}$ the null hypothesis.

#### Conclusion

We conclude that the number of complaints does not follow a Poisson distribution.
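The sketch below fits the Poisson distribution and shows the effect of the pooling rule from Step 4 by printing the statistic both before and after pooling. To mirror the hand calculation, $\hat{\lambda}$ and the probabilities are rounded to four decimals and "4+" is treated as $x=4$; both are simplifying assumptions. The $p$-value uses $e^{-\chi^2/2}$, which is exact only for 2 degrees of freedom.

```python
import math

observed = [270, 140, 65, 14, 5]          # x = 0, 1, 2, 3, 4 ("4+" treated as 4)
N = sum(observed)
lam = round(sum(x * f for x, f in enumerate(observed)) / N, 4)

# Poisson probabilities, rounded to 4 decimals to mirror the hand calculation.
probs = [
    round(math.exp(-lam) * lam ** x / math.factorial(x), 4)
    for x in range(len(observed))
]
expected = [N * p for p in probs]


def chi_square(obs, exp):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


unpooled = chi_square(observed, expected)

# Pool the last category into its neighbour while its expected count is below 5.
obs, exp = observed[:], expected[:]
while len(exp) > 1 and exp[-1] < 5:
    tail_o, tail_e = obs.pop(), exp.pop()
    obs[-1] += tail_o
    exp[-1] += tail_e

pooled = chi_square(obs, exp)
df = len(exp) - 1 - 1                     # n - k - 1 with k = 1 (lambda estimated)
p_value = math.exp(-pooled / 2)           # exact upper tail for df = 2 only

print(f"lambda = {lam}, unpooled chi2 = {unpooled:.2f}, pooled chi2 = {pooled:.2f}")
print(f"df = {df}, p = {p_value:.4f}")
```

Either way the statistic is well above the 5% critical value for 2 degrees of freedom, so the decision to reject does not hinge on the pooling step here.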

## Endnote

In this tutorial, you learned about the chi-square test of goodness of fit and the step-by-step procedure for applying it. The solved examples should help you understand how the test is carried out in practice.