Chi-square test of Independence
Assumptions
 The two variables should be measured at an ordinal or nominal level, i.e., they should be categorical.
 Each variable should consist of two or more categories. For example,
 the variable socioeconomic status: low, medium and high,
 the variable gender: male, female.
Step by Step Procedure for Chi-square test of independence
Suppose that a given population consisting of $N$ items is divided into $r$ mutually exclusive and exhaustive classes with respect to attribute $A$, say, $A_1, A_2,\cdots,A_r$, and the same population is divided into $c$ mutually exclusive and exhaustive classes with respect to attribute $B$, say, $B_1, B_2, \cdots,B_c$. Such an arrangement of frequencies in $r$ rows and $c$ columns is called an $r\times c$ contingency table.
| $A$ / $B$ | $B_1$ | $B_2$ | $\cdots$ | $B_j$ | $\cdots$ | $B_c$ | Total |
|---|---|---|---|---|---|---|---|
| $A_1$ | $(A_1B_1)$ | $(A_1B_2)$ | $\cdots$ | $(A_1B_j)$ | $\cdots$ | $(A_1B_c)$ | $(A_1)$ |
| $A_2$ | $(A_2B_1)$ | $(A_2B_2)$ | $\cdots$ | $(A_2B_j)$ | $\cdots$ | $(A_2B_c)$ | $(A_2)$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\cdots$ | $\vdots$ | $\cdots$ | $\vdots$ | $\vdots$ |
| $A_i$ | $(A_iB_1)$ | $(A_iB_2)$ | $\cdots$ | $(A_iB_j)$ | $\cdots$ | $(A_iB_c)$ | $(A_i)$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\cdots$ | $\vdots$ | $\cdots$ | $\vdots$ | $\vdots$ |
| $A_r$ | $(A_rB_1)$ | $(A_rB_2)$ | $\cdots$ | $(A_rB_j)$ | $\cdots$ | $(A_rB_c)$ | $(A_r)$ |
| Total | $(B_1)$ | $(B_2)$ | $\cdots$ | $(B_j)$ | $\cdots$ | $(B_c)$ | $N$ |
In the above table,

$(A_iB_j)$ is the number of members possessing both attributes $A_i$ and $B_j$,

$(A_i)$ is the total frequency of the $i^{th}$ row, i.e., of attribute $A_i$, and

$(B_j)$ is the total frequency of the $j^{th}$ column, i.e., of attribute $B_j$.

The row totals and the column totals each add up to the sample size: $N=\sum_{i=1}^r (A_i) =\sum_{j=1}^c (B_j)$.
Step 1 The null and alternative hypotheses:
To test the independence of attributes, the null and alternative hypotheses can be set up as
$H_0$ : The two attributes $A$ and $B$ are independent.
$H_1$ : The two attributes $A$ and $B$ are not independent (they are dependent).
Step 2 Test statistic
In the above contingency table, $(A_iB_j)$ (say, $O_{ij}$) denotes the observed frequency of attributes $A_i$ and $B_j$.
Under the null hypothesis, i.e., attributes $A$ and $B$ are independent, the expected frequency is given by
$$ \begin{aligned} E_{ij}=\frac{(A_i)(B_j)}{N},\; i=1,2, \cdots, r; j=1,2,\cdots, c. \end{aligned} $$
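This formula follows from the definition of independence. If two events $A$ and $B$ are independent, then we have
$$ \begin{aligned} P(A\cap B) &= P(A)\times P(B)\\ \implies \frac{n(A\cap B)}{N}&=\frac{n(A)}{N}\times \frac{n(B)}{N}\\ \implies n(A\cap B) &= \frac{n(A) \times n(B)}{N} \end{aligned} $$
where $n(A)$ is the number of elements favorable to $A$ out of $N$. Applying this to the classes $A_i$ and $B_j$, with $n(A_i)=(A_i)$ and $n(B_j)=(B_j)$, gives the expected frequency $E_{ij}$.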
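The expected frequencies can be computed directly from the row and column totals. The following Python sketch is only an illustration: the function name `expected_frequencies` and the $2\times 3$ table of counts are assumptions made for this example and do not come from the procedure above.

```python
def expected_frequencies(observed):
    """Return E[i][j] = (row i total) * (column j total) / N for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # grand total N
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 table of observed counts, used only for illustration.
table = [[10, 20, 30],
         [20, 20, 20]]
expected = expected_frequencies(table)
print(expected)
```

Both rows of this hypothetical table have the same total, so both rows of expected frequencies are identical, and each row of expected frequencies sums to the corresponding observed row total.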
The test statistic for testing the above hypothesis is
$$ \begin{aligned} \chi^2 &= \sum_{i=1}^r\sum_{j=1}^c\frac{(O_{ij}-E_{ij})^2}{E_{ij}}\sim\chi^2_{(r-1)(c-1)}\\ & = \sum_{i=1}^r\sum_{j=1}^c\frac{O_{ij}^2}{E_{ij}}-N\sim\chi^2_{(r-1)(c-1)}, \end{aligned} $$
where $r$ is the number of rows and $c$ is the number of columns. Under the null hypothesis, the statistic follows approximately a chi-square distribution with $(r-1)(c-1)$ degrees of freedom.
The calculated value of $\chi^2$ is denoted by $\chi^2_{obs}$.
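The two forms of the statistic agree because $\sum\sum (O_{ij}-E_{ij})^2/E_{ij} = \sum\sum O_{ij}^2/E_{ij} - 2\sum\sum O_{ij} + \sum\sum E_{ij}$ and both $\sum\sum O_{ij}$ and $\sum\sum E_{ij}$ equal $N$. A minimal Python sketch of both computations is given below; the function name `chi_square_statistic` and the table of counts are hypothetical and chosen only for illustration.

```python
def chi_square_statistic(observed):
    """Return the statistic from the definition, from the shortcut form, and the df."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    by_definition = 0.0
    sum_o2_over_e = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E_ij
            by_definition += (o - e) ** 2 / e
            sum_o2_over_e += o * o / e
    by_shortcut = sum_o2_over_e - n
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return by_definition, by_shortcut, df


# Hypothetical 2 x 3 table of observed counts, used only for illustration.
table = [[10, 20, 30],
         [20, 20, 20]]
by_definition, by_shortcut, df = chi_square_statistic(table)
print(by_definition, by_shortcut, df)
```

Up to floating-point rounding, the two values coincide; the shortcut form is mainly convenient for hand calculation.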
Step 3 Specify the Level of Significance
Step 4 Critical value of Chi-square
The table value of $\chi^2$ for $(r-1)(c-1)$ degrees of freedom and at $\alpha$ level of significance is $\chi^2_t=\chi^2_{(r-1)(c-1),\alpha}$.
Step 5 Computation of Test Statistic
The observed value of the test statistic is
$$ \begin{aligned} \chi^2_{obs} &= \sum_{i=1}^r\sum_{j=1}^c\frac{(O_{ij}-E_{ij})^2}{E_{ij}} \end{aligned} $$
Step 6 Decision (Traditional approach)
If $\chi^2_{obs}\leq \chi^2_t$, then fail to reject $H_0$ at $\alpha$ level of significance, i.e., the data are consistent with the two attributes being independent; otherwise reject $H_0$ at $\alpha$ level of significance.
OR
Step 6 Decision ($p$-value approach)
The $p$-value of the test is
$$ p = P(\chi^2_{(r-1)(c-1)}\geq\chi^2_{obs}) $$
If the $p$-value of the test is less than $\alpha$, then reject the null hypothesis $H_0$ at $\alpha$ level of significance; otherwise fail to reject $H_0$ at $\alpha$ level of significance.
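The critical value and the $p$-value are usually read from tables or statistical software. As a rough sketch of what such software does, the Python code below evaluates the upper-tail probability of a chi-square distribution with integer degrees of freedom and inverts it by bisection. The function names (`chi2_sf`, `chi2_critical_value`, `decide`) and the inputs are assumptions made for illustration; only the standard `math` module is used.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1) with z = x / 2,
    starting from Q(1/2, z) = erfc(sqrt(z)) or Q(1, z) = exp(-z).
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_critical_value(alpha, df, tol=1e-10):
    """Find x with P(X >= x) = alpha by bisection (the tail is decreasing in x)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def decide(chi2_obs, df, alpha):
    """Return (critical value, p-value, reject H0?) for an observed statistic."""
    critical = chi2_critical_value(alpha, df)
    p_value = chi2_sf(chi2_obs, df)
    return critical, p_value, p_value < alpha


# Hypothetical inputs for illustration: statistic 6.0 with 2 degrees of freedom.
critical, p_value, reject = decide(6.0, 2, 0.05)
print(critical, p_value, reject)
```

The two decision rules are equivalent: the observed statistic exceeds the critical value exactly when the $p$-value is smaller than $\alpha$, so either approach leads to the same conclusion.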
Chi-square test of Independence Example 1
A researcher collected data from a sample and wishes to understand the relationship between two variables: gender and preferred mode of public transportation. There are two categories for gender (male, female) and two categories for mode of transportation (bus, train). After counting how many observations fall into each combination of categories, the researcher obtained the observed counts in the table below:
| Gender / Transportation | Bus | Train | Total |
|---|---|---|---|
| Male | 50 | 30 | 80 |
| Female | 40 | 80 | 120 |
| Total | 90 | 110 | 200 |
What is the chi-square test statistic? Test whether gender and mode of transportation are independent.
Solution
The observed data is
| | Bus | Train | Sum |
|---|---|---|---|
| Male | 50 | 30 | 80 |
| Female | 40 | 80 | 120 |
| Sum | 90 | 110 | 200 |
Number of rows $r=2$, number of columns $c=2$.
Step 1 The null and alternative hypotheses are as follows:
$H_0:$ The row variable (gender) and column variable (mode of transportation) are independent.
$H_1:$ The row variable (gender) and column variable (mode of transportation) are not independent (they are dependent).
Step 2 Test statistic
The test statistic for testing the above hypothesis is
$$ \begin{equation*} \chi^2= \sum \sum \frac{(O_{ij}- E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)} \end{equation*} $$
Step 3 Level of Significance
The level of significance is $\alpha =0.05$.
Step 4 Critical value of $\chi^2$
The level of significance is $\alpha =0.05$. Degrees of freedom $df=(r-1)(c-1)=(2-1)(2-1) =1$.
The critical value of $\chi^2$ for $df=1$ and $\alpha=0.05$ level of significance is $\chi^2_{0.05,1} =3.8415$.
Step 5 Computation of test Statistic
The expected frequency for $(i,j)^{th}$ cell is given by
$$ \begin{equation*} E_{ij} =\frac{i^{th}\text{ row total }\times j^{th}\text{ column total}}{N} \end{equation*} $$
For example, $E_{11}$ is given by
$$ \begin{eqnarray*} E_{11} & = &\frac{1^{st}\text{ row total }\times 1^{st}\text{ column total}}{N}\\ &=& \frac{80\times 90}{200}\\ &=&36. \end{eqnarray*} $$
Table of Expected Frequencies:
| | Bus | Train | Sum |
|---|---|---|---|
| Male | 36 | 44 | 80 |
| Female | 54 | 66 | 120 |
| Sum | 90 | 110 | 200 |
The test statistic is
$$ \begin{eqnarray*} \chi^2_{obs}&=& \sum \sum \frac{(O_{ij}- E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}\\ &=&\frac{(50-36)^2}{36}+\cdots + \frac{(80-66)^2}{66}\\ &=& 16.4983. \end{eqnarray*} $$
Step 6 Decision (Traditional approach)
The test statistic is $\chi^2_{obs} =16.4983$, which falls $\textit{inside}$ the critical region bounded by the critical value $\chi^2_{0.05,1}=3.8415$, so we $\textit{reject}$ the null hypothesis.
OR
Step 6 Decision ($p$-value approach)
The $p$-value is $P(\chi^2_{1}>16.4983) =0.00005$.
As the $p$-value $0.00005$ is $\textit{less than}$ the significance level of $\alpha = 0.05$, we $\textit{reject}$ the null hypothesis.
Interpretation
That is, the row variable (gender) and the column variable (mode of transportation) are not independent (they are dependent).
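The calculations of Example 1 can be checked with a few lines of Python. This is a sketch using only the standard library, not necessarily how the figures above were produced; the variable names are illustrative, and the closed form used for the $p$-value holds only for one degree of freedom.

```python
import math

observed = [[50, 30],   # Male:   bus, train
            [40, 80]]   # Female: bus, train

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequencies and the contribution of each cell to the statistic.
expected = [[r * c / n for c in col_totals] for r in row_totals]
contributions = [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
                 for o_row, e_row in zip(observed, expected)]
chi2_obs = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 the upper-tail probability has the closed form erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_obs / 2))

print(expected)
print(round(chi2_obs, 4), df, p_value)
```

If SciPy is available, `scipy.stats.chi2_contingency(observed, correction=False)` should give the same statistic. By default that function applies Yates' continuity correction to $2\times 2$ tables, which gives a smaller value than the uncorrected statistic used in this tutorial.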
Chi-square test of Independence Example 2
The National Sleep Foundation used a survey to determine whether hours of sleep per night are independent of age (Newsweek, January 19, 2004). The following table shows the hours of sleep on weeknights for a sample of individuals age 49 and younger and for a sample of individuals age 50 and older.
| Age / Hours of sleep | Less than 6 | 6 to 6.9 | 7 to 7.9 | 8 or more |
|---|---|---|---|---|
| 49 or younger | 33 | 61 | 71 | 75 |
| 50 or older | 32 | 61 | 70 | 97 |
Conduct a test of independence to determine whether the hours of sleep on weeknights are independent of age. Use $\alpha = .05$.
Compute the value of the test statistic and the $p$-value.
Solution
The observed data is
| | Fewer than 6 | 6 to 6.9 | 7 to 7.9 | 8 or more | Sum |
|---|---|---|---|---|---|
| 49 or younger | 33 | 61 | 71 | 75 | 240 |
| 50 or older | 32 | 61 | 70 | 97 | 260 |
| Sum | 65 | 122 | 141 | 172 | 500 |
Number of rows $r=2$, number of columns $c=4$.
Step 1 The null and alternative hypotheses are as follows:
$H_0:$ The row variable (age) and column variable (hours of sleep on weeknights) are independent.
$H_1:$ The row variable (age) and column variable (hours of sleep on weeknights) are not independent (they are dependent).
Step 2 Test statistic
The test statistic for testing the above hypothesis is
$$ \begin{equation*} \chi^2= \sum \sum \frac{(O_{ij}- E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)} \end{equation*} $$
Step 3 Level of Significance
The level of significance is $\alpha =0.05$.
Step 4 Critical value of $\chi^2$
The level of significance is $\alpha =0.05$. Degrees of freedom $df=(r-1)(c-1)=(2-1)(4-1) =3$.
The critical value of $\chi^2$ for $df=3$ and $\alpha=0.05$ level of significance is $\chi^2_{0.05,3} =7.8147$.
Step 5 Computation of test Statistic
The expected frequency for $(i,j)^{th}$ cell is given by
$$ \begin{equation*} E_{ij} =\frac{i^{th}\text{ row total }\times j^{th}\text{ column total}}{N} \end{equation*} $$
For example, $E_{11}$ is given by
$$ \begin{aligned} E_{11} & = \frac{1^{st}\text{ row total }\times 1^{st}\text{ column total}}{N}\\ &= \frac{240\times 65}{500}\\ &=31.2. \end{aligned} $$
Similarly one can find the other expected frequencies.
Table of Expected Frequencies:
| | Fewer than 6 | 6 to 6.9 | 7 to 7.9 | 8 or more | Sum |
|---|---|---|---|---|---|
| 49 or younger | 31.2 | 58.56 | 67.68 | 82.56 | 240 |
| 50 or older | 33.8 | 63.44 | 73.32 | 89.44 | 260 |
| Sum | 65.0 | 122.00 | 141.00 | 172.00 | 500 |
The test statistic is
$$ \begin{aligned} \chi^2_{obs}&= \sum \sum \frac{(O_{ij}- E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}\\ &=\frac{(33-31.2)^2}{31.2}+\cdots + \frac{(97-89.44)^2}{89.44}\\ &= 2.0397. \end{aligned} $$
Step 6 Decision (Traditional approach)
The test statistic is $\chi^2_{obs} =2.0397$, which falls $\textit{outside}$ the critical region bounded by the critical value $\chi^2_{0.05,3}=7.8147$, so we $\textit{fail to reject}$ the null hypothesis.
OR
Step 6 Decision ($p$-value approach)
The $p$-value is $P(\chi^2_{3}>2.0397) =0.56421$.
As the $p$-value $0.56421$ is $\textit{greater than}$ the significance level of $\alpha = 0.05$, we $\textit{fail to reject}$ the null hypothesis.
Interpretation
There is not sufficient evidence to conclude that the hours of sleep on weeknights depend on age; the data are consistent with hours of sleep being independent of age.
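The figures in Example 2 can be reproduced with the short Python sketch below. It is an illustration using only the standard library; the variable names are assumptions, and the closed form for the $p$-value is specific to three degrees of freedom.

```python
import math

observed = [[33, 61, 71, 75],   # 49 or younger
            [32, 61, 70, 97]]   # 50 or older

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2_obs = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 3: P(X > x) = erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2).
p_value = (math.erfc(math.sqrt(chi2_obs / 2))
           + math.sqrt(2 * chi2_obs / math.pi) * math.exp(-chi2_obs / 2))

for row in expected:
    print([round(e, 2) for e in row])
print(round(chi2_obs, 4), df, round(p_value, 4))
```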
Chi-square test of Independence Example 3
Responses to a survey question are broken down according to employment status, and the sample results are given below. At the 0.10 significance level, test the claim that response and employment status are independent.
| Employment status / Response | Yes | No | Undecided |
|---|---|---|---|
| Employment | 30 | 15 | 5 |
| Unemployment | 20 | 25 | 10 |
Solution
The observed data is
| | Yes | No | Undecided | Sum |
|---|---|---|---|---|
| Employment | 30 | 15 | 5 | 50 |
| Unemployment | 20 | 25 | 10 | 55 |
| Sum | 50 | 40 | 15 | 105 |
Number of rows $r=2$, number of columns $c=3$.
Step 1 The null and alternative hypotheses are as follows:
$H_0:$ The row variable (employment status) and column variable (response) are independent.
$H_1:$ The row variable (employment status) and column variable (response) are not independent (they are dependent).
Step 2 Test statistic
The test statistic for testing the above hypothesis is
$$ \begin{equation*} \chi^2= \sum \sum \frac{(O_{ij}- E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)} \end{equation*} $$
Step 3 Level of Significance
The level of significance is $\alpha =0.1$.
Step 4 Critical value of $\chi^2$
The level of significance is $\alpha =0.1$. Degrees of freedom $df=(r-1)(c-1)=(2-1)(3-1) =2$.
The critical value of $\chi^2$ for $df=2$ and $\alpha=0.1$ level of significance is $\chi^2_{0.1,2} =4.6052$.
Step 5 Computation of test Statistic
The expected frequency for $(i,j)^{th}$ cell is given by
$$ \begin{equation*} E_{ij} =\frac{i^{th}\text{ row total }\times j^{th}\text{ column total}}{N} \end{equation*} $$
For example, $E_{11}$ is given by
$$ \begin{aligned} E_{11} & = \frac{1^{st}\text{ row total }\times 1^{st}\text{ column total}}{N}\\ &= \frac{50\times 50}{105}\\ &=23.81. \end{aligned} $$
Similarly one can determine the other expected frequencies.
Table of Expected Frequencies:
| | Yes | No | Undecided | Sum |
|---|---|---|---|---|
| Employment | 23.80952 | 19.04762 | 7.142857 | 50 |
| Unemployment | 26.19048 | 20.95238 | 7.857143 | 55 |
| Sum | 50.00000 | 40.00000 | 15.000000 | 105 |
The test statistic is
$$ \begin{aligned} \chi^2_{obs}&= \sum \sum \frac{(O_{ij}- E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}\\ &=\frac{(30-23.81)^2}{23.81}+\cdots + \frac{(10-7.86)^2}{7.86}\\ &= 5.942. \end{aligned} $$
Step 6 Decision (Traditional approach)
The test statistic is $\chi^2_{obs} =5.942$, which falls $\textit{inside}$ the critical region bounded by the critical value $\chi^2_{0.1,2}=4.6052$, so we $\textit{reject}$ the null hypothesis.
OR
Step 6 Decision ($p$-value approach)
The $p$-value is $P(\chi^2_{2}>5.942) =0.05125$.
As the $p$-value $0.05125$ is $\textit{less than}$ the significance level of $\alpha = 0.1$, we $\textit{reject}$ the null hypothesis.
Interpretation
We conclude that response and employment status are dependent.
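As a check on Example 3, the sketch below recomputes the statistic in Python and compares the $p$-value against two significance levels. The variable names are illustrative, the second level ($0.05$) is added only for comparison and is not part of the exercise, and the closed form for the $p$-value is specific to two degrees of freedom.

```python
import math

observed = [[30, 15, 5],    # Employment:   yes, no, undecided
            [20, 25, 10]]   # Unemployment: yes, no, undecided

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2_obs = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected frequency E_ij
        chi2_obs += (o - e) ** 2 / e
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 the upper-tail probability is simply exp(-x / 2).
p_value = math.exp(-chi2_obs / 2)

# True means "reject H0" at that significance level.
decisions = {alpha: p_value < alpha for alpha in (0.10, 0.05)}
print(round(chi2_obs, 3), df, round(p_value, 5), decisions)
```

The $p$-value lies between the two levels, so the null hypothesis is rejected at $\alpha = 0.10$ but would not be rejected at $\alpha = 0.05$; the conclusion of this example depends on the chosen significance level.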
Endnote
In this tutorial, you learned about the chi-square test of independence, the step-by-step procedure for applying it, and step-by-step solved examples.
Let me know in the comments if you have any questions on the chi-square test of independence, and share your thoughts on this article.