Karl Pearson's Correlation Coefficient
Let $(x_i, y_i)$, $i=1,2, \cdots , n$, be $n$ pairs of observations. Karl Pearson's coefficient of correlation between the two variables $X$ and $Y$ is denoted by $r_{xy}$ or $r$ and is given by
$r = \dfrac{Cov(x,y)}{\sqrt{V(x)\, V(y)}}$
where
- the sample covariance between $x$ and $y$ is
$$ \begin{aligned} Cov(x,y) =s_{xy}&=\frac{1}{n-1}\sum_{i=1}^{n}(x_i -\overline{x})(y_i-\overline{y})\\ &= \frac{1}{n-1}\bigg(\sum_{i=1}^n x_iy_i - \frac{(\sum_{i=1}^n x_i)(\sum_{i=1}^n y_i)}{n}\bigg) \end{aligned} $$
- the sample variance of $x$ is
$$ \begin{aligned} V(x) =s_{x}^2 &=\frac{1}{n-1}\sum_{i=1}^{n}(x_i -\overline{x})^2\\ &= \frac{1}{n-1}\bigg(\sum_{i=1}^n x_i^2 - \frac{(\sum_{i=1}^n x_i)^2}{n}\bigg) \end{aligned} $$
- the sample variance of $y$ is
$$ \begin{aligned} V(y) =s_{y}^2 &=\frac{1}{n-1}\sum_{i=1}^{n}(y_i -\overline{y})^2\\ &= \frac{1}{n-1}\bigg(\sum_{i=1}^n y_i^2 - \frac{(\sum_{i=1}^n y_i)^2}{n}\bigg) \end{aligned} $$
- the sample mean of $x$ is
$$ \begin{aligned} \overline{x}&=\frac{1}{n}\sum_{i=1}^n x_i \end{aligned} $$
- the sample mean of $y$ is
$$ \begin{aligned} \overline{y}&=\frac{1}{n}\sum_{i=1}^n y_i \end{aligned} $$
Thus,
$$ \begin{aligned} r_{xy}&=\dfrac{s_{xy}}{s_x\cdot s_y} \end{aligned} $$
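The definitions above translate fairly directly into code. The following is a minimal sketch in plain Python; the function name `pearson_r` and its error handling are assumptions made for illustration, not part of the tutorial.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation from the deviation-from-mean formulas."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two pairs of observations")

    # Sample means.
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sample covariance and variances, each with divisor n - 1.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)
```

Because the factor $\frac{1}{n-1}$ appears in both the numerator and the denominator, it cancels in the ratio; it is kept here only so that the intermediate values match the sample covariance and variances defined above.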
The correlation coefficient $r$ cannot exceed unity in absolute value, i.e., $|r|\leq 1$, or equivalently $-1 \leq r \leq +1$.
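One way to see why this bound holds is the Cauchy-Schwarz inequality applied to the deviations from the means. The short derivation below is a sketch that assumes $s_x > 0$ and $s_y > 0$:

$$ \begin{aligned} \bigg(\sum_{i=1}^{n}(x_i -\overline{x})(y_i-\overline{y})\bigg)^2 &\leq \sum_{i=1}^{n}(x_i -\overline{x})^2 \sum_{i=1}^{n}(y_i-\overline{y})^2\\ \implies s_{xy}^2 &\leq s_x^2\, s_y^2 \qquad \text{(dividing both sides by } (n-1)^2\text{)}\\ \implies |r_{xy}| = \frac{|s_{xy}|}{s_x\cdot s_y} &\leq 1. \end{aligned} $$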
Two independent variables are uncorrelated, but the converse is not necessarily true: uncorrelated variables may still be dependent, because $r$ measures only linear association.
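A small numerical illustration of the second statement: in the sketch below, $y$ is completely determined by $x$ through $y = x^2$, yet the sample covariance (and hence $r$) is zero. The data are made up for this illustration and do not come from the tutorial.

```python
import math

# Hypothetical data: y depends on x exactly, but not linearly.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and variances with divisor n - 1.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

r = s_xy / math.sqrt(s_xx * s_yy)
print(r)
```

The positive and negative products of deviations cancel because the $x$ values are symmetric about their mean, so a value of $r$ near zero only rules out a linear relationship.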
Interpretation
- If $r = 0$, then there is no linear correlation between the two variables.
- If $r > 0$, then there is a positive correlation between the two variables.
- If $r = 1$, then there is a perfect positive correlation between the two variables.
- If $0 < r < 1$, then there is a partial positive correlation between the two variables.
- If $r < 0$, then there is a negative correlation between the two variables.
- If $r = -1$, then there is a perfect negative correlation between the two variables.
- If $-1 < r < 0$, then there is a partial negative correlation between the two variables.
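The cases above can be expressed as a small helper. This is only a sketch: the function name `describe_r`, the returned labels, and the tolerance `tol` used to absorb floating-point rounding are all assumptions made for illustration.

```python
def describe_r(r, tol=1e-9):
    """Map a correlation coefficient to one of the verbal categories."""
    if not -1 - tol <= r <= 1 + tol:
        raise ValueError("r must lie between -1 and +1")
    if abs(r) <= tol:
        return "no linear correlation"
    if r >= 1 - tol:
        return "perfect positive correlation"
    if r <= -1 + tol:
        return "perfect negative correlation"
    if r > 0:
        return "partial positive correlation"
    return "partial negative correlation"


print(describe_r(0.9756))
print(describe_r(-1.0))
```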
Karl Pearson's Correlation Coefficient Calculator
Use this calculator to calculate the Karl Pearson's correlation coefficient.
How to calculate Pearson's Correlation Coefficient?
Step 1 - Enter the $X$ values separated by commas
Step 2 - Enter the $Y$ values separated by commas
Step 3 - Click the Calculate button to compute the correlation coefficient
Step 4 - The calculator gives the number of pairs of observations
Step 5 - The calculator gives the sample variance of $X$
Step 6 - The calculator gives the sample variance of $Y$
Step 7 - The calculator gives the sample covariance between $X$ and $Y$
Step 8 - The calculator gives the sample Pearson's correlation coefficient and the coefficient of determination
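The steps above can be sketched in a few lines of Python. This is not the code behind the calculator; the function names `parse_values` and `correlation_report` and the keys of the returned dictionary are hypothetical, chosen to mirror the quantities listed in the steps.

```python
import math


def parse_values(text):
    """Turn a comma-separated string such as "20, 24, 30" into floats."""
    return [float(tok) for tok in text.split(",") if tok.strip()]


def correlation_report(x_text, y_text):
    """Return n, both variances, the covariance, r and r^2."""
    x, y = parse_values(x_text), parse_values(y_text)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two values")
    n = len(x)

    # Computational (shortcut) forms of the sample variance and covariance.
    var_x = (sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1)
    var_y = (sum(v * v for v in y) - sum(y) ** 2 / n) / (n - 1)
    cov_xy = (sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n) / (n - 1)

    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    r = cov_xy / math.sqrt(var_x * var_y)
    return {"n": n, "var_x": var_x, "var_y": var_y,
            "cov_xy": cov_xy, "r": r, "r_squared": r * r}


report = correlation_report("1, 2, 3, 4", "2, 4, 6, 8")
for name, value in report.items():
    print(name, value)
```

The inputs in the usage lines are made-up values lying exactly on a straight line, so the reported $r$ and $r^2$ should both be 1 up to floating-point rounding.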
Pearson's Correlation Coefficient Example 1
A study was conducted to analyze the relationship between advertising expenditure and sales. The following data were recorded:
X Advertising (in \$) | 20 | 24 | 30 | 32 | 35 |
---|---|---|---|---|---|
Y Sales (in \$) | 310 | 340 | 400 | 420 | 490 |
Compute the correlation coefficient between advertising expenditure and sales.
Solution
Let $x$ denote the advertising expenditure and $y$ denote the sales.
$i$ | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 20 | 310 | 400 | 96100 | 6200 |
2 | 24 | 340 | 576 | 115600 | 8160 |
3 | 30 | 400 | 900 | 160000 | 12000 |
4 | 32 | 420 | 1024 | 176400 | 13440 |
5 | 35 | 490 | 1225 | 240100 | 17150 |
Total | 141 | 1960 | 4125 | 788200 | 56950 |
The sample variance of $x$ is
$$ \begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{5-1}\bigg(4125-\frac{(141)^2}{5}\bigg)\\ &= \frac{1}{4}\bigg(4125-\frac{19881}{5}\bigg)\\ &= \frac{1}{4}\bigg(4125-3976.2\bigg)\\ &= \frac{148.8}{4}\\ &= 37.2. \end{aligned} $$
The sample variance of $y$ is
$$ \begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{5-1}\bigg(788200-\frac{(1960)^2}{5}\bigg)\\ &= \frac{1}{4}\bigg(788200-\frac{3841600}{5}\bigg)\\ &= \frac{1}{4}\bigg(788200-768320\bigg)\\ &= \frac{19880}{4}\\ &= 4970. \end{aligned} $$
The sample covariance between $x$ and $y$ is
$$ \begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{5-1}\bigg(56950-\frac{(141)(1960)}{5}\bigg)\\ &= \frac{1}{4}\bigg(56950-\frac{276360}{5}\bigg)\\ &= \frac{1}{4}\bigg(56950-55272\bigg)\\ &= \frac{1678}{4}\\ &= 419.5. \end{aligned} $$
Karl Pearson's sample correlation coefficient between advertising expenditure and sales is
$$ \begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{419.5}{\sqrt{37.2\times 4970}}\\ &=\frac{419.5}{\sqrt{184884}}\\ &=0.9756. \end{aligned} $$
The correlation coefficient between advertising expenditure and sales is $0.9756$. Since the value is positive and close to $+1$, there is a strong positive linear relationship between advertising expenditure and sales.
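The arithmetic in this example can be re-checked from the totals row of the worksheet alone. The snippet below is a minimal sketch; the variable names are chosen for illustration.

```python
import math

# Totals copied from the worksheet in Example 1.
n = 5
sum_x, sum_y = 141, 1960
sum_x2, sum_y2, sum_xy = 4125, 788200, 56950

# Shortcut formulas for the sample variances and covariance.
var_x = (sum_x2 - sum_x ** 2 / n) / (n - 1)
var_y = (sum_y2 - sum_y ** 2 / n) / (n - 1)
cov_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

r = cov_xy / math.sqrt(var_x * var_y)
print(round(var_x, 4), round(var_y, 4), round(cov_xy, 4), round(r, 4))
```

Working from the five totals is convenient for hand calculation, which is why the worksheet tabulates $x^2$, $y^2$ and $xy$.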
Pearson's Correlation Coefficient Example 2
A study of the amount of rainfall and the quantity of air pollution removed produced the following data:
Daily Rainfall (0.01cm) | 4.3 | 4.5 | 5.9 | 5.6 | 6.1 | 5.2 | 3.8 | 2.1 | 7.5 |
---|---|---|---|---|---|---|---|---|---|
Particulate Removed ($\mu g/m^3$) | 126 | 121 | 116 | 118 | 114 | 118 | 132 | 141 | 108 |
Calculate the correlation coefficient between daily rainfall and particulate removed.
Solution
Let $x$ denote the daily rainfall (0.01 cm) and $y$ denote the particulate removed ($\mu g/m^3$).
$i$ | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 4.3 | 126 | 18.49 | 15876 | 541.8 |
2 | 4.5 | 121 | 20.25 | 14641 | 544.5 |
3 | 5.9 | 116 | 34.81 | 13456 | 684.4 |
4 | 5.6 | 118 | 31.36 | 13924 | 660.8 |
5 | 6.1 | 114 | 37.21 | 12996 | 695.4 |
6 | 5.2 | 118 | 27.04 | 13924 | 613.6 |
7 | 3.8 | 132 | 14.44 | 17424 | 501.6 |
8 | 2.1 | 141 | 4.41 | 19881 | 296.1 |
9 | 7.5 | 108 | 56.25 | 11664 | 810.0 |
Total | 45.0 | 1094 | 244.26 | 133786 | 5348.2 |
The sample variance of $x$ is
$$ \begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{9-1}\bigg(244.26-\frac{(45)^2}{9}\bigg)\\ &= \frac{1}{8}\bigg(244.26-\frac{2025}{9}\bigg)\\ &= \frac{1}{8}\bigg(244.26-225\bigg)\\ &= \frac{19.26}{8}\\ &= 2.4075. \end{aligned} $$
The sample variance of $y$ is
$$ \begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{9-1}\bigg(133786-\frac{(1094)^2}{9}\bigg)\\ &= \frac{1}{8}\bigg(133786-\frac{1196836}{9}\bigg)\\ &= \frac{1}{8}\bigg(133786-132981.7778\bigg)\\ &= \frac{804.2222}{8}\\ &= 100.5278. \end{aligned} $$
The sample covariance between $x$ and $y$ is
$$ \begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{9-1}\bigg(5348.2-\frac{(45)(1094)}{9}\bigg)\\ &= \frac{1}{8}\bigg(5348.2-\frac{49230}{9}\bigg)\\ &= \frac{1}{8}\bigg(5348.2-5470\bigg)\\ &= \frac{-121.8}{8}\\ &= -15.225. \end{aligned} $$
Karl Pearson's sample correlation coefficient between daily rainfall and particulate removed is
$$ \begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{-15.225}{\sqrt{2.4075\times 100.5278}}\\ &=\frac{-15.225}{\sqrt{242.0207}}\\ &=-0.9787. \end{aligned} $$
The correlation coefficient between daily rainfall and particulate removed is $-0.9787$. Since the value is negative and close to $-1$, there is a strong negative linear relationship between daily rainfall and particulate removed.
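As a cross-check on the shortcut formulas used in this example, the same quantities can be computed from the deviations about the means. The following sketch uses the nine observations from the data table; the variable names are assumptions made for illustration.

```python
import math

# Observations from Example 2.
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]
removed = [126, 121, 116, 118, 114, 118, 132, 141, 108]

n = len(rainfall)
x_bar = sum(rainfall) / n
y_bar = sum(removed) / n

# Deviation-from-mean forms of the sample covariance and variances.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rainfall, removed)) / (n - 1)
s_xx = sum((x - x_bar) ** 2 for x in rainfall) / (n - 1)
s_yy = sum((y - y_bar) ** 2 for y in removed) / (n - 1)

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(s_xx, 4), round(s_yy, 4), round(s_xy, 4), round(r, 4))
```

The covariance is negative because days with above-average rainfall tend to pair with below-average particulate values, which is what gives $r$ its negative sign.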
Pearson's Correlation Coefficient Example 3
The number of hours 14 students spent studying for a test and their scores on that test are recorded as follows:
Hours spent ($x$) | Test Scores ($y$) |
---|---|
1 | 41 |
0 | 40 |
1 | 39 |
2 | 48 |
2 | 52 |
3 | 47 |
3 | 49 |
5 | 53 |
6 | 65 |
6 | 70 |
5 | 63 |
7 | 80 |
7 | 87 |
8 | 94 |
Calculate correlation coefficient between hours spent and test scores.
Solution
Let $x$ denote the number of hours spent studying for a test and $y$ denote the test scores.
$i$ | $x$ | $y$ | $x^2$ | $y^2$ | $xy$ |
---|---|---|---|---|---|
1 | 1 | 41 | 1 | 1681 | 41 |
2 | 0 | 40 | 0 | 1600 | 0 |
3 | 1 | 39 | 1 | 1521 | 39 |
4 | 2 | 48 | 4 | 2304 | 96 |
5 | 2 | 52 | 4 | 2704 | 104 |
6 | 3 | 47 | 9 | 2209 | 141 |
7 | 3 | 49 | 9 | 2401 | 147 |
8 | 5 | 53 | 25 | 2809 | 265 |
9 | 6 | 65 | 36 | 4225 | 390 |
10 | 6 | 70 | 36 | 4900 | 420 |
11 | 5 | 63 | 25 | 3969 | 315 |
12 | 7 | 80 | 49 | 6400 | 560 |
13 | 7 | 87 | 49 | 7569 | 609 |
14 | 8 | 94 | 64 | 8836 | 752 |
Total | 56 | 828 | 312 | 53128 | 3879 |
The sample variance of $x$ is
$$ \begin{aligned} s_{x}^2 & = \frac{1}{n-1}\bigg(\sum x^2 - \frac{(\sum x)^2}{n}\bigg)\\ & = \frac{1}{14-1}\bigg(312-\frac{(56)^2}{14}\bigg)\\ &= \frac{1}{13}\bigg(312-\frac{3136}{14}\bigg)\\ &= \frac{1}{13}\bigg(312-224\bigg)\\ &= \frac{88}{13}\\ &= 6.7692. \end{aligned} $$
The sample variance of $y$ is
$$ \begin{aligned} s_{y}^2 & = \frac{1}{n-1}\bigg(\sum y^2 - \frac{(\sum y)^2}{n}\bigg)\\ & = \frac{1}{14-1}\bigg(53128-\frac{(828)^2}{14}\bigg)\\ &= \frac{1}{13}\bigg(53128-\frac{685584}{14}\bigg)\\ &= \frac{1}{13}\bigg(53128-48970.2857\bigg)\\ &= \frac{4157.7143}{13}\\ &= 319.8242. \end{aligned} $$
The sample covariance between $x$ and $y$ is
$$ \begin{aligned} s_{xy} & = \frac{1}{n-1}\bigg(\sum xy - \frac{(\sum x)(\sum y)}{n}\bigg)\\ & = \frac{1}{14-1}\bigg(3879-\frac{(56)(828)}{14}\bigg)\\ &= \frac{1}{13}\bigg(3879-\frac{46368}{14}\bigg)\\ &= \frac{1}{13}\bigg(3879-3312\bigg)\\ &= \frac{567}{13}\\ &= 43.6154. \end{aligned} $$
Karl Pearson's sample correlation coefficient between the number of hours spent studying for a test and test scores is
$$ \begin{aligned} r_{xy} & = \frac{Cov(x,y)}{\sqrt{V(x) V(y)}}\\ &= \frac{s_{xy}}{\sqrt{s_x^2s_y^2}}\\ &=\frac{43.6154}{\sqrt{6.7692\times 319.8242}}\\ &=\frac{43.6154}{\sqrt{2164.954}}\\ &=0.9374. \end{aligned} $$
The correlation coefficient between the number of hours spent studying for a test and test scores is $0.9374$. Since the value is positive and close to $+1$, there is a strong positive linear relationship between the number of hours spent studying for a test and test scores.
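With 14 observations the worksheet is tedious to fill in by hand, so it may help to rebuild its columns and totals programmatically. The snippet below is a sketch that assumes the data are typed in as two Python lists; the names `rows` and `totals` are chosen for illustration.

```python
import math

# Observations from Example 3.
hours = [1, 0, 1, 2, 2, 3, 3, 5, 6, 6, 5, 7, 7, 8]
scores = [41, 40, 39, 48, 52, 47, 49, 53, 65, 70, 63, 80, 87, 94]

# One worksheet row per student: x, y, x^2, y^2, xy.
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(hours, scores)]

# Column totals, in the same order as the worksheet columns.
totals = [sum(col) for col in zip(*rows)]
sum_x, sum_y, sum_x2, sum_y2, sum_xy = totals
n = len(rows)

var_x = (sum_x2 - sum_x ** 2 / n) / (n - 1)
var_y = (sum_y2 - sum_y ** 2 / n) / (n - 1)
cov_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)
r = cov_xy / math.sqrt(var_x * var_y)

print(totals)
print(round(var_x, 4), round(var_y, 4), round(cov_xy, 4), round(r, 4))
```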
Conclusion
In this tutorial, you learned the step-by-step procedure for calculating Pearson's correlation coefficient and how to interpret its value.
Let me know in the comments if you have any questions about the Pearson's correlation coefficient calculator and examples, and share your thoughts on this article.