Karl Pearson’s Coefficient of Skewness using R with examples

In this tutorial, you will learn what Karl Pearson's coefficient of skewness is and how to calculate it in R.

Karl Pearson's Coefficient of Skewness

Skewness is an important concept in statistics and data science. It measures the asymmetry of a distribution; literally, skewness means "lack of symmetry". It indicates whether the data values are concentrated more on the higher or the lower side of the central value.

The idea behind Karl Pearson's coefficient of skewness is that, if the data are symmetric (and unimodal), the mean, median, and mode coincide. That is, for symmetric data $\text{Mean} = \text{Median} =\text{Mode}$.

If the data are not symmetric, the three measures generally do not coincide. For moderately skewed unimodal data, typically either $\text{Mean} > \text{Median} > \text{Mode}$ or $\text{Mean} < \text{Median} < \text{Mode}$.

Symmetric Distribution

For a distribution, if $\text{Mean} = \text{Median} = \text{Mode}$ then the distribution is symmetric (or not skewed).

Figure: Pearson's symmetric distribution

Positively Skewed Distribution

For a distribution, if $\text{Mean} > \text{Median} > \text{Mode}$ then the distribution is positively skewed.

Figure: Pearson's positively skewed distribution

Negatively Skewed Distribution

For a distribution, if $\text{Mean} < \text{Median} < \text{Mode}$ then the distribution is negatively skewed.

Figure: Pearson's negatively skewed distribution
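The three cases above can be checked on small samples. The sketch below is only an illustration: the three data vectors are made up, and `stat_mode` is a hypothetical helper (base R has no built-in function for the statistical mode) that returns the most frequent value, taking the first one if there are ties.

```r
# Hypothetical helper: most frequent value (first one if there are ties)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Made-up samples, one per case (not taken from the examples in this tutorial)
samples <- list(
  symmetric = c(1, 2, 2, 3, 3, 3, 4, 4, 5),
  positive  = c(1, 2, 2, 2, 3, 3, 4, 5, 9, 14),
  negative  = c(14, 13, 13, 13, 12, 12, 11, 10, 6, 1)
)

# One column per sample, with rows mean, median and mode
summary_table <- sapply(samples, function(x) {
  c(mean = mean(x), median = median(x), mode = stat_mode(x))
})
summary_table
```

For these samples the symmetric column shows three equal values, the positive column has mean > median > mode, and the negative column has the reverse ordering.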

Karl Pearson's Coefficient of Skewness

Karl Pearson's coefficient of skewness is given by

$S_k =\dfrac{\text{Mean}-\text{Mode}}{\text{SD}}=\dfrac{\overline{x}-\text{Mode}}{s_x}$

where,

  • $\overline{x}$ is the sample mean of the data,
  • $\text{Mode}$ is the mode of the data,
  • $s_x$ is the sample standard deviation of the data.
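As a minimal sketch of the mode-based formula in R, assuming the data have a single clear mode: `stat_mode` and `pearson_skew_mode` are hypothetical helper names introduced here for illustration, and the data vector is made up.

```r
# Hypothetical helper: most frequent value (first one if there are ties)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Mode-based Pearson coefficient: (mean - mode) / sample standard deviation
pearson_skew_mode <- function(x) {
  (mean(x) - stat_mode(x)) / sd(x)
}

# Made-up sample with a long right tail and a single mode at 2
x <- c(1, 2, 2, 2, 3, 3, 4, 5, 9, 14)
pearson_skew_mode(x)
```

Note that `sd()` uses the $n - 1$ denominator, which matches the sample standard deviation $s_x$ in the formula.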

When the mode is not well defined, for example when the distribution of the data is not unimodal, Pearson's coefficient of skewness can be calculated using the formula

$S_k =\dfrac{3(\text{Mean}-\text{Median})}{\text{SD}}=\dfrac{3(\overline{x}-M)}{s_x}$

where,

  • $\overline{x}$ is the sample mean of the data,
  • $M$ is the median of the data,
  • $s_x$ is the sample standard deviation of the data.

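This median-based formula comes from the empirical relation $\text{Mean} - \text{Mode} \approx 3(\text{Mean} - \text{Median})$, which holds only approximately, for moderately skewed unimodal distributions.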
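Because the empirical relation is only approximate, the mode-based and median-based coefficients need not give the same number. The sketch below compares them on a made-up sample; `stat_mode` is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: most frequent value (first one if there are ties)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Made-up sample with a long right tail (not from the tutorial's examples)
x <- c(1, 2, 2, 2, 3, 3, 4, 5, 9, 14)

# Both sides of the empirical relation Mean - Mode ~ 3 * (Mean - Median)
lhs <- mean(x) - stat_mode(x)
rhs <- 3 * (mean(x) - median(x))
c(mean_minus_mode = lhs, three_times_mean_minus_median = rhs)

# Mode-based and median-based versions of the coefficient
sk_mode   <- lhs / sd(x)
sk_median <- rhs / sd(x)
c(sk_mode = sk_mode, sk_median = sk_median)
```

For this sample the two sides of the relation are 2.5 and 4.5, so the two coefficients agree in sign but not in value.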

Karl Pearson's Coefficient of Skewness Interpretation

  • If $S_k < 0$, i.e., $\text{Mean} < \text{Mode}$ or $\text{Mean} < \text{Median}$, then the distribution is negatively skewed.
  • If $S_k = 0$, i.e., $\text{Mean} = \text{Mode}$ or $\text{Mean} = \text{Median}$, then the distribution is symmetric (not skewed).
  • If $S_k > 0$, i.e., $\text{Mean} > \text{Mode}$ or $\text{Mean} > \text{Median}$, then the distribution is positively skewed.
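The interpretation rules above can be sketched as a small R function. The name `interpret_skew` and the tolerance `tol` are assumptions made for this illustration: with real data a computed coefficient is rarely exactly zero, so values within `tol` of zero are treated as symmetric.

```r
# Hypothetical helper: map a skewness coefficient to its interpretation.
# `tol` is an assumed tolerance for treating the coefficient as zero.
interpret_skew <- function(s_k, tol = 1e-8) {
  if (abs(s_k) < tol) {
    "symmetric"
  } else if (s_k > 0) {
    "positively skewed"
  } else {
    "negatively skewed"
  }
}

interpret_skew(0.8)   # positive coefficient
interpret_skew(0)     # zero coefficient
interpret_skew(-0.4)  # negative coefficient
```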

Numerical Problems on Karl Pearson's Coefficient of Skewness Using R

Example 1: Karl Pearson's Coefficient of Skewness using R

The blood sugar levels (in mg/dl) of a sample of 20 patients admitted to a hospital are as follows:
75, 80, 72, 78, 82, 85, 73, 75, 97, 87,
84, 76, 73, 79, 99, 86, 83, 76, 78, 73.
Compute Pearson's coefficient of skewness and interpret the result.

# Create a data vector
blood_sugar <- c(75, 80, 72, 78, 82, 85, 73, 75, 97, 87,
                 84, 76, 73, 79, 99, 86, 83, 76, 78, 73)
# Mean of the data
Mean <- mean(blood_sugar)
# Median of the data
Median <- median(blood_sugar)
# Sample standard deviation of the data
SD <- sd(blood_sugar)
Mean
[1] 80.55
Median
[1] 78.5
SD
[1] 7.556628

Karl Pearson's coefficient of skewness is

$$ \begin{aligned} S_k &=\dfrac{3(\text{Mean}-\text{Median})}{\text{SD}} \end{aligned} $$

# Pearson's coefficient of skewness
S_k <- 3 * (Mean - Median) / SD
S_k
[1] 0.813855

$$ \begin{aligned} S_k &= \frac{3(\text{Mean} - \text{Median})}{\text{SD}}\\ &=\frac{3(80.55- 78.5)}{7.5566283}\\ &=0.813855 \end{aligned} $$

Since Pearson's coefficient of skewness $S_k = 0.813855 > 0$, the distribution of blood sugar levels is positively skewed.

Example 2: Pearson's Coefficient of Skewness using R

The diastolic blood pressures (in mmHg) of a sample of 18 patients admitted to a hospital are as follows:
65, 76, 64, 73, 74, 80, 71, 68, 66,
81, 79, 75, 70, 62, 83, 63, 77, 78.
Compute Pearson's coefficient of skewness and interpret the result.

# Create a data vector
DBP <- c(65, 76, 64, 73, 74, 80, 71, 68, 66,
         81, 79, 75, 70, 62, 83, 63, 77, 78)
# Mean of the data
Mean <- mean(DBP)
# Median of the data
Median <- median(DBP)
# Sample standard deviation of the data
SD <- sd(DBP)
Mean
[1] 72.5
Median
[1] 73.5
SD
[1] 6.653173

Karl Pearson's coefficient of skewness is

$$ \begin{aligned} S_k &=\dfrac{3(\text{Mean}-\text{Median})}{\text{SD}} \end{aligned} $$

# Pearson's coefficient of skewness
S_k <- 3 * (Mean - Median) / SD
S_k
[1] -0.4509127

$$ \begin{aligned} S_k &= \frac{3(\text{Mean} - \text{Median})}{\text{SD}}\\ &=\frac{3(72.5- 73.5)}{6.6531726}\\ &=-0.4509127 \end{aligned} $$

Since Pearson's coefficient of skewness $S_k = -0.4509127 < 0$, the distribution of diastolic blood pressure is negatively skewed.
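The steps used in both examples can be wrapped in one reusable function. This is a sketch rather than a standard routine: `pearson_skewness` is a hypothetical name, and dropping missing values before the calculation is an assumption added here, since neither example contains any.

```r
# Hypothetical helper: median-based Pearson coefficient of skewness
pearson_skewness <- function(x) {
  x <- x[!is.na(x)]                  # assumption: ignore missing values
  3 * (mean(x) - median(x)) / sd(x)  # sd() is the sample standard deviation
}

# Data from Example 1 and Example 2
blood_sugar <- c(75, 80, 72, 78, 82, 85, 73, 75, 97, 87,
                 84, 76, 73, 79, 99, 86, 83, 76, 78, 73)
DBP <- c(65, 76, 64, 73, 74, 80, 71, 68, 66,
         81, 79, 75, 70, 62, 83, 63, 77, 78)

# Apply the function to both samples at once
results <- sapply(list(blood_sugar = blood_sugar, DBP = DBP), pearson_skewness)
results
```

The two values should match the coefficients computed step by step above, 0.813855 for blood sugar and -0.4509127 for diastolic blood pressure.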

Endnote

In this tutorial, you learned what Karl Pearson's coefficient of skewness is and how to calculate it using R.

Hopefully you enjoyed this tutorial on how to compute Pearson's coefficient of skewness using R.
