What is the Empirical Rule in Statistics?
Empirical Rule in Statistics states that almost all (99.7%) of the observations in a normal distribution lie within 3 Standard Deviations from the Mean. This is a very important rule and helps in forecasting.
The rule says that:
- 68% of the observations will lie within +/- 1 Standard Deviation from the mean
- 95% of the observations will lie within +/- 2 Standard Deviations from the mean
- 99.7% of the observations will lie within +/- 3 Standard Deviations from the mean
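The percentages in the rule are rounded values of the normal distribution. As a minimal sketch, they can be checked with Python's standard library; the helper name `coverage_within` is only illustrative and is not part of the rule itself.

```python
import math


def coverage_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # For a normal variable, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} SD: {coverage_within(k) * 100:.1f}%")
```

The printed shares round to 68%, 95% and 99.7%, which are the figures the rule uses.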
How to Use?
This is used to forecast the trend of a data set. When the data set is large and it gets difficult to study the entire population, the Empirical rule can be applied to a sample to estimate how the data in the population will behave. Suppose you are asked to find the average salary of all the accountants in the US. That is a difficult task to perform as the population set is huge. So, in that case, you can select, say, 90 observations randomly from the entire population.
So now you will have 90 salaries. You need to find the Mean and Standard Deviation of the observations. If the observations follow a normal distribution, then the rule can be applied, and an estimation of the salary of all accountants in the US can be made.
Say the Mean salary of the sample comes out to be $90,000 and the Standard Deviation is $5,000. Then 68% of all the accountants in the US are drawing a salary within +/- 1 Standard Deviation from the Mean, that is, in the range of $90,000 +/- (1*$5,000). So the range is $85,000 to $95,000.
If we spread a bit more, then 95% of all the accountants in the US are being paid in the range of Mean +/- 2 Standard Deviations, that is, $90,000 +/- (2*$5,000). So the range is $80,000 to $100,000.
In a broader range, 99.7% of all accountants are drawing salaries within Mean +/- 3 Standard Deviations, that is, $90,000 +/- (3*$5,000). So the range is $75,000 to $105,000.
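The three salary ranges can be reproduced with a few lines of Python. This is only a sketch: the function name `empirical_ranges` is hypothetical, and the mean and standard deviation are the sample figures from the example.

```python
def empirical_ranges(mean: float, sd: float) -> dict:
    """Return (low, high, share) for 1, 2 and 3 standard deviations around the mean."""
    shares = {1: 0.68, 2: 0.95, 3: 0.997}  # rounded shares from the Empirical Rule
    return {k: (mean - k * sd, mean + k * sd, share) for k, share in shares.items()}


# Sample figures from the accountant example: Mean $90,000, Standard Deviation $5,000.
for k, (low, high, share) in empirical_ranges(90_000, 5_000).items():
    print(f"{share * 100:.1f}% of salaries: ${low:,.0f} to ${high:,.0f} (+/- {k} SD)")
```

Changing the two arguments gives the same three ranges for any other sample, provided the data is roughly normal.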
You can clearly see that an estimate about the population can be made without studying the entire population. Someone planning to work as an accountant in the US can expect, with about 99.7% likelihood under this estimate, a salary between $75,000 and $105,000.
This kind of estimation helps to ease work and make forecasts regarding the future.
Empirical Rule Examples
Mr. X is trying to find the average number of years a person survives after retirement, considering the retirement age to be 60. If the Mean survival of 50 random observations is 20 years and the SD is 3 years, find the probability that a person will draw a pension for more than 23 years.
The empirical rule states that 68% of the observations will lie within 1 Standard Deviation from the Mean. Here the Mean of the observations is 20.
68% of the observations will lie within 20 +/- 1 (Standard Deviation), which is 20 +/- 3. So the range is 17 to 23.
There is a 68% chance that the number of years a person survives after retirement lies between 17 and 23. The percentage lying outside this range is (100 - 68) = 32%. Because the normal distribution is symmetric, this 32% is split equally between the two sides, which means a 16% chance that the survival years will be below 17 and a 16% chance that they will be greater than 23.
So the probability that the person will draw more than 23 years of pension is 16%.
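The 16% figure comes from the rounded 68% in the rule. As a rough check, and assuming the survival years really are normally distributed, the upper tail can also be computed directly; the helper `upper_tail` below is an illustrative name, not something from the example.

```python
import math


def upper_tail(x: float, mean: float, sd: float) -> float:
    """P(X > x) for a normal variable with the given mean and standard deviation."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# Rule-of-thumb answer: half of what lies outside +/- 1 SD.
rule_estimate = (1 - 0.68) / 2

# Direct computation for the example: Mean 20 years, SD 3 years, threshold 23 years.
exact = upper_tail(23, 20, 3)

print(f"Empirical Rule estimate: {rule_estimate:.2f}")
print(f"Normal tail probability: {exact:.4f}")
```

The two values agree to two decimal places, so the shortcut is adequate for this kind of estimate.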
Empirical Rule vs. Chebyshev’s Theorem
Empirical Rule is applied to data sets that follow a normal distribution, that is, a bell-shaped curve. A normal distribution is symmetric, so half of the probability lies on each side of the Mean.
If the data set is not normally distributed, then there is another approximation or rule that applies to all types of data sets, which is Chebyshev’s Theorem. It says three things:
- At least 3/4th of all the observations will lie within 2 Standard Deviations from the Mean. This guarantee holds whatever the shape of the distribution. It means if there are 100 observations, then at least 75 of them will lie within +/- 2 Standard Deviations from the Mean.
- At least 8/9th of all observations will lie within 3 Standard Deviations from the Mean.
- At least 1 - 1/k^2 of all the observations lie within k Standard Deviations from the Mean. Here k can be any number greater than 1; it does not have to be a whole number.
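To see how the two rules compare, the sketch below evaluates the Chebyshev bound 1 - 1/k^2 next to the normal-distribution share for the same k. The function names are illustrative assumptions, not part of either rule.

```python
import math


def chebyshev_lower_bound(k: float) -> float:
    """Minimum share of observations within k standard deviations, for any distribution."""
    if k <= 1:
        raise ValueError("the bound is only informative for k greater than 1")
    return 1 - 1 / k**2


def normal_share(k: float) -> float:
    """Share within k standard deviations when the data is normally distributed."""
    return math.erf(k / math.sqrt(2))


for k in (2, 3):
    print(f"k={k}: Chebyshev at least {chebyshev_lower_bound(k):.3f}, "
          f"normal about {normal_share(k):.3f}")
```

Chebyshev's figures are lower because they must hold for every data set, while the Empirical Rule figures hold only when the data is normal.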
When to Use?
Data is like Gold in the modern world. Huge amounts of data flow in from different sources and are used for different approximations or forecasts. If a data set follows a normal distribution, meaning it shows a bell-shaped curve, then the Empirical rule can be used. The rule is applied to the observations to create an approximation for the population.
Once it is seen that the observations are showing Normal Distribution structure, then Empirical rule is followed to find several probabilities of the observations. The rule is extremely useful for many statistical forecasts.
This is a statistical concept that is helpful to portray the probability of observations and is very useful when finding an approximation of a huge population. It should always be noted that these are approximations. There is always a chance of outliers that don't fit the distribution. So the findings are not exact, and precautionary measures should be taken when acting as per the forecast.