Chi-Square Test with Excel
Chi-Square test in excel is the most commonly used non-parametric test used to compare two or more variables for randomly selected data. It is a type of test which is used to find out the relationship between two or more variables, this is used in statistics which is also known as Chi-Square P-value, in excel we do not have an inbuilt function but we can use formulas to perform chi-square test in excel by using the mathematical formula for Chi-Square Test.
- Chi-Square test for goodness of fit
- Chi-Square test for independence of two variables.
#1 – Chi-Square test for goodness of fit
It is used to perceive the proximity of a sample that suits a population. The symbol of the Chi-Square test is (2). It is the sum of all the (Observed count – Expected count)2/ Expected count.
- where k-1 degrees of freedom or DF.
- Where Oi is the observed frequency, k is categories, and Ei is the expected frequency.
Note:- Goodness of fit of a statistical model refers to the understanding of how well sample data fits a set of observations.
- The creditworthiness of borrowers based on their age groups and personal loans
- The relation between the performance of salesmen and training received
- Return on a single stock and on stocks of a sector like pharmaceutical or banking
- Category of viewers and the impact of a TV campaign.
#2 – Chi-Square test for independence of two variables
It is utilized to check whether the variables are autonomous of one another or not. With (r-1) (c-1) degrees of freedom
Where Oi is the observed frequency, r is number of rows, c is the number of columns, and Ei is the expected frequency
Test of independence is suitable for the following situations:
- There is one categorical variable.
- There are two categorical variables, and you will need to determine the relation between them.
- There are cross-tabulations, and the relation between two categorical variables needs to be found.
- There are non-quantifiable variables (For example, answers to questions like, do employees in different age groups choose different types of health plans?)
How to Do the Chi-Square Test in Excel? (with Example)
The manager of a restaurant wants to find the relation between customer satisfaction and the salaries of the people waiting for tables. In this, we will set up the hypothesis to test the Chi-Square
- She takes a random sample of 100 customers asking if the service was excellent, good, or poor.
- She then categorizes the salaries of the people waiting as low, medium, and high.
- Assume the level of significance is 0.05. Here, H0 and H1 denote the independence and dependence of the service quality on the salaries of people waiting tables.
- H0 – service quality is not dependent on the salaries of people waiting for the tables.
- H1 – service quality is dependent on the salaries of people waiting for the tables
- Her findings are shown in the table below:
In this, we have 9 data points we have 3 groups each of which got a different message about salary and the outcome is given in the below.
Now we are going to count the sum of all the rows and columns. We will do this with the help of formula i.e. SUM. To Total the Excellent in the total column we have written =SUM(B4:D4) and then press enter.
This will give us 26. We will perform the same with all the rows and columns.
To calculate the Degree of Freedom ( DF ) we use (r-1)(c-1)
DF = (3-1)(3-1)=2*2=4
- There are 3 categories of service and 3 categories of Salary
- We have 27 respondents with a Medium salary(bottom row, middle)
- We have 51 respondents with a Good service (last column, middle)
Now we have to calculate the Expected Frequencies:-
Expected Frequencies can be calculated using a formula:-
- To calculate for the Excellent we will use multiplying the total of Low with the total of Excellent divided by N.
Suppose we have to calculate for the 1st row and 1st column (=B7*E4/B9). This will give the expected number of customers who have voted Excellent service for the salaries of the people waiting as low i.e. 8.32.
- E11=(32*26)/100 = 8.32, E12 = 7.02, E13 = 10.66
- E21 = 16.32, E22 = 13.77, E23 = 20.91
- E31 = 7.36, E32 = 6.21, E33 = 9.41
Similarly for all, we have to do the same and the formula is applied in the below diagram.
We get the Expected Frequency table as given below:-
Note:- Assume the level of significance is 0.05. Here, H0 and H1 denote the independence and dependence of the service quality on the salaries of people waiting tables.
After calculating the Expected Frequency we will calculate the chi-square data points by using a formula
Chi-Square Points = (Observed-Expected)^2/Expected
To calculate the first point we write =(B4-B14)^2/B14.
We will copy and paste the formula into other cells to fill the value automatically.
After this, we will calculate the chi-value(Calculated value) by adding all the values given above the table
We got the Chi-value as 18.65823.
To calculate the critical value for this we use a chi-square critical value table of we can use the formula given below.
This formula contains 2 parameters CHISQ.INV.RT(probability, degree of freedom).
Probability is 0.05, it’s a significant value that will help us to determine whether to accept the Null Hypothesis(H0) or not.
The critical value of chi-square is 9.487729037.
Now we will find the value of the chi-square or (P value)= CHITEST(actual_range,expected_range)
Range from =CHITEST(B4:D6,B14:D16).
As we have seen that the value of the chi-test or P-value is= 0.00091723.
As we have seen that we have calculated all the values. The chi-square(Calculated value) values are only significant when its value is the same or more than the critical value 9.48, i.e. critical value(Tabulated value) must be higher than the 18.65 to accept the Null Hypothesis(H0).
But here Calculated value > Tabulated value
X2 (Calculated)> X2(Tabulated)
In this case, we will reject the Null Hypothesis(H0) and Alternate(H1) will be accepted.
- We can also use P-Value to predict the same i.e. if P-value <= α (significant value 0.05), Null hypothesis will be rejected
- If the P-value > α, do not reject the null hypothesis.
Here P-value( 0.0009172) < α (0.05), reject H0, accept H1
From the above example, we conclude that Service quality is dependent on the salaries of the people waiting.
Things to Remember
- Considers the square of a standard normal variate.
- Evaluates if frequencies observed in different categories vary significantly from the frequencies expected under a specified set of assumptions.
- Determines how well an assumed distribution fits the data.
- Uses contingency tables (in market researches, these tables are called cross-tabs).
- Supports nominal-level measurements.
This has been a guide to Chi-Square Test in Excel. Here we learn how to use the chi-square test to compare two or more variables along with examples and explanations. Below are some useful excel articles related to excel-