Chi-Square Test in Excel
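The chi-square test is a non-parametric test that compares two or more categorical variables from randomly selected data. It helps find whether a relationship exists between these variables. In Excel, the test is usually carried out by calculating the chi-square p-value. Excel has no single function that runs the whole test from raw counts, so the expected frequencies and the chi-square statistic are built with ordinary formulas, while functions such as CHISQ.INV.RT and CHITEST supply the critical value and the p-value.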
There are two types of chi-square tests which are listed as follows:
- Chi-square goodness of fit test
- Chi-square test for independence
#1 – Chi-Square Goodness of Fit Test
The goodness of fit test helps determine whether the sample data matches the population or not. In other words, it shows how well a set of observations fits an expected distribution.
The symbol of the chi-square statistic is “χ²” (i.e., the Greek letter chi raised to the power 2). “χ²” is the summation, over all categories, of (observed count–expected count)²/expected count.
The formula of the chi-square goodness of fit test is given as follows:
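Written out from the verbal definition above, and using the symbols explained in the list that follows, the statistic takes the standard form:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```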
- “χ²” is the chi-square statistic
- “Oi” is the observed frequency of the ith category
- “Ei” is the expected frequency of the ith category
- “i” is the index of the category (i=1, 2, ..., k)
- “k” is the number of categories
- Degrees of freedom (df)=k-1
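As a rough illustration of how these symbols combine, the sketch below computes the goodness of fit statistic and its degrees of freedom in Python. The function name `chi_square_gof` and the die-roll counts are hypothetical and do not come from the article; the example only assumes that each expected frequency is positive.

```python
# Hypothetical data: 120 rolls of a die, compared with a uniform expectation.
observed = [18, 22, 16, 25, 19, 20]                          # O_i (made-up counts)
expected = [sum(observed) / len(observed)] * len(observed)   # E_i = 20 for each face


def chi_square_gof(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum of (O_i - E_i)^2 / E_i over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = k - 1, where k is the number of categories.
    return statistic, len(observed) - 1


statistic, df = chi_square_gof(observed, expected)
print(statistic, df)
```

A small statistic relative to the critical value for the given df suggests that the observed counts are consistent with the expected distribution.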
The Uses of the Goodness of Fit Test
The test is used in the following situations:
- To evaluate the creditworthiness of borrowers based on their age group and history of debt obligations
- To establish a relation between the performance of sales representatives and the training received by them
- To compare the returns of a single stock with the returns on stocks of the whole sector
- To evaluate the impact of a television campaign on a category of viewers
#2 – Chi-Square Test for Independence
It helps determine whether the variables are independent of one another or not. Two random variables are called independent if the probability distribution of one variable is not affected by the other.
The formula of the chi-square test for independence is given as follows:
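Using the symbols explained in the list that follows, the statistic is commonly written as a double sum over the rows and columns of the contingency table:

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```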
- “χ²” is the chi-square statistic
- “Oij” is the observed frequency in the ith row and jth column
- “Eij” is the expected frequency in the ith row and jth column
- “r” is the number of rows
- “c” is the number of columns
- Degrees of freedom (df)=(r-1)(c-1)
The formula for calculating the expected frequency in the ith row and jth column is given as follows:
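The standard form is shown here. The symbols R_i (total of the ith row), C_j (total of the jth column), and N (grand total) are names assumed for this sketch. The form agrees with the worked example in this article, where E11=(26*32)/100.

```latex
E_{ij} = \frac{R_i \times C_j}{N}
```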
The Uses of the Chi-Square Test for Independence
The test is used in the following situations:
- There are two categorical variables and the relation between them is to be determined.
- There are cross-tabulations (two-way tables) and the relationship between various categorical variables needs to be studied.
- There are non-quantifiable (categorical) variables. For example, one needs to determine whether the choice of health plan varies across age groups.
The Characteristics of the Chi-Square Test
The features of the chi-square test are listed as follows:
- It evaluates whether observed frequencies vary significantly from the expected frequencies under a given set of assumptions.
- It determines how well an assumed distribution fits the data.
- It uses the contingency tables (or crosstabs) to summarize the relationship between various categorical variables.
- It supports nominal level measurements.
Note: In the simplest form, the chi-square distribution with one degree of freedom is the distribution of the square of a standard normal random variable.
How to Perform the Chi-Square Test in Excel? (With Example)
A restaurant manager wants to find the relationship between quality of service and the salary of customers waiting to be served.
She organizes the task in the following way:
- A random sample of 100 customers is considered.
- Every customer is asked to rate the service of the restaurant as “excellent,” “good,” or “poor.”
She constructs the following hypotheses:
- Null hypothesis (H0)–The quality of service is not dependent on the salary of customers waiting to be served.
- Alternative hypothesis (H1)–The quality of service is dependent on the salary of customers waiting to be served.
The manager divides the customers into three categories based on their salaries–“low,” “medium,” and “high.” The level of significance (α) is 0.05.
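The findings are presented as nine data points (three service ratings by three salary categories) shown in the following table. In the worksheet, the observed counts occupy the range B4:D6, with the ratings in rows and the salary categories in columns.
Let us calculate the sum of all the rows and columns. We apply the SUM formula “=SUM(B4:D4)” to add the numbers of the fourth row.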
Press the “Enter” key and the sum appears in cell E4. The output is 26.
Similarly, we apply the SUM formula to the remaining rows and columns. There are 27 respondents with medium salary and 51 respondents who rated the service quality as “good.”
We apply the formula “(r-1)(c-1)” to calculate the degrees of freedom (df). With three rows and three columns, df=(3-1)(3-1)=4.
To calculate the expected frequency for column B and row 4, we multiply the total of row 4 by the total of column B and divide the product by the grand total.
The calculation is shown in the following image.
The expected number of customers who have “low” salary but rated the restaurant service as “excellent” is 8.32.
In the following calculations, E11 is the expected frequency of the first row and the first column. E12 is the expected frequency of the first row and the second column.
- E11=(26*32)/100=8.32, E12=7.02, E13=10.66
- E21=16.32, E22=13.77, E23=20.91
- E31=7.36, E32=6.21, E33=9.43
Similarly, we calculate the expected frequencies for the entire table, as shown in the succeeding image.
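The same table of expected frequencies can be reproduced outside Excel. The Python sketch below uses only the totals implied by the figures in this article (row totals 26, 51, and 23 for “excellent,” “good,” and “poor”; column totals 32, 27, and 41 for “low,” “medium,” and “high”; grand total 100). The variable names are illustrative.

```python
# Totals implied by the worked example (rows: ratings, columns: salary categories).
row_totals = [26, 51, 23]    # excellent, good, poor
col_totals = [32, 27, 41]    # low, medium, high
grand_total = 100

# Expected frequency of each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(value, 2) for value in row])
```

The printed rows correspond to E11 to E13, E21 to E23, and E31 to E33 in the list above.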
Let us calculate the chi-square data points. For each cell, the data point is (observed frequency–expected frequency)²/expected frequency.
We apply the formula “=(B4-B14)^2/B14” to calculate the first chi-square point.
We copy and paste the formula to the remaining cells. This is done to fill values in the entire table, as shown in the following image.
Let us calculate the chi-square calculated value by adding all the values given in the succeeding table.
The chi-square calculated value is 18.65823.
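The per-cell calculation and the final sum can be sketched in Python as follows. The observed counts of the restaurant table are not reproduced in this text, so the sketch runs on a hypothetical 2x2 table. The function names and the counts are assumptions made for illustration and will not give the value 18.65823.

```python
def expected_counts(observed):
    """Expected frequencies under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


# Hypothetical counts, NOT the restaurant data from the article.
hypothetical = [[30, 10],
                [20, 40]]

statistic = chi_square_statistic(hypothetical)
df = (len(hypothetical) - 1) * (len(hypothetical[0]) - 1)
print(statistic, df)
```

This mirrors the worksheet approach: one formula per cell, then a single sum over the resulting table.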
To calculate the critical value, we use either the chi-square critical value table or the CHISQ.INV.RT function. This function takes two arguments: the probability and the degrees of freedom.
The probability is 0.05, which is the level of significance (α). The df is equal to 4. So, the formula is “=CHISQ.INV.RT(0.05,4).”
The chi-square critical value is 9.487729037.
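For readers who want to cross-check this figure without Excel, the sketch below approximates the critical value in Python using only the standard library. It relies on the closed-form right-tail probability of the chi-square distribution, which holds only for even degrees of freedom, and finds the critical value by bisection. This is an illustrative method, not a description of how Excel computes CHISQ.INV.RT, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Right-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


def chi2_critical_even_df(alpha, df, tol=1e-10):
    """Value x whose right-tail probability equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even_df(hi, df) > alpha:   # widen the bracket until the tail is below alpha
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid                          # tail still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2


critical = chi2_critical_even_df(0.05, 4)
print(round(critical, 6))
```

With α=0.05 and df=4, the result agrees with the value returned by CHISQ.INV.RT to the precision shown in the article.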
Let us find the chi-square p-value with the help of the following formula.
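We apply the formula “=CHITEST(B4:D6,B14:D16).” The first argument is the range of observed frequencies and the second is the range of expected frequencies. In Excel 2010 and later versions, the CHISQ.TEST function takes the same arguments, and CHITEST is retained for compatibility.
The chi-square p-value is 0.00091723.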
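The p-value can also be checked from the chi-square calculated value reported above. For df=4, the right-tail probability has the simple closed form exp(-x/2)*(1+x/2), which the Python sketch below evaluates. The function name is hypothetical, and the shortcut applies only to four degrees of freedom.

```python
import math


def chi2_p_value_df4(statistic):
    """Right-tail p-value of a chi-square statistic with exactly 4 degrees of freedom."""
    half = statistic / 2
    return math.exp(-half) * (1 + half)


# Chi-square calculated value reported in the worked example.
p_value = chi2_p_value_df4(18.65823)
print(p_value)
```

The result agrees with the CHITEST output to the precision allowed by the rounded statistic.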
The chi-square calculated value is significant when it is equal to or more than the chi-square critical value (tabulated value). In that case, the null hypothesis (H0) is rejected.
Here χ² (calculated)>χ² (tabulated) or 18.65>9.48. Hence, we reject the null hypothesis in favor of the alternative hypothesis.
The p-value can also determine whether the null hypothesis is rejected or not. For this, the p-value is compared with alpha (α) in the following way:
- If p-value <= α, the null hypothesis is rejected.
- If p-value > α, the null hypothesis is not rejected.
In this example, p-value<α or 0.0009172<0.05. So, we reject H0 and accept H1.
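Both decision rules can be expressed in a few lines of Python. The sketch below plugs in the figures from this example; the function names are hypothetical.

```python
def reject_by_p_value(p_value, alpha=0.05):
    """Reject H0 when the p-value is less than or equal to alpha."""
    return p_value <= alpha


def reject_by_critical_value(statistic, critical_value):
    """Reject H0 when the calculated value reaches or exceeds the critical value."""
    return statistic >= critical_value


# Figures reported in the worked example.
by_p = reject_by_p_value(0.00091723, alpha=0.05)
by_critical = reject_by_critical_value(18.65823, 9.487729037)
print(by_p, by_critical)
```

The two rules are equivalent for the same α and df, so they lead to the same decision.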
We conclude that the quality of service is dependent on the salary of customers waiting to be served.
Frequently Asked Questions
The “χ²” in the goodness of fit test determines how well the sample data matches the characteristics of the larger population. If the sample data does not match the expected properties of the population, this sample is not used for drawing conclusions related to the larger population.
The “χ²” in the chi-square test for independence measures how far the actual frequencies are from the expected frequencies. This in turn shows how likely it is that the difference can be explained by random chance. This difference may or may not have been caused by a sampling error.
The p-value, calculated in a chi-square test, represents an area in the tail of a probability distribution curve. A p-value is a number between zero and one. It is expressed in decimals.
For example, a p-value of 0.0254 implies that, if the null hypothesis were true, there would be a 2.54% probability of obtaining results at least as extreme as those observed. The smaller the p-value, the stronger the evidence against the null hypothesis. A significant result is one where the null hypothesis is rejected.
The chi-square statistic to be used depends on how the data has been collected and which hypothesis is being tested.
The goodness of fit test is used when there is one categorical variable. It is also used when there is a need to determine the consistency between the sample data and the hypothesized distribution.
On the other hand, the chi-square test for independence is used when there is a need to study the relationship between two categorical variables.
- The chi-square test is a non-parametric test that compares two or more variables from randomly selected data.
- The chi-square goodness of fit test helps determine whether the sample data matches the population or not.
- The chi-square test for independence helps determine whether the variables are independent of one another or not.
- Two random variables are said to be independent if the probability distribution of one variable is not affected by the other.
- The symbol of the chi-square statistic is “χ²” (i.e., the Greek letter chi raised to the power 2).
- If the chi-square calculated value is greater than the chi-square critical value, the null hypothesis (H0) is rejected.
- If the chi-square p-value is less than or equal to the significance level (α), the null hypothesis is rejected.