## Chi-Square Test in Excel

The chi-square test is a non-parametric test that compares two or more variables from randomly selected data. It helps find the relationship between two or more variables. In Excel, we calculate the chi-square p-value. Since Excel does not have an inbuilt function, mathematical formulas are used to perform the chi-square test.

You are free to use this image on your website, templates, etc, Please provide us with an attribution linkHow to Provide Attribution?Article Link to be Hyperlinked

For eg:

Source: Chi-Square Test in Excel (wallstreetmojo.com)

##### Table of contents

### Key Takeaways

- The chi-square test is a non-parametric test that compares two or more variables from randomly selected data.
- The chi-square goodness of fit test helps determine whether the sample data matches the population or not.
- The chi-square test for independence helps determine whether the variables are independent of one another or not.
- Two random variables are said to be independent if the probability distribution of one variable is not affected by the other.
- The symbol of the chi-square test is “x
^{2}” (i.e., “x” raised to the power 2). - If the chi-square calculated value is greater than the chi-square critical value, the null hypothesis (H
_{0}) is rejected. - If the chi-square p-value is less than or equal to the significance level (α), the null hypothesis is rejected.

There are two types of chi-square tests which are listed as follows:

- Chi-square goodness of fit test
- Chi-square test for independence

### #1 – Chi-Square Goodness of Fit Test

The goodness of fit test helps determine whether the sample data matches the population or not. In other words, it shows how well the sample data fits a set of observations.

The symbol of the chi-square test is “x^{2}” (i.e., “x” raised to the power 2). “x^{2}” is the summation of the (observed count–expected count)^{2}/expected count.

The formula of the chi-square goodness of fit test is given as follows:

Where,

- “x
^{2}” is the chi-square statistic - “O
_{i}” is the observed frequency - “E
_{i}” is the expected frequency - “i” is the “i
^{th}” position in the contingency table - “k” is the category
- Degrees of freedom (df)=k-1

**Excel VBA – All in One Courses Bundle (35+ Hours of Video Tuto**rials)

**–>>** **If you want to learn Excel and VBA professionally**, then Excel VBA All in One Courses Bundle (35+ hours) is the perfect solution. Whether you’re a beginner or an experienced user, this bundle covers it all – from **Basic Excel to Advanced Excel, Macros, Power Query, and VBA.**

#### The Uses of the Goodness of Fit Test

The test is used in the following situations:

- To evaluate the creditworthinessCreditworthinessCreditworthiness is a measure of judging the loan repayment history of borrowers to ascertain their worth as a debtor who should be extended a future credit or not. For instance, a defaulter’s creditworthiness is not very promising, so the lenders may avoid such a debtor out of the fear of losing their money. Creditworthiness applies to people, sovereign states, securities, and other entities whereby the creditors will analyze your creditworthiness before getting a new loan.read more of borrowers based on their age group and history of debt obligations
- To establish a relation between the performance of sales representatives and the training received by them
- To compare the returns of a single stock with the returns on stocks of the whole sector
- To evaluate the impact of a television campaign on a category of viewers

### #2 – **Chi-Square Test for Independence**

It helps determine whether the variables are independent of one another or not. Two random variables are called independent if the probability distributionProbability DistributionProbability distribution could be defined as the table or equations showing respective probabilities of different possible outcomes of a defined event or scenario. In simple words, its calculation shows the possible outcome of an event with the relative possibility of occurrence or non-occurrence as required.read more of one variable is not affected by the other.

The formula of the chi-square test for independence is given as follows:

Where,

- “x
^{2}” is the chi-square statistic - “O
_{ij}” is the observed frequency in the i^{th}row and j^{th}column - “E
_{ij}”is the expected frequency in the i^{th}row and j^{th}column - “r” is the number of rows
- “c” is the number of columns
- Degrees of freedom (df)=(r-1)(c-1)

The formula for calculating the expected frequency in the i^{th} row and j^{th} column is given as follows:

#### The Uses of the Chi-Square Test for Independence

The test is used in the following situations:

- There are two categorical variables and the relation between them is to be determined.
- There are cross-tabulations (two-way tables) and the relationship between various categorical variables needs to be studied.
- There are non-quantifiable variables. For example, one needs to determine the reason behind variations of health plans across age groups.

### The Characteristics of the Chi-Square Test

The features of the chi-square test are listed as follows:

- It evaluates whether observed frequencies vary significantly from the expected frequencies under a given set of assumptions.
- It determines how well an assumed distribution fits the data.
- It uses the contingency tables (or crosstabs) to summarize the relationship between various categorical variables.
- It supports nominal level measurements.

**Note:** In the simplest form, the chi-square distribution is the square of the standard normal distribution.

### How to Perform the Chi-Square Test in Excel? (With Example)

A restaurant manager wants to find the relationship between quality of service and the salary of customers waiting to be served.

She organizes the task in the following way:

- A random sample of 100 customers is considered.
- Every customer is asked to rate the service of the restaurant as “excellent,” “good,” and “poor.”

She constructs the following hypothesis:

- Null hypothesisNull HypothesisNull hypothesis presumes that the sampled data and the population data have no difference or in simple words, it presumes that the claim made by the person on the data or population is the absolute truth and is always right. So, even if a sample is taken from the population, the result received from the study of the sample will come the same as the assumption.read more (H
_{0})–The quality of service is not dependent on the salary of customers waiting to be served. - Alternative hypothesis (H
_{1})–The quality of service is dependent on the salary of customers waiting to be served.

The manager divides the customers into three categories based on their salaries–“low,” “medium,” and “high.” The level of significance (α) is 0.05.

The findings are presented as nine data points shown in the following table.

Let us calculate the sum of all the rows and columns. We apply the following SUM formula to add the numbers of the fourth row.

“=SUM(B4:D4)”

Press the “Enter” key and the sum appears in cell E4. The output is 26.

Similarly, we apply the SUM formulaSUM FormulaThe SUM function in excel adds the numerical values in a range of cells. Being categorized under the Math and Trigonometry function, it is entered by typing “=SUM” followed by the values to be summed. The values supplied to the function can be numbers, cell references or ranges.read more to the remaining rows and columnsRows And ColumnsA cell is the intersection of rows and columns. Rows and columns make the software that is called excel. The area of excel worksheet is divided into rows and columns and at any point in time, if we want to refer a particular location of this area, we need to refer a cell.read more. There are 27 respondents with medium salary and 51 respondents who rated the service quality as “good.”

We apply the formula “(r-1)(c-1)” to calculate the degrees of freedom (df).

df=(3-1)(3-1)=2*2=4

We apply the following formula to calculate the expected frequency for column B and row 4.

“(=B7*E4/B9)”

The calculation is shown in the following image.

The expected number of customers who have “low” salary but rated the restaurant service as “excellent” is 8.32.

In the following calculations, E_{11 }is the expected frequency of the first row and the first column. E_{12 }is the expected frequency of the first row and the second column.

- E
_{11}=(26*32)/100=8.32, E_{12}=7.02, E_{13}=10.66 - E
_{21}=16.32, E_{22}=13.77, E_{23}=20.91 - E
_{31}=7.36, E_{32}=6.21, E_{33}=9.43

Similarly, we calculate the expected frequencies for the entire table, as shown in the succeeding image.

Let us calculate the chi-square data points by using the following formula.

Chi-square points=(observed-expected)^2/expected

We apply the formula “=(B4-B14)^2/B14” to calculate the first chi-square point.

We copy and paste the formula to the remaining cells. This is done to fill values in the entire table, as shown in the following image.

Let us calculate the chi-square calculated value by adding all the values given in the succeeding table.

The chi-square calculated value is 18.65823.

To calculate the critical value, we use either the chi-square critical value table or the CHISQ formula. The formula “CHISQ.INV.RT” contains two parameters–the probability and the degrees of freedomDegrees Of FreedomDegrees of freedom (df) refers to the number of independent values (variable) in a data sample used to find the missing piece of information (fixed) without violating any constraints imposed in a dynamic system. These nominal values have the freedom to vary, making it easier for users to find the unknown or missing value in a dataset.read more.

The probability is 0.05, which is a significant value. The df is equal to 4.

The chi-square critical value is 9.487729037.

Let us find the chi-square p-value with the help of the following formula.

“=CHITEST(actual_range,expected_range)”

We apply the formula “=CHITEST(B4:D6,B14:D16).”

The chi-square p-value is= 0.00091723.

The chi-square calculated value is significant when equal to or more than the chi-square critical value (tabulated value). The null hypothesis (H_{0}) is rejected if the chi-square calculated value is greater than the chi-square critical value.

Here x^{2} (calculated)>x^{2} (tabulated) or 18.65>9.48. Hence, we reject the null hypothesis and accept the alternative hypothesis.

The p-valueP-valueP-Value, or Probability Value, is the deciding factor on the null hypothesis for the probability of an assumed result to be true, being accepted or rejected, & acceptance of an alternative result in case of the assumed results rejection. read more can also determine whether the null hypothesis must be accepted or rejected. For this, the p-value is compared with alpha (α) in the following way:

- If p-value <= α, the null hypothesis is rejected.
- If p-value > α, the null hypothesis is accepted.

In this example, p-value<α or 0.0009172<0.05. So, we reject H_{0} and accept H_{1}.

We conclude that the quality of service is dependent on the salary of customers waiting to be served.

### Frequently Asked Questions

**How should the chi-square test be interpreted?**

The “x^{2}” in the goodness of fit test determines how well the sample data matches the characteristics of the larger population. If the sample data does not match the expected properties of the population, this sample is not used for drawing conclusions related to the larger population.

The “x^{2}” in the chi-square test for independence determines how likely the difference between actual and expected frequencies can be explained by random chance. This difference may or may not have been caused by a sampling error.

**What is the p-value in a chi-square test?**

The p-value, calculated in a chi-square test, represents an area in the tail of a probability distribution curve. A p-value is a number between zero and one. It is expressed in decimals.

For example, a p-value of 0.0254 implies a 2.54% probability that the results could have happened by chance. The smaller the p-value, the more important (significant) the results are. A significant result is one where the null hypothesis is rejected.

**Which chi-square test should be used in Excel?**

The chi-square statistic to be used depends on how the data has been collected and which hypothesis is being tested.

The goodness of fit test is used when there is one categorical variable. It is also used when there is a need to determine the consistency between the sample data and the hypothesized distribution.

On the other hand, the chi-square test for independence is used when there is a need to study the relationship between two categorical variables.

### Recommended Articles

This has been a guide to Chi-Square Test in Excel. Here we learn how to perform the chi-square test along with step by step examples. Below are some useful Excel articles-