Goodman And Kruskal's Gamma

Publication Date :

12 Jul, 2023

What Is Goodman And Kruskal's Gamma?

Goodman and Kruskal's gamma (also called the gamma statistic or gamma coefficient) is a non-parametric measure of association between two ordinal variables. It summarizes the direction and strength of the relationship between paired observations, based on how consistently their rank orderings agree.

The gamma coefficient is suited to ordinal data sets with many tied ranks, whether the underlying variables are continuous or discrete. The gamma value ranges from -1 to 1, where -1 indicates a perfect negative association between the data pairs, 1 indicates a perfect positive association, and 0 indicates no association.

Key Takeaways

Goodman and Kruskal's gamma, gamma statistics, or gamma coefficient, is a statistical measure that gauges the strength and direction of association between two ordinal variables.
The formula used to determine the gamma coefficient value is: γ = (N_c - N_d) / (N_c + N_d); where N_c represents the number of concordant pairs, and N_d denotes the number of discordant pairs.
The value of Goodman-Kruskal gamma lies between -1 and 1. A value of -1 indicates a perfect negative association between the variables, 1 signifies a perfect positive association, and 0 indicates no association between the variables.

Goodman And Kruskal's Gamma Explained

Goodman and Kruskal's gamma is a statistical measure used to evaluate the strength and direction of association between two ordinal variables. Leo Goodman and William Kruskal introduced it in a series of papers published between 1954 and 1972, giving researchers a way to quantify the relationship between variables measured on an ordinal scale. Gamma counts only concordant and discordant pairs and leaves tied pairs out of the calculation, which keeps it usable when the data contain many ties. It is closely related to Kendall's tau, which is built from the same pair counts, and is particularly useful when working with non-parametric data.

Gamma statistics are widely utilized in business, social sciences, epidemiology, and market research, where researchers need to analyze the relationships between variables. Further, Goodman and Kruskal's gamma results can be effortlessly interpreted and communicated. Hence, it aids researchers in understanding patterns and relationships within their data, facilitating informed decision-making and meaningful conclusions. For instance, the gamma coefficient can be employed in business to assess the relationship between customer satisfaction and loyalty.

However, the gamma statistic is designed specifically for ordinal variables and may not be suitable for other types of data, such as variables measured on a nominal scale. Also, because tied pairs are excluded from the calculation, a large number of ties can make the measure less reliable and tends to overstate the strength of the association. Gamma also does not specify the nature of the relationship between the variables, i.e., whether the association is linear or nonlinear. Moreover, it cannot handle multiple independent variables since it is a bivariate measure. Finally, a small sample size may yield biased and inaccurate results, limiting the statistical power of this measure.

Assumptions

Two fundamental assumptions must hold before gamma is applied to a data set. If the data fail either of them, an alternative statistical measure should be used for the analysis:

The paired data sets should comprise ordinal variables. Ordinal variables have categories or levels with a natural order but no specific numerical values. Examples include education level (high school, college, graduate) or Likert scale responses (strongly agree, agree, neutral, disagree, strongly disagree).
The paired variables should exhibit a monotonic relationship, whereby a rise in one variable is accompanied by a consistent increase or a consistent decrease in the rank of the other variable.

How To Calculate?

Working with Goodman and Kruskal's gamma involves two parts: calculation and interpretation.

Calculation

Although there are various Goodman and Kruskal gamma calculators available online, one can use the following steps to find the value of the gamma coefficient:

Step 1 - Create a contingency table

Construct a contingency table that presents the frequencies of the joint distribution of the two ordinal variables. The rows of the table represent the levels of one variable, while the columns represent the levels of the other variable.

Step 2 - Determine the number of concordant pairs (N_c)

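A pair of observations is concordant when the observation that ranks higher on one variable also ranks higher on the other. To count the concordant pairs in the contingency table, with rows and columns both arranged in rank order, multiply each cell's frequency by the sum of the frequencies in all cells below and to its right, then add up the products.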

Step 3 - Find the number of discordant pairs (N_d)

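A pair of observations is discordant when the observation that ranks higher on one variable ranks lower on the other. To count the discordant pairs in the contingency table, with rows and columns both arranged in rank order, multiply each cell's frequency by the sum of the frequencies in all cells below and to its left, then add up the products.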
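To make the two definitions concrete, the following Python sketch classifies every pair of raw observations as concordant, discordant, or tied. The function name `count_pairs` and the sample ratings are hypothetical, and the sketch assumes the ordinal levels have been coded as integers in rank order.

```python
from itertools import combinations


def count_pairs(x, y):
    """Return (concordant, discordant, tied) counts over all pairs of observations.

    x and y are equal-length sequences of ordinal codes (assumed integers,
    where a larger code means a higher rank).
    """
    concordant = discordant = tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both variables order the pair the same way
            concordant += 1
        elif product < 0:    # the variables order the pair in opposite ways
            discordant += 1
        else:                # tied on at least one variable
            tied += 1
    return concordant, discordant, tied


# Hypothetical ratings on a 1-3 scale for five customers
satisfaction = [1, 2, 2, 3, 3]
loyalty = [2, 1, 3, 2, 3]
print(count_pairs(satisfaction, loyalty))  # (4, 2, 4)
```

Of the ten possible pairs in this small sample, four are tied on at least one variable and therefore do not enter the gamma calculation.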

Step 4 - Evaluate the total number of pairs (N_c + N_d)

Add the number of concordant pairs to the number of discordant pairs. This total counts only untied pairs; pairs that are tied on either variable are left out.

Step 5 - Calculate Goodman and Kruskal's gamma

Compute gamma by subtracting the number of discordant pairs from the number of concordant pairs and dividing the difference by the total from Step 4. The gamma coefficient formula is mathematically denoted as follows:

γ = (N_c - N_d) / (N_c + N_d)

Where:

N_c denotes the number of concordant pairs; and
N_d represents the number of discordant pairs.
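The five steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names and the 3x3 table are made up for the example, and the code assumes the rows and columns of the table are both listed in rank order.

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs in a contingency table.

    `table` is a list of rows of frequencies. Rows and columns are
    assumed to be listed in rank order (an assumption of this sketch).
    """
    n_rows, n_cols = len(table), len(table[0])
    n_c = n_d = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):       # only rows below cell (i, j)
                for l in range(n_cols):
                    if l > j:                    # below and to the right
                        n_c += table[i][j] * table[k][l]
                    elif l < j:                  # below and to the left
                        n_d += table[i][j] * table[k][l]
    return n_c, n_d


def gamma_from_table(table):
    """Goodman and Kruskal's gamma: (N_c - N_d) / (N_c + N_d)."""
    n_c, n_d = concordant_discordant(table)
    if n_c + n_d == 0:
        raise ValueError("gamma is undefined when there are no untied pairs")
    return (n_c - n_d) / (n_c + n_d)


# Hypothetical 3x3 table of two ordinal variables (low, medium, high)
table = [
    [10, 5, 2],
    [4, 8, 6],
    [1, 3, 12],
]
print(concordant_discordant(table))       # (536, 89)
print(round(gamma_from_table(table), 4))  # 0.7152
```

Pairs that share a row or a column are tied on one variable, so the loops skip them, which matches the formula's use of concordant and discordant pairs only.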

Interpretation

The value of the gamma coefficient lies between -1 and +1. The closer the value is to either extreme, i.e., -1 or 1, the stronger the relationship between the data pairs. Specifically:

-1 indicates a perfect negative association;
1 represents a perfect positive association; and
0 signifies no association between the variables.

Further, a significance test can be used to determine whether the computed gamma value differs significantly from zero, with the resulting p-value indicating how likely such a value would be if there were no association.
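One way to check whether an observed gamma differs from zero is a permutation test, sketched below in Python. This is one possible approach chosen for illustration, not a method prescribed by this article; the function names, the number of permutations, the seed, and the sample data are all assumptions.

```python
import random
from itertools import combinations


def gamma(x, y):
    """Gamma from raw paired ordinal codes (assumed integers in rank order)."""
    n_c = n_d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            n_c += 1
        elif product < 0:
            n_d += 1
    if n_c + n_d == 0:
        raise ValueError("gamma is undefined when there are no untied pairs")
    return (n_c - n_d) / (n_c + n_d)


def permutation_p_value(x, y, n_permutations=999, seed=0):
    """Two-sided permutation p-value for the hypothesis of no association.

    Shuffling y breaks any link with x, so the shuffled gammas show what
    values arise by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(gamma(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(gamma(x, shuffled)) >= observed:
            extreme += 1
    # Add one to both counts so the observed arrangement is included
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical codes: nervousness level (1-5) and test grade (1-5)
nervousness = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
grade = [5, 4, 5, 3, 3, 2, 2, 1, 2, 1]
print(gamma(nervousness, grade))
print(permutation_p_value(nervousness, grade))
```

A small p-value suggests that an association as strong as the observed one would rarely appear if the two variables were unrelated.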

Examples

Let us consider the following examples to understand the concept better:

Example #1

The gamma coefficient can be used to determine the association between students' nervousness in tests and their performance. By evaluating gamma for the level of nervousness and the test result, i.e., pass or fail, the researcher can identify the strength (negligible, very weak, weak, strong, or very strong) and direction (negative or positive) of the association between these two variables.

Example #2

A business analyst wants to determine the association between consumers' income class and demand for luxury cars. Suppose the analyst considers two income groups, upper and middle, and two demand levels, high and low. Because both variables have only two categories, gamma reduces to its special case for 2x2 tables, Yule's Q. The contingency table is as follows:

	Income Class
Demand	Upper	Middle
High	89	2
Low	11	98

Finding N_c and N_d:

N_c = 89 * 98 = 8722
N_d = 2 * 11 = 22

γ = (N_c - N_d) / (N_c + N_d)

γ = (8722 - 22) / (8722 + 22) = 8700 / 8744 ≈ 0.99

The value of the gamma coefficient is very close to 1, i.e., 0.99. Hence, a strong positive association exists between income class and demand for luxury cars.
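The arithmetic in this example can be checked with a few lines of Python. The function name `yules_q` and its argument order are choices made for this sketch: the four arguments are the cells of the 2x2 table read row by row.

```python
def yules_q(a, b, c, d):
    """Yule's Q for the 2x2 table [[a, b], [c, d]].

    For a 2x2 table, gamma has N_c = a * d and N_d = b * c.
    """
    n_c = a * d
    n_d = b * c
    if n_c + n_d == 0:
        raise ValueError("Q is undefined when both cross-products are zero")
    return (n_c - n_d) / (n_c + n_d)


# Cells from the income class and demand table: High row, then Low row
q = yules_q(89, 2, 11, 98)
print(round(q, 3))  # 0.995
```

To three decimal places the result is 0.995, consistent with the value of about 0.99 reported above.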