
# Ridge Regression

Updated on January 6, 2024
Article by Shrestha Ghosal
Edited by Ashish Kumar Srivastav
Reviewed by Dheeraj Vaidya, CFA, FRM

## What Is Ridge Regression?

Ridge regression is a type of linear regression model that helps analyze multicollinearity in multiple regression data. It aims to reduce the sum of squared errors between the actual and predicted values by adding a penalty term that shrinks the coefficients toward zero.


The ridge regression formula includes a regularization term to prevent overfitting. Multicollinearity arises when the data set contains two or more predictor variables with high correlations. Ridge regression in machine learning helps analyze data sets where the number of predictor variables exceeds the number of observations.

### Key Takeaways

• Ridge regression is a machine learning technique that helps analyze multiple regression data sets with multicollinearity. It aims to reduce the standard errors by adding a penalty term to the regression estimates.
• It is valuable for analyzing data sets with more predictors than observations. It restrains overfitting in the model and decreases its complexity.
• However, this regression produces biased estimates: the L2 regularization shrinks the regression coefficient values toward zero.

### Ridge Regression Explained

Ridge regression is a linear regression type whose objective is to handle multicollinearity in multiple regression data. Least-squares estimates remain unbiased when multicollinearity exists in a data set, but their variances are enormous, so significant differences between the actual and predicted values may exist.

Ridge regression in machine learning decreases the standard error by adding a penalty term to the regression estimates, which yields more accurate estimates. This regression performs L2 regularization: it penalizes the squared magnitudes of the feature coefficients, reducing the gap between the actual and predicted observations. Furthermore, it prevents overfitting and reduces the model’s complexity. It is especially beneficial when the data set contains more predictors than observations.


### Formula

The ridge regression formula is:

min (RSS + α * ||β||²)

Where RSS = the residual sum of squares, i.e., the sum of squared differences between the actual and predicted values

β = the vector of regression coefficients (the weights of the independent variables)

α = a regularization parameter that controls the strength of the penalty term
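The penalized objective above has a closed-form solution, β = (XᵀX + αI)⁻¹Xᵀy. A minimal NumPy sketch on synthetic data (the `ridge_fit` helper and the data are illustrative, not part of any library):

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge estimate: beta = (X'X + alpha*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Two almost perfectly correlated predictors (multicollinearity).
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=100)])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=100)

beta_ols = ridge_fit(X, y, alpha=0.0)    # ordinary least squares (no penalty)
beta_ridge = ridge_fit(X, y, alpha=50.0) # penalized, shrunken estimate
print(beta_ols, beta_ridge)
```

With α = 0 this reduces to ordinary least squares; a positive α pulls the coefficients toward zero, which is the shrinkage the formula describes.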

### Examples

Let us understand this concept with the following examples:

#### Example #1

Suppose Green was working on a project to predict property prices based on several features, including the property’s location, size, and number of bedrooms. He had a data set of historical property prices and their features. However, Green suspected that some property features were highly correlated, which could result in overfitting. To address this concern, Green used a regression model that adds a penalty term to shrink the coefficients of the correlated features toward zero. This is an example of ridge regression.

#### Example #2

Suppose Rose works for a telecom company. She was tasked with analyzing the customers who stopped the services. She had a data set of customer information, including gender, age, customer service interactions, and usage patterns. Rose had to build a model predicting which customers would end their service. The data set contained a vast number of features, some of them unrelated to the study. Rose used a regression model that adds a penalty term to reduce the influence of the irrelevant features on the analysis. This is an example of ridge regression.
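Rose’s situation can be sketched with synthetic data: only a few features actually drive the outcome, and the L2 penalty keeps the coefficients of the irrelevant ones near zero (the `ridge_fit` helper and the data here are illustrative assumptions, not a real churn data set):

```python
import numpy as np

def ridge_fit(X, y, alpha):
    # Closed-form ridge estimate: (X'X + alpha*I)^-1 X'y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
# Only the first two features actually drive the outcome;
# the remaining three are irrelevant noise features.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=200)

beta = ridge_fit(X, y, alpha=5.0)
print(np.round(beta, 2))
```

The penalized fit keeps large coefficients on the two informative features while the irrelevant ones stay close to zero, which is exactly the effect Rose relied on.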

### Advantages

• Even in cases with fewer features than observations, the L2 penalty still works to decrease overfitting. Since the penalty shrinks some coefficient values close to zero, it reduces overfitting and decreases the model’s complexity.
• Users can apply these models to data sets containing several correlated features. Correlated features are generally a drawback for regression models, but applying the L2 penalty reduces their negative effect.
• Applying this method is especially beneficial in cases with more features than observations, a situation that usually causes difficulties for standard regression models.

### Disadvantages

• The coefficient estimates that ridge regression produces are biased. The L2 penalty added to the regression model shrinks the coefficient values toward zero, so the coefficients do not accurately indicate the full magnitude of the relationship between a feature and the outcome variable; they provide a diminished version of that magnitude.
• Because the coefficients are biased, estimating their standard errors is difficult. As a result, constructing confidence intervals and performing statistical evaluation on the coefficients becomes hard.
• This regression introduces another hyperparameter that must be tuned. This hyperparameter controls the magnitude of the L2 penalty the model uses.
• The drawbacks that generally affect standard regression models also affect these regression models. Issues related to model assumptions, interactions, and outliers still apply.
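The extra hyperparameter mentioned above is typically chosen by trying several penalty strengths and keeping the one with the lowest validation error. A minimal hold-out sketch on synthetic data (the `ridge_fit` helper, split, and alpha grid are all illustrative assumptions):

```python
import numpy as np

def ridge_fit(X, y, alpha):
    # Closed-form ridge estimate: (X'X + alpha*I)^-1 X'y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + 0.3 * rng.normal(size=120)

# Hold out the last 40 rows for validation.
X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
errors = []
for a in alphas:
    beta = ridge_fit(X_tr, y_tr, a)
    errors.append(np.mean((X_val @ beta - y_val) ** 2))  # validation MSE

best_alpha = alphas[int(np.argmin(errors))]
print(best_alpha)
```

In practice, cross-validation over the alpha grid is the more common variant of this same idea.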

### Ridge Regression vs Lasso vs Linear Regression

The differences are as follows:

• Ridge Regression: This method penalizes the model for the sum of the squared weights. As a result, the weights generally have smaller absolute values. Furthermore, extreme weight values are penalized, producing a set of weights that is more evenly distributed.
• Lasso: This method is a modified version of linear regression in which the model is penalized for the sum of the absolute values of the weights. As a result, the weights are usually smaller, and many may be exactly zero.
• Linear Regression: This is linear regression’s most basic form, in which the model is not penalized for its weights. If the model decides during training that one specific feature is particularly important, it may assign that feature a considerable weight. This can lead to overfitting in small data sets, where the model performs much better on the training set than on the test set.
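The contrast above can be sketched side by side. The ridge estimate below uses the closed form, and the lasso is fitted with a plain coordinate-descent loop; both helpers and the synthetic data are illustrative assumptions, not a library API:

```python
import numpy as np

def ridge_fit(X, y, alpha):
    # Closed-form ridge estimate: (X'X + alpha*I)^-1 X'y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def lasso_fit(X, y, alpha, n_iter=200):
    # Plain coordinate descent with soft-thresholding for the lasso penalty.
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            partial = y - X @ beta + X[:, j] * beta[j]   # residual excluding feature j
            rho = X[:, j] @ partial / n
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / (X[:, j] @ X[:, j] / n)
    return beta

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 0.3 * rng.normal(size=200)  # only feature 0 matters

beta_ols = ridge_fit(X, y, 0.0)      # no penalty: free to assign large weights
beta_ridge = ridge_fit(X, y, 20.0)   # shrinks all weights, none exactly zero
beta_lasso = lasso_fit(X, y, 0.5)    # drives irrelevant weights to exactly zero
print(beta_ols, beta_ridge, beta_lasso)
```

The ridge coefficients are uniformly shrunken but nonzero, while the lasso sets the three irrelevant coefficients to exactly zero, matching the descriptions above.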

### Frequently Asked Questions (FAQs)

1. When should ridge regression be used?

This regression is most apt when the data contains more predictor variables than observations. It is also appropriate when multicollinearity exists in the data set. Finally, it is suitable when many parameters of large magnitude have almost the same value, meaning that most predictors influence the response.

2. What happens when the alpha parameter in ridge regression increases?

The alpha parameter is the penalty term that determines the amount of constraint or shrinkage applied to the equation; it plays the role of the ridge regression lambda parameter. A change in the alpha value therefore changes the penalty term: when alpha is high, the penalty term is larger, and the coefficients’ magnitudes shrink more.

3. What happens when you increase Lambda in ridge regression?

When the ridge regression lambda parameter’s value increases, the bias increases and the variance decreases. Consequently, the best-fit line’s slope decreases and the line flattens toward horizontal. As lambda increases, the model becomes less responsive to the independent variables: the fit’s flexibility decreases, which leads to higher bias and lower variance.
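This shrinkage can be checked numerically: the norm of the ridge coefficient vector decreases monotonically as the penalty grows (the `ridge_fit` helper and the synthetic data are illustrative assumptions):

```python
import numpy as np

def ridge_fit(X, y, alpha):
    # Closed-form ridge estimate: (X'X + alpha*I)^-1 X'y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.2 * rng.normal(size=100)

# Coefficient norm for an increasing sequence of penalty strengths.
norms = [np.linalg.norm(ridge_fit(X, y, a)) for a in (0.1, 1.0, 10.0, 100.0, 1000.0)]
print(norms)
```

Each larger penalty produces a smaller coefficient norm, which is the increased-bias, reduced-variance trade-off described above.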

This article has been a guide to what Ridge Regression is. We explain its formula, comparison with lasso and linear regression, examples, advantages, and disadvantages. You may also find some useful articles here –