Multiple Linear Regression
Last Updated: 21 Aug, 2024
Blog Author: Wallstreetmojo Team
Edited by: Ashish Kumar Srivastav
Reviewed by: Dheeraj Vaidya
Multiple Linear Regression Definition
Multiple linear regression is a type of regression model that deals with one dependent variable and several independent variables. Regression analysis is a statistical technique for determining relationships between variables, typically where a cause-and-effect relationship is presumed. A regression can also indicate how strong the relationship is and how well the model captures it.
Regressions help quantify the relationship between one variable and the other variables that influence it. The findings are then used to make predictions about the variables involved. Most empirical economic studies include a regression. Regressions are also used extensively in sociology, statistics, and psychology.
Key Takeaways
- Multiple linear regression analysis is a statistical method for quantifying the relationship between one dependent variable and several independent variables. Regressions also indicate how strong and stable a relationship is.
- The multiple linear regression model is an extension of the simple linear regression model. Simple linear regression has only one explanatory variable, whereas multiple linear regression has several.
- It helps in making predictions about the dependent variable from the variables involved.
- Its applications include estimating body fat percentage in adults and identifying the factors that influence education to help the government frame policies.
Multiple Linear Regression Explained
Multiple linear regression models help establish the relationship between two or more independent variables and one dependent variable. The model is an extension of simple linear regression: a simple linear regression has only one explanatory variable, whereas a multiple linear regression has several. It therefore applies whenever two or more explanatory variables are involved, in particular for the following purposes:
- To find the extent to which two or more independent variables are related to one dependent variable (e.g., how rainfall, temperature, soil pH, and the amount of fertilizer added affect the growth of the fruits).
- To find the dependent variable's value at given values of the independent variables (e.g., the expected yield of the fruits at certain levels of rainfall, temperature, soil pH, and fertilizer addition).
Interpreting a multiple linear regression helps make predictions and guides key decisions. For example, governments may use the results to frame welfare policies. Various websites provide online calculators for it, and one can also use software tools such as SPSS.
Formula
Multiple linear regression models are frequently used as empirical models or as approximating functions. For example, while the exact functional relationship between Y and the X values (X1, X2, …, Xk) is unknown, the linear regression model provides an adequate approximation to the true unknown function over certain ranges of the regressor variables. Online calculators and SPSS make the computation easy, but it is still important to understand how the values are derived.
The multiple linear regression model is written as:
Y = β0 + β1X1 + β2X2 + … + βkXk + ε
The equation above extends simple linear regression. Here, Y is the output (dependent) variable, and X1 through Xk are the input (independent) variables, each with its own slope or regression coefficient (β1 through βk). The first term, β0, is the intercept: the value of Y when all predictors are absent (i.e., when all X terms are 0). k is the number of regressor or predictor variables. ε is the error term, which accounts for the variation in Y that the predictors do not explain.
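As a minimal sketch of how the formula is evaluated once coefficients are known, the Python snippet below computes the predicted value for one observation. The function name predict and the coefficient values are hypothetical, chosen only for illustration, and the error term ε is left out because it is not known when predicting.

```python
def predict(intercept, coefficients, x_values):
    """Return b0 + b1*x1 + ... + bk*xk for a single observation."""
    if len(coefficients) != len(x_values):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, x_values))


# Hypothetical coefficients, not taken from any real data set.
b0 = 2.0
betas = [0.5, -1.0]

# When every predictor is 0, the prediction equals the intercept.
y_at_zero = predict(b0, betas, [0.0, 0.0])

# 2.0 + 0.5 * 4.0 + (-1.0) * 1.0
y_example = predict(b0, betas, [4.0, 1.0])
```

The first call illustrates the role of the intercept: with all X terms set to 0, only β0 remains.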
Example
Consider an example to get a better idea of multiple linear regression.
Let's take three observations, with X1 values of 0, 11, and 11, X2 values of 1, 5, and 4, and Y values of 11, 15, and 13.
Here,
- Sum of X1 = 22
- Sum of X2 = 10
- Sum of Y = 39
- Mean X1 = 7.3333
- Mean X2 = 3.3333
- Mean Y = 13
Sum of squares:
- (SSX1) = 80.6667
- (SSX2) = 8.6667
Sum of products:
- (SPX1Y) = 22
- (SPX2Y) = 8
- (SPX1X2) = 25.6667
Regression equation: ŷ = b1X1 + b2X2 + a, where b1 and b2 are the estimated slopes (the sample estimates of β1 and β2) and a is the estimated intercept.
b1 = ((SPX1Y)*(SSX2) - (SPX1X2)*(SPX2Y)) / ((SSX1)*(SSX2) - (SPX1X2)*(SPX1X2)) = -14.6667/40.3333 = -0.36364
b2 = ((SPX2Y)*(SSX1) - (SPX1X2)*(SPX1Y)) / ((SSX1)*(SSX2) - (SPX1X2)*(SPX1X2)) = 80.6667/40.3333 = 2
a = Mean Y - b1*(Mean X1) - b2*(Mean X2) = 13 - (-0.36364*7.3333) - (2*3.3333) = 9
Therefore, ŷ = -0.36364X1 + 2X2 + 9
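The worked example above can be reproduced with a short script. The sketch below uses only the data and the two-predictor formulas from the example. The helper names mean and sum_of_products are assumptions made for illustration.

```python
# Data from the worked example.
x1 = [0, 11, 11]
x2 = [1, 5, 4]
y = [11, 15, 13]


def mean(values):
    return sum(values) / len(values)


def sum_of_products(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b)).

    Passing the same list twice gives the sum of squares.
    """
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


ss_x1 = sum_of_products(x1, x1)    # SSX1
ss_x2 = sum_of_products(x2, x2)    # SSX2
sp_x1y = sum_of_products(x1, y)    # SPX1Y
sp_x2y = sum_of_products(x2, y)    # SPX2Y
sp_x1x2 = sum_of_products(x1, x2)  # SPX1X2

# Two-predictor least-squares formulas from the example.
denominator = ss_x1 * ss_x2 - sp_x1x2 ** 2
b1 = (sp_x1y * ss_x2 - sp_x1x2 * sp_x2y) / denominator
b2 = (sp_x2y * ss_x1 - sp_x1x2 * sp_x1y) / denominator
a = mean(y) - b1 * mean(x1) - b2 * mean(x2)

# Fitted values for the three observations.
fitted = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]

print(round(b1, 5), round(b2, 5), round(a, 5))
```

Because this example has three observations and three estimated parameters, the fitted values match the observed Y values exactly. A real data set would need many more observations than parameters for the fit to be informative.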
Assumptions
Multiple linear regression relies on several assumptions, a few of which are as follows:
Linearity
Multiple regression models the linear (straight-line) relationship between Y and the X's. Curvilinear relationships are not taken into account. Linearity can be checked with scatter plots in the early stages of the analysis, and non-linear patterns may also show up in the residual plots.
Constant variance
The variance of ε is assumed to be constant for all values of the X's. Plots of the residuals against each X can be used to check this. It is reasonable to assume constant variance if the residual plots have a rectangular shape. If a residual plot reveals a wedge shape that widens or narrows, non-constant variance exists and must be addressed.
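The rectangular-versus-wedge idea can also be approximated numerically. The sketch below is an informal check, not a formal statistical test and not a substitute for looking at the residual plot: it compares the residual variance in the upper half of the fitted values with that in the lower half. The function name spread_ratio and both residual lists are made up for illustration.

```python
from statistics import pvariance


def spread_ratio(fitted, residuals):
    """Residual variance for the larger fitted values divided by the
    residual variance for the smaller fitted values."""
    pairs = sorted(zip(fitted, residuals))  # order by fitted value
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]
    return pvariance(upper) / pvariance(lower)


# Made-up values, for illustration only.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
even_residuals = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
wedge_residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]

even_ratio = spread_ratio(fitted, even_residuals)    # close to 1: rectangular
wedge_ratio = spread_ratio(fitted, wedge_residuals)  # far above 1: wedge
```

A ratio near 1 is consistent with a rectangular residual plot, while a ratio far from 1 suggests a wedge. The function divides by the lower-half variance, so it fails if those residuals are all identical.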
Special Causes
The assumption is that outliers arising from one-time events (special causes) have been removed from the data. If they have not, the regression model may show non-constant variance, non-normality, or other issues.
Normality
When one uses hypothesis tests and confidence limits, the assumption is that the ε's are normally distributed.
Multicollinearity
Multicollinearity (or collinearity) is the presence of near-linear relationships among the independent variables. Since multicollinearity causes many difficulties in regression analysis, the assumption is that the data is not multicollinear.
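One simple way to look for near-linear relationships between two predictors is their correlation coefficient. The sketch below computes it for the X1 and X2 values from the example, along with the variance inflation factor (VIF), a common diagnostic that this article does not otherwise cover. For two predictors the VIF reduces to 1 / (1 - r²). The helper name pearson_r is an assumption.

```python
from math import sqrt

# Predictor values from the worked example.
x1 = [0, 11, 11]
x2 = [1, 5, 4]


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sp = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    ss_a = sum((ai - ma) ** 2 for ai in a)
    ss_b = sum((bi - mb) ** 2 for bi in b)
    return sp / sqrt(ss_a * ss_b)


r = pearson_r(x1, x2)

# With exactly two predictors, both share the same VIF.
vif = 1 / (1 - r ** 2)

print(round(r, 4), round(vif, 2))
```

A correlation close to 1 or -1, or a large VIF, indicates that the two predictors carry nearly the same information. With only three observations, as here, such figures are purely illustrative.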
Frequently Asked Questions (FAQs)
Multiple linear regression is an extension of simple linear regression in which two or more independent variables are used to explain one dependent variable.
Multiple linear regression has one dependent variable (y) and more than one independent variable (x). Simple linear regression has only one x variable and one y variable.
Analysts usually have a theoretical relationship in mind, and regression analysis helps test it against the data. It aims to find an equation that summarizes the relationship between the variables in a data set. The analysis also reduces the number of assumptions one must make about the set of values.
The main goal of multiple linear regression is to predict a response variable. For example, it could be sales, delivery time, efficiency, car drive analysis, hospital occupancy rate, percentage of body mass in one gender, etc. These forecasts can be extremely useful for planning, monitoring, or analyzing a process or system.