Multiple Linear Regression

Last Updated: 21 Aug, 2024

Blog Author: Wallstreetmojo Team

Edited by: Ashish Kumar Srivastav

Reviewed by: Dheeraj Vaidya


    Multiple Linear Regression Definition

    Multiple linear regression models are a type of regression model that deals with one dependent variable and several independent variables. Regression analysis is a statistical method or technique used for determining relationships between variables that have a cause-and-effect relationship. Regression analysis can also reveal how strong and reliable such a relationship is.

    Regressions help quantify the link between one variable and the other variables responsible for it. The findings are later used to make predictions about the variables involved. Most empirical economic studies include a regression. Regressions are also extensively used in sociology, statistics, and psychology.

    • Multiple linear regression analysis is a statistical method or tool for discovering cause-and-effect relationships between variables. Regressions reflect how strong and stable a relationship is.
    • The multiple linear regression model extends the simple linear regression model. In simple linear regression, there is only one explanatory variable; here, there are several explanatory variables.
    • It helps in making predictions for the required information from the components involved.
    • Its applications include estimating body fat percentage in adults and identifying factors that influence education outcomes to help governments frame policies.

    Multiple Linear Regression Explained


    Multiple linear regression models help establish the relationship between two or more independent variables and one dependent variable. This model is an extension of the simple linear regression model: there is only one explanatory variable in simple linear regression, whereas there are several explanatory variables in multiple linear regression. Therefore, multiple linear regression applies whenever two or more explanatory variables are involved in the relationship. This is especially true in the following cases:

    • To find the extent or degree to which two or more independent variables and one dependent variable are related (e.g., how rainfall, temperature, soil pH, and the amount of fertilizer added affect the growth of fruits).
    • To predict the dependent variable's value at given values of the independent variables (e.g., the expected yield of fruits at certain levels of rainfall, temperature, soil pH, and fertilizer addition).

    Multiple linear regression interpretation helps make predictions and acts as a guide to key decisions. For example, governments may use these inputs to frame welfare policies. In addition, various websites provide calculators for checking the values, and software tools such as SPSS can perform the analysis.

    Formula

    Multiple linear regression models are frequently used as empirical models or approximating functions. For example, while the exact functional relationship between Y and the regressors (X1, X2, …, Xk) is unknown, the linear regression model provides an adequate approximation to the true unknown function over certain ranges of the regressor variables. While using online calculators and SPSS software is easy, knowing how the values are derived is essential.

    One can use the following formula to calculate Multiple linear regression:

    Y = β0 + β1X1 + β2X2 + … + βkXk + ε

    The above-given equation is simply an extension of simple linear regression. Here, the output variable is Y, and each input variable Xi has its own slope, or regression coefficient, βi. The first term (β0) is the intercept constant: the value of Y when all predictors are absent (i.e., when all X terms are 0). k is the number of regressor (predictor) variables, and ε is the error term, which accounts for random variation.
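    The coefficients in this equation are typically estimated by ordinary least squares. Here is a minimal Python/NumPy sketch of that fit; the data values below are made up purely for illustration.

```python
import numpy as np

# Hypothetical data: 4 observations of k = 2 predictors and a response.
# The response happens to follow y = 1 + 1*x1 + 2*x2 exactly.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([6.0, 5.0, 12.0, 11.0])

# Prepend a column of ones so the first coefficient is the intercept b0
A = np.column_stack([np.ones(len(y)), X])

# Ordinary least squares estimate of [b0, b1, b2]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # -> approximately [1., 1., 2.]
```

    The same pattern generalizes to any number of predictors: each extra column in X adds one more slope coefficient.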

    Example

    Consider an example to get a better idea of Multiple linear regression.

    Let's take the values of X1 as 0, 11, 11, the values of X2 as 1, 5, 4, and Y values of 11, 15, and 13.

    Here,

    • Sum of X1 = 22
    • Sum of X2 = 10
    • Sum of Y = 39
    • Mean X1 = 7.3333
    • Mean X2 = 3.3333
    • Mean Y = 13

    Sum of squares:

    • (SSX1) = 80.6667
    • And, (SSX2) = 8.6667

    Sum of products:

    • (SPX1Y) = 22
    • (SPX2Y) = 8
    • And, (SPX1X2) = 25.6667

    Regression Equation = ŷ = b1X1 + b2X2 + a

    b1 = ((SPX1Y)*(SSX2) - (SPX1X2)*(SPX2Y)) / ((SSX1)*(SSX2) - (SPX1X2)*(SPX1X2)) = -14.67/40.33 = -0.36364

    b2 = ((SPX2Y)*(SSX1) - (SPX1X2)*(SPX1Y)) / ((SSX1)*(SSX2) - (SPX1X2)*(SPX1X2)) = 80.67/40.33 = 2

    a = Mean Y - b1*Mean X1 - b2*Mean X2 = 13 - (-0.36364*7.3333) - (2*3.3333) = 9

    Therefore, ŷ = -0.36364X1 + 2X2 + 9
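    The hand calculation above can be reproduced in Python with NumPy, using the same sums of squares and cross products:

```python
import numpy as np

# Data from the worked example above
X1 = np.array([0.0, 11.0, 11.0])
X2 = np.array([1.0, 5.0, 4.0])
Y = np.array([11.0, 15.0, 13.0])
n = len(Y)

# Corrected sums of squares and cross products
SSX1 = np.sum(X1**2) - np.sum(X1)**2 / n            # 80.6667
SSX2 = np.sum(X2**2) - np.sum(X2)**2 / n            # 8.6667
SPX1Y = np.sum(X1*Y) - np.sum(X1)*np.sum(Y) / n     # 22
SPX2Y = np.sum(X2*Y) - np.sum(X2)*np.sum(Y) / n     # 8
SPX1X2 = np.sum(X1*X2) - np.sum(X1)*np.sum(X2) / n  # 25.6667

# Normal-equation solutions for two predictors
den = SSX1*SSX2 - SPX1X2**2
b1 = (SPX1Y*SSX2 - SPX1X2*SPX2Y) / den              # -0.36364
b2 = (SPX2Y*SSX1 - SPX1X2*SPX1Y) / den              # 2.0
a = Y.mean() - b1*X1.mean() - b2*X2.mean()          # 9.0
print(b1, b2, a)
```

    The printed coefficients match the regression equation derived above, ŷ = -0.36364X1 + 2X2 + 9.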

    Assumptions

    The calculation of Multiple linear regression requires several assumptions, and a few of them are as follows:

    Linearity

    One can model the linear (straight-line) relationship between Y and the X's using multiple regression. Any curvilinear relationship is not taken into account. This can be checked with scatter plots at the preliminary stage, while non-linear patterns may also show up in the residual plots.

    Constant variance

    For all values of the X's, the variance of ε is constant. To detect violations, residual plots against the X's can be used. If a residual plot has a roughly rectangular shape, constant variance is a reasonable assumption; if it reveals a widening wedge shape, non-constant variance exists and must be addressed.
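    Alongside residual plots, one rough numerical check of constant variance is to compare the residual spread across the range of fitted values. A sketch with simulated data whose noise is deliberately homoscedastic (all names and values here are illustrative):

```python
import numpy as np

# Simulated data with constant-variance (homoscedastic) noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 3 + 2*x + rng.normal(scale=1.0, size=200)

# Fit a line by ordinary least squares and compute residuals
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef
resid = y - fitted

# Compare residual spread in the lower and upper halves of fitted values;
# similar magnitudes are consistent with constant variance
spread_lo = resid[fitted < np.median(fitted)].std()
spread_hi = resid[fitted >= np.median(fitted)].std()
print(spread_lo, spread_hi)
```

    If the noise instead grew with x (a wedge-shaped residual plot), the two spreads would differ markedly.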

    Special Causes

    The assumption is that the data are free of special causes arising from one-time events. If such observations remain, the regression model may show non-constant variance, non-normality, or other issues.

    Normality

    When one uses hypothesis tests and confidence limits, the assumption is that there is a normal distribution of ε's.

    Multicollinearity

    The presence of near-linear relationships among the set of independent variables is called collinearity or multicollinearity. Since multicollinearity causes plenty of difficulties in regression analysis, the assumption is that the data aren't multicollinear.
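    A quick way to screen for multicollinearity is to inspect pairwise correlations among the predictors: values near ±1 flag near-linear relationships. A sketch with simulated data (the variables here are hypothetical):

```python
import numpy as np

# Hypothetical predictors: X2 is almost an exact multiple of X1
# (collinear), while X3 is independent noise
rng = np.random.default_rng(0)
X1 = rng.normal(size=100)
X2 = 2*X1 + rng.normal(scale=0.01, size=100)
X3 = rng.normal(size=100)

# Pairwise correlation matrix of the predictors (rows = observations)
corr = np.corrcoef(np.column_stack([X1, X2, X3]), rowvar=False)
print(corr[0, 1])  # near 1.0 -> X1 and X2 are nearly collinear
print(corr[0, 2])  # near 0.0 -> X1 and X3 are not
```

    Correlations only catch pairwise collinearity; variance inflation factors (VIFs) are the usual tool for detecting relationships involving several predictors at once.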

    Frequently Asked Questions (FAQs)

    What is multiple linear regression?

    Multiple linear regression is an extension of simple linear regression in which two or more independent variables are involved, along with one dependent variable.

    What is the difference between linear and multiple regression?

    Simple linear regression has only one independent variable (x) and one dependent variable (y). Multiple linear regression has one dependent variable and two or more independent variables.

    What are the advantages of multiple regression?

    Analysts often have a theoretical relationship in mind, and regression analysis can confirm whether it holds. It aims to find an equation that summarizes the relationship within a data set. The analysis also helps in making fewer assumptions about the set of values.

    Why is multiple linear regression important?

    The main goal of multiple linear regression interpretation is to anticipate a response variable. For example, it could be sales, delivery time, efficiency, hospital occupancy rate, or body fat percentage. These forecasts can be extremely useful for planning, monitoring, or analyzing a process or system.

    This has been a Guide to Multiple Linear Regression and its Definition. Here we explain the formula, assumption, and their explanations along with examples. You can learn more from the following articles -