What is Linear Regression?
Linear regression is a predictive analysis algorithm. It is a statistical method that determines the relationship between dependent and independent variables. Because the fitted relationship forms a straight line, it is called linear regression. It is one of the most common types of predictive analysis.
It is used to predict the dependent variable’s value when the independent variable is known. A regression graph is a scatterplot that depicts the arrangement of a dataset, with x and y as the variables. The line that passes closest to the data points, following their linear slope, is the regression line. Thus, plotting and analyzing a regression line on a regression graph is called linear regression.
Key Takeaways
- Linear regression is a statistical model that is used for determining the intensity of the relationship between two or more variables. One dependent variable relies on changes occurring in independent variables.
- It is classified into two subtypes: simple and multiple regression.
- Regression validity depends on assumptions like linearity, homoscedasticity, normality, multicollinearity, and independence.
- In the formula, ‘Y’ is the dependent or outcome variable, ‘X’ is the independent variable, ‘a’ is the y-intercept, ‘b’ is the regression line’s slope, and ‘ɛ’ is the error term:
Y = a + bX + ε
Linear Regression Explained
Linear regression is a model that defines a relationship between a dependent variable ‘y’ (one whose value varies in response to changes in the value of an independent variable) and an independent variable ‘x.’ This model is widely applied in machine learning and statistics.
It is applied to scenarios where the variation in the value of one particular variable significantly relies on the change in the value of a second variable. Here, the dependent variable is also called the output variable. The dependent variable varies depending on the change in the independent variable.
It is classified into two types:
- Simple Linear Regression: A regression model that represents the correlation in the form of an equation, where the dependent variable, y, is a function of a single independent variable, x. It is denoted as Y = a + bX + ε, where ‘a’ is the y-intercept, ‘b’ is the slope of the regression line, and ‘ε’ is the error term.
- Multiple Linear Regression: It is a form of regression analysis, where the change in the dependent variable depends upon the variation in two or more correlated independent variables.
In practical scenarios, it is not always possible to attribute the change in an event, object, factor, or variable to a single independent variable. Rather, changes to the dependent variable result from the impact of various factors that are linked to each other in some way. Thus, multiple regression analysis plays a crucial role in real-world applications.
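As a sketch, a multiple regression of this kind can be fitted with ordinary least squares; the data and variable names below are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: y depends on two predictors, x1 and x2.
X = np.array([[10.0, 2.0],
              [12.0, 2.0],
              [15.0, 3.0],
              [18.0, 3.0],
              [21.0, 4.0]])
y = np.array([50.0, 55.0, 63.0, 70.0, 80.0])

# Prepend a column of ones so the intercept 'a' is estimated alongside b1, b2.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b1, b2 = coef
print(f"Y = {a:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
```

The same approach extends to any number of predictors: each extra independent variable simply adds a column to X and a coefficient to the model.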
Before choosing a model, researchers need to examine the dependent and independent variables. This model is suitable only if the relationship between the variables is linear, so it is not always the best fit for real-world problems. For example, age and wages do not have a linear relationship: most of the time, wages increase with age, but after retirement, age keeps increasing while wages decrease.
Linear Regression Formula
A dependent variable is expressed as a function of the independent variable, as represented by the following linear regression equation:
Y = a + bX + ε
Here, ‘Y’ is the dependent or outcome variable;
- ‘a’ is the y-intercept;
- ‘b’ is the slope of the regression line;
- ‘X’ is the independent or exogenous variable; and
- ‘ɛ’ is the error term, if any.
Note – The above formula is used for computing simple linear regression.
Linear regression is computed in three steps when the values of x and y variables are known:
- First, determine the components of the formula, i.e., Σx, Σy, Σxy, and Σx². To do so, make a chart tabulating the values of x, y, xy, and x².
- Then the values derived in the above chart are substituted into the following formula:
a = (Σy × Σx² − Σx × Σxy) / (n × Σx² − (Σx)²), and b = (n × Σxy − Σx × Σy) / (n × Σx² − (Σx)²), where n is the number of data points.
- Finally, place the values of a and b in the formula Y = a + bX + ɛ to figure out the linear relationship between x and y variables.
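The three steps above can be sketched in code; the x and y values are made up for illustration:

```python
# Worked example of the three-step hand calculation (invented data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Step 1: tabulate the sums Σx, Σy, Σxy, and Σx².
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Step 2: substitute the sums into the formulas for b and a.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Step 3: Y = a + bX gives the fitted line.
print(f"Y = {a:.2f} + {b:.2f}X")  # prints: Y = 2.20 + 0.60X
```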
To better understand the calculations, take a look at linear regression examples, such as the relationships between monthly sales and expenditure, IQ level and test score, monthly temperatures and AC sales, or population and mobile sales.
The analyst needs to consider the following assumptions before applying the linear regression model to any problem:
- Linearity: The relationship between the dependent and independent variables should be linear. It can be checked with a scatterplot of the x and y variables.
- Homoscedasticity: The variance of the residuals should be constant along the regression line, irrespective of the x and y values. Analysts can plot fitted values vs. residuals to test this assumption.
- Normality: The residuals should be normally distributed (multivariate normal in multiple regression). This can be checked with a Q-Q plot or a histogram of the residuals.
- Independence: The observations should show no autocorrelation; consecutive residuals should be independent of each other. Analysts use the Durbin-Watson test to validate this assumption.
- No Multicollinearity: Excessive correlation between independent variables can mislead the analysis, so the predictors should not be multicollinear. Predictors with a high variance inflation factor (one predictor largely explained by the others) should be removed.
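The independence and multicollinearity checks above can be sketched with plain NumPy; `durbin_watson` and `vif` are illustrative helper names, and the data are synthetic:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    diff = np.diff(residuals)
    return float(np.sum(diff ** 2) / np.sum(residuals ** 2))

def vif(X, j):
    """Variance inflation factor for predictor column j of X.
    Values above roughly 5-10 are a common flag for multicollinearity."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    r2 = 1 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
    return float(1.0 / (1.0 - r2))

# Synthetic illustration (all numbers invented).
rng = np.random.default_rng(0)
resid = rng.standard_normal(200)                    # uncorrelated residuals -> DW near 2
x1 = rng.standard_normal(200)
x2 = 2.0 * x1 + 0.01 * rng.standard_normal(200)     # nearly collinear with x1 -> large VIF
X = np.column_stack([x1, x2])
print(durbin_watson(resid), vif(X, 0))
```

In practice, statistical packages compute these diagnostics directly, but the formulas are simple enough to verify by hand.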
Frequently Asked Questions (FAQs)
What is the purpose of linear regression?
It aims to determine the value of a particular variable (the dependent variable) with the help of a known independent variable. Moreover, it analyzes the strength of the relationship between two variables by plotting them on a regression graph.
How does linear regression work?
It determines the line closest to the data points that represents their linear pattern. It is applied to scenarios where the variation in the value of one variable significantly relies on the change in the value of a second variable.
What is a good R-squared value?
R-squared (R²) indicates the proportion of the dependent variable’s variation that the model explains. Higher values indicate a better fit; values above 0.95 are sometimes treated as a strong fit, though acceptable thresholds vary by field.
What does linear regression indicate?
It identifies a linear pattern of relationship between data points when plotted on a regression graph. For this purpose, analysts use different models: simple, multiple, and multivariate regression.
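As a sketch of how R-squared is computed from a fitted line (the data points below are invented for illustration):

```python
import numpy as np

# Invented, nearly linear data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

b, a = np.polyfit(x, y, 1)              # slope, then intercept
resid = y - (a + b * x)
ss_res = np.sum(resid ** 2)             # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))
```

Because the data here lie almost exactly on a line, R² comes out close to 1; scattered data would push it toward 0.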
This article has been a guide to Linear Regression and its definition. Here we discuss simple and multiple linear regression models, along with formulas, calculations, and assumptions. You can learn more about statistics from the following articles –
- Multiple Linear Regression: A type of regression model that deals with one dependent variable and several independent variables.
- Nonlinear Regression: A regression analysis where the model portrays a nonlinear relationship between a dependent variable and independent variables.
- Least Squares Regression: A method that fits the regression line by minimizing the sum of squared differences between observed and fitted values.