What is Regression?
Regression analysis is a statistical method used in finance, investing, and other fields to model the relationship between a dependent variable and one or more independent variables. Its prime focus is determining the strength of that relationship.
- To explain regression analysis in layman's terms, assume the sales head of a company is trying to forecast next month's sales. Numerous factors drive the sales of the product, from the weather to a competitor's new strategy, festivals, and changes in consumers' lifestyles.
- Regression analysis is a method of sorting out which of the several factors that affect sales have the greatest impact. It can help answer questions such as which factors matter most, which matter less, how the factors relate to one another, and, most importantly, how certain we can be about the effect of each factor.
- These factors are called variables. The main factor that we are trying to forecast is called the dependent variable, and the other factors that have an impact on the dependent variable are called the independent variables. An independent variable is an input value whose changes are used to assess the impact on an output value that is measured in mathematical, statistical, or financial modeling.
Simple linear regression, which can be carried out in Excel, is expressed by the formula below. It measures the relationship between a dependent variable and one independent variable.
Y = a + bX + ϵ
- Y – Dependent variable
- X – Independent (explanatory) variable
- a – Intercept
- b – Slope
- ϵ – Residual (error)
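As a minimal sketch of how a and b could be obtained, the example below fits the line by ordinary least squares, assuming that is the estimation method in use (the text above does not name one). The function name `fit_simple_linear` and the five data points are hypothetical and serve only as illustration.

```python
def fit_simple_linear(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical observations (not taken from the article).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_simple_linear(xs, ys)
# Residuals: the part of each Y that the fitted line does not explain.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

With a least-squares fit that includes an intercept, the residuals sum to zero up to rounding, which is a quick sanity check on any implementation.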
How to Interpret Regression Analysis?
This can be illustrated with a simple scenario: the relationship between the auction price of an antique and its age. The older an antique is, the higher the price it fetches. Assuming that we have data for the last 50 items auctioned, we can predict future auction prices based on the age of the item. Using this data, we can build a regression equation.
The regression formula that relates age and price is as follows:
y = β0 + β1 x + error
- Here the dependent variable is y, the price of each item to be auctioned, and the independent variable is x, the age of the item.
- β0 and β1 are unknown parameters that are estimated from the data.
- β0 is a constant, the intercept: the point where the linear trend line crosses the y-axis.
- β1 is a constant that gives the change in the dependent variable for a one-unit change in the independent variable.
- β1 is called the slope of the equation. When the slope is positive, price rises with age. When the slope is negative, the relationship is inverse and price falls as age rises.
- The error is the noise or variation in the target variable that the line does not explain, and it is random in nature.
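The article does not say how β0 and β1 are estimated. A common choice, assumed here, is ordinary least squares, which picks the values that minimize the sum of squared errors. With n = 50 auctioned items, x_i the age and y_i the price of item i, and bars denoting sample means, the estimates would be:

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \, \bar{x}
```

The sign of the numerator decides the sign of the slope: if older items tend to sell above the average price, the products in the sum are mostly positive and the estimated slope is positive.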
Real-life Examples of Regression Analysis
Let us assume we need to establish a relationship between the sales of a product and the amount spent on advertising it.
We can generally observe a positive relationship between the sales quantity and the amount spent on advertising. Applying the simple linear regression equation, we get:
Y = a + bX
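Suppose the fitted equation is
Y = 500 + 30X
The estimated slope of 30 leads to the conclusion that predicted sales increase by 30 for every one-unit increase in advertising spend (for example, by $30 for each additional dollar spent, if both are measured in dollars). The intercept of 500 is the predicted level of sales when nothing is spent on advertising.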
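The fitted equation Y = 500 + 30X can be used directly for prediction. The short sketch below shows this; the function name `predict_sales` and the spend levels are hypothetical, and the units of X and Y are an assumption because the example does not state them.

```python
INTERCEPT = 500.0  # a: predicted sales when advertising spend is zero
SLOPE = 30.0       # b: change in predicted sales per one-unit increase in spend


def predict_sales(ad_spend):
    """Predicted sales from the fitted line Y = 500 + 30X."""
    return INTERCEPT + SLOPE * ad_spend


# Hypothetical spend levels, in whatever unit X was measured in.
for spend in (0, 10, 11):
    print(f"spend = {spend:>2}  predicted sales = {predict_sales(spend):.0f}")

# Raising spend by one unit changes the prediction by exactly the slope.
increase = predict_sales(11) - predict_sales(10)
```

The difference between any two predictions one unit of spend apart equals the slope, which is the sense in which the slope measures the effect of advertising on sales.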
Types of Regression Analysis
#1 – Linear
Linear regression measures the relationship between a dependent variable and one independent variable. It is expressed by the formula Y = a + bX + ϵ.
#2 – Polynomial
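In this method, the relationship between the dependent variable and an independent variable is modeled as a polynomial, so powers of the independent variable (X², X³, and so on) appear as additional terms. It is used when the relationship follows a curve rather than a straight line.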
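As an illustration of a polynomial fit, the sketch below estimates the coefficients of a curve y = c0 + c1·x + c2·x² by least squares. It assumes the normal equations are solved directly, which is adequate for low degrees but numerically fragile for high ones. The function names and the data are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: not enough distinct x values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_polynomial(xs, ys, degree):
    """Return least-squares coefficients [c0, c1, ..., c_degree]."""
    size = degree + 1
    # Normal equations (X^T X) c = X^T y, where X has columns x^0 .. x^degree.
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])
```

Because the model is still linear in the coefficients, the same least-squares machinery used for a straight line applies; only the set of input columns changes.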
#3 – Logistic
Here the dependent variable is binary in nature. The independent variables can be continuous or binary. In multinomial logistic regression, the dependent variable can have more than two categories.
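A minimal sketch of a binary logistic regression with one independent variable follows. It assumes the model P(y = 1 | x) = sigmoid(b0 + b1·x) and fits it by plain gradient descent on the log-loss; statistical packages typically use faster solvers. The data (hours studied and a pass/fail outcome), the function names, and the learning-rate settings are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, labels, learning_rate=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(b0 + b1 * x) - y  # predicted probability minus label
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
p_low = sigmoid(b0 + b1 * 1.0)   # estimated pass probability after 1 hour
p_high = sigmoid(b0 + b1 * 4.5)  # estimated pass probability after 4.5 hours
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, p(1h) = {p_low:.2f}, p(4.5h) = {p_high:.2f}")
```

Unlike linear regression, the output is a probability between 0 and 1, which is why this method suits a binary dependent variable.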
#4 – Quantile
This is an extension of linear regression that models a chosen quantile of the dependent variable, such as the median, instead of its mean. It is primarily used when outliers and skewness are present in the data.
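Quantile regression is usually fitted by minimizing the quantile ("pinball") loss rather than squared error. The sketch below shows only the simplest case, fitting a single constant with no independent variables, to illustrate why the result resists outliers. The sample values and function names are hypothetical.

```python
def pinball_loss(actual, predicted, quantile):
    """Average quantile (pinball) loss for a quantile level in (0, 1)."""
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        diff = y - y_hat
        # Under-predictions are weighted by q, over-predictions by (1 - q).
        total += quantile * diff if diff >= 0 else (quantile - 1) * diff
    return total / len(actual)


def best_constant(values, quantile):
    """Pick the observed value that minimizes the pinball loss as a constant fit."""
    candidates = sorted(set(values))
    return min(
        candidates,
        key=lambda c: pinball_loss(values, [c] * len(values), quantile),
    )


# Hypothetical sample with one large outlier.
values = [10, 12, 11, 13, 12, 95]

median_fit = best_constant(values, 0.5)  # minimizer for the 0.5 quantile
mean_value = sum(values) / len(values)   # minimizer of squared error
print(f"quantile-0.5 fit = {median_fit}, mean = {mean_value}")
```

The mean is pulled toward the outlier while the 0.5-quantile fit stays with the bulk of the data, which is the behavior that makes quantile regression attractive for skewed samples.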
#5 – Elastic Net
This is useful when one is handling highly correlated independent variables. It combines the penalties of ridge and lasso regression.
#6 – Principal Components Regression (PCR)
This technique is applicable when there are too many independent variables or when multicollinearity exists in the data.
#7 – Partial Least Squares (PLS)
It is an alternative to principal components regression for cases where the independent variables are highly correlated. It is also applicable when there are many independent variables.
#8 – Support Vector
This can provide a solution to linear and non-linear models. It makes use of non-linear kernel functions to find the optimal solution for non-linear models.
#9 – Ordinal
It is applicable to the prediction of ranked values, that is, when the dependent variable is ordinal in nature.
#10 – Poisson
This is applicable when the dependent variable is count data.
#11 – Negative Binomial
It is also applicable to count data. The difference is that negative binomial regression does not assume that the variance of the counts equals their mean, whereas Poisson regression does.
#12 – Quasi Poisson
It is a substitute for negative binomial regression. It is also applicable to overdispersed count data. The variance of a quasi-Poisson model is a linear function of the mean, while the variance of a negative binomial model is a quadratic function of the mean.
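The variance assumptions that separate the three count models can be written down directly. The sketch below assumes the usual parameterizations: variance μ for Poisson, θ·μ for quasi-Poisson, and μ + μ²/k for negative binomial. The parameter names `theta` and `k`, the function names, and the count data are hypothetical. The ratio of sample variance to sample mean is a rough first check for overdispersion, not a formal test.

```python
from statistics import mean, variance


def poisson_variance(mu):
    """Poisson: variance equals the mean."""
    return mu


def quasi_poisson_variance(mu, theta):
    """Quasi-Poisson: variance is linear in the mean (theta is the dispersion)."""
    return theta * mu


def negative_binomial_variance(mu, k):
    """Negative binomial: variance is quadratic in the mean (k is the size parameter)."""
    return mu + mu ** 2 / k


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; roughly 1 under a Poisson model."""
    return variance(counts) / mean(counts)


# Hypothetical count data, e.g. daily numbers of claims.
counts = [0, 1, 0, 2, 14, 0, 3, 0, 9, 1]
ratio = dispersion_ratio(counts)
print(f"mean = {mean(counts)}, variance/mean = {ratio:.2f}")

for mu in (1, 5, 10):
    print(mu, poisson_variance(mu), quasi_poisson_variance(mu, 2.0),
          negative_binomial_variance(mu, 2.0))
```

A ratio well above 1, as in this sample, suggests that a plain Poisson model understates the variability and that one of the two alternatives may fit better.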
#13 – Cox
It is mainly used for analyzing time-to-event data.
Difference Between Regression and Correlation
- Regression establishes the relationship between an independent variable and a dependent variable, so the two variables play different roles, whereas correlation measures the association between two variables that are treated alike.
- The main objective of regression is to fit a line of best fit so that one variable can be estimated on the basis of the others, whereas correlation measures the strength of the linear relationship between two variables.
- In regression, we estimate the magnitude of the effect of a change in the known variable (X) on the estimated variable (Y), whereas in correlation the coefficient measures the extent to which the two variables move together.
- Regression is a process of estimating the value of a random dependent variable based on the value of a fixed independent variable, whereas correlation gives a single value that expresses the interdependency between the two variables.
- Regression analysis primarily uses historical data to establish a relationship between two or more variables. It assumes that relationships that existed in the past will continue to hold in the present and the future. Some regard this reliance on past data as a time lag between the past and the present or future.
- However, it is a widely used forecasting and estimating technique. Although it involves mathematics that many users may find difficult, the technique is comparatively easy to use, especially when a model is available.
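The contrast between regression and correlation drawn in this section can be checked numerically. In the sketch below, the correlation coefficient is the same whichever variable comes first, while the regression slope changes when the roles of the variables are swapped. The advertising and sales figures and the function names are hypothetical.

```python
import math


def _centered_sums(xs, ys):
    """Return (sxx, syy, sxy): sums of squared and cross deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def pearson_r(xs, ys):
    """Correlation coefficient: symmetric in its two arguments."""
    sxx, syy, sxy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(xs, ys):
    """Least-squares slope for predicting ys from xs: not symmetric."""
    sxx, _, sxy = _centered_sums(xs, ys)
    return sxy / sxx


# Hypothetical data: advertising spend and sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [820, 1090, 1410, 1690, 2010]

r_xy = pearson_r(ad_spend, sales)
r_yx = pearson_r(sales, ad_spend)
b_yx = regression_slope(ad_spend, sales)  # sales predicted from spend
b_xy = regression_slope(sales, ad_spend)  # spend predicted from sales

print(f"r = {r_xy:.4f} either way; slopes: {b_yx:.3f} vs {b_xy:.5f}")
```

The two slopes are linked to the correlation by the identity b_yx · b_xy = r², so correlation summarizes how tightly the points follow a line while each slope answers a directional prediction question.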