## What is a Linear Regression?

Linear regression is basically a statistical modeling technique which used to show the relationship between one dependent variable and one or more independent variable. It is one of the most common types of predictive analysis. This type of distribution forms in a line hence this is called linear regression. In this article, we will take the examples of Linear Regression Analysis in Excel.

To do linear regression analysis first, we need to add excel add-insExcel Add-insAn add-in is an extension that adds more features and options to the existing Microsoft Excel.read more by following steps.

Click on File – Options (This will open Excel Options Pop up for you).

Click on Add-ins – Select Excel Add-ins from Manage Drop Down in excelManage Drop Down In ExcelA drop-down list in excel is a pre-defined list of inputs that allows users to select an option.read more, then Click on Go.

This will open Add-ins Pop up. Select Analysis ToolPakAnalysis ToolPakExcel's data analysis toolpak can be used by users to perform data analysis and other important calculations. It can be manually enabled from the addins section of the files tab by clicking on manage addins, and then checking analysis toolpak.read more then click Ok.

Data Analysis Add-in will appear under Insert Tab.

Let us understand by the below Examples of linear regression analysis in excel.

### Linear Regression Analysis Examples

#### Example #1

Suppose we have monthly sales and spent on marketing for last year, and now we need to predict future sales on the basis of last year’s sales and marketing spent.

Month | Advertising | Sales |
---|---|---|

Jan | 40937 | 502729 |

Feb | 42376 | 507553 |

Mar | 43355 | 516885 |

Apr | 44126 | 528347 |

May | 45060 | 537298 |

Jun | 49546 | 544066 |

Jul | 56105 | 553664 |

Aug | 59322 | 563201 |

Sep | 59877 | 568657 |

Oct | 60481 | 569384 |

Nov | 62356 | 573764 |

Dec | 63246 | 582746 |

Click on Data Analysis under Data Tab, and this will open Data Analysis Pop up for you.

Now select Regression from the list and click Ok.

Regression pop up will open.

Select Range of Sales $C$1:$C$13 in the Y-axis box as this is the dependent variable and $B$1:$B$14 in X-axis as advertisement spent is the independent variable.

Checkmark on the Labels box if you have selected headers in data else it will give you the error.

Select Output range if you want to get the value on the specific range on the worksheet else select New Worksheet Ply: and this will add a new worksheet and give you the result.

Then check on the Residuals box and click Ok.

This will add worksheets and give you the following result.

Let us understand the output.

**Summary Output**

**Multiple R:** This represents the correlation coefficient. The value 1 shows a positive relationship, and value 0 shows no relationship.

**R Square:** R SquareR SquareR Squared formula depicts the possibility of an event's occurrence within an expected outcome. It is "r = n (∑xy) – ∑x ∑y / √ [n* (∑x2 – (∑x)2)] * [n* (∑y2 – (∑y)2)]", where r is the Correlation coefficient, n is the number in the given dataset, x is the first variable in the context and y is the second variable. read more represents the coefficient of determination. This tells you the percentage of points fall on the regression line. 0.49 means that 49% of values fit the model

**Adjusted R square****:** This is adjusted R squareAdjusted R SquareAdjusted R Squared refers to the statistical tool which helps the investors in measuring the extent of the variance of the variable which is dependent that can be explained with the independent variable and it considers the impact of only those independent variables which have an impact on the variation of the dependent variable.read more, which requires when you have more than one X variable.

**Standard Error:** This represents an estimate of the standard deviation of error. This is the precision that the regression coefficient is measured.** **

**Observations:** This is the number of observations that you have taken in a sample.

**ANOVA – ****Df: **Degrees of freedom

**SS: **Sum of Squares.

**MS: **we have two MS

- Regression MS is Regression SS/Regression Df.
- Residual MS is the mean squared error (Residual SS / Residual Df).

**F: **F test for the null hypothesisNull HypothesisNull hypothesis presumes that the sampled data and the population data have no difference or in simple words, it presumes that the claim made by the person on the data or population is the absolute truth and is always right. So, even if a sample is taken from the population, the result received from the study of the sample will come the same as the assumption.read more.

**Significance F: **P-Values associated with Significance

**Coefficient: **Coefficient gives you the estimate of least squares.

**T Statistic: **T Statistic for null hypothesis vs the alternate hypothesis.

**P-Value: **This is the p-value for the hypothesis test.

**Lower 95% and Upper 95%: **These are the lower boundary and the upper boundary for the confidence intervalThe Confidence IntervalConfidence Interval refers to the degree of uncertainty associated with specific statistics & it is often employed along with the Margin of Error. Confidence Interval = Mean of Sample ± Critical Factor × Standard Deviation of Sample. read more

**Residuals Output.: **We have 12 observations based on the data. 2^{nd} column represents Predicted sales and 3^{rd} column Residuals. Residuals are basically the difference in predicted sales from the actual one.

#### Example#2

Select the predicted sales and marketing column

Go to the chart group under the insert tab. Select the scatter chart icon

This will insert the scatter plot in excelInsert The Scatter Plot In ExcelA Scatter plot in excel, also known as XY chart, is a two-dimensional type of chart representing data. In this chart, we have two sets of data on the X and Y axis co-related. This chart is primarily used in co-relation studies and regression studies of data.read more. See image below

Right-click on any point then select **Add Trendline in excelTrendline In ExcelA trend line, often known as the best-fit line, depicts the data's trend. It shows the overall trend, pattern, or direction based on the data points available.read more.** This will add a trendline to your chart.

- You can format the trendline by doing right-click anywhere on the trendline and then select format trendline.
- You can make more improvements to the chart. i.e., formatting the trendline, color and change title, etc
- You can also show the formula on the graph by checking in the Display formula on the chart, and display R squared value on the chart.

**Some More Examples of Linear Regression Analysis:**

- Prediction of Umbrella sold based on the Rain happened in Area.
- Prediction of AC sold based on the Temperature in Summer.
- During the exam season, sales of Stationary basically, Exam guide sales increased.
- Prediction of sales when Advertising has done based on High TRP serial where an advertisement is done, Popularity of Brand Ambassador, and the Footfalls at the place of holding where an advertisement is being published.
- Sales of a house based on the Locality, Area, and price.

#### Example #3

Suppose we have nine students with their IQ level and the number they scored on Test.

Student | Test Score | IQ |
---|---|---|

Ram | 100 | 145 |

Shyam | 97 | 140 |

Kul | 93 | 130 |

Kappu | 91 | 125 |

Raju | 89 | 115 |

Vishal | 86 | 110 |

Vivek | 82 | 100 |

Vinay | 78 | 95 |

Kumar | 75 | 90 |

**Step 1:** First, find out the dependent and independent variables. Here Test score is the dependent variable, and IQ is the independent variable as Test score is varying as IQ gets change.

**Step 2: **Go to Data Tab – Click on Data Analysis – Select regression – click Ok.

This will Open the Regression window for you.

**Step 3.** Input Test Score range in Input Y Range Box and IQ in Input X Range Box. (Check on Labels if you have headers in your data range. Select output options, then check on the desired Residuals. Click Ok.

You will get the summary output shown in the below Image.

**Step 4:** Analysing the Regression by Summary Output** **

**Summary Output**

**Multiple R:** Here, the correlation coefficient is 0.99, which is very near to 1, which means the Linear relationship is very positive.

**R Square:** R Square value is 0.983, which means that 98.3% of values fit the model.

**P-value: **Here, P-value is 1.86881E-07, which is very less than .1, Which means IQ has significant predictive values.

**See the chart below.**

You can see that almost all the points are falling inline or a nearby trendline.

#### Example #4

We need to predict sales of AC based on the sales & temperature for a different month.

Month | Temp | Sales |
---|---|---|

Jan | 25 | 38893 |

Feb | 28 | 42254 |

Mar | 31 | 42845 |

Apr | 33 | 47917 |

May | 37 | 51243 |

Jun | 40 | 69588 |

Jul | 38 | 56570 |

Aug | 37 | 50000 |

Follow the below steps to get the regression result.

**Step 1:** First, find out the dependent and independent variables. Here Sales is the dependent variable, and Temperature is an independent variable as Sales is varying as Temp gets change.

**Step 2: **Go to Data Tab – Click on Data Analysis – Select regression – click Ok.

This will Open the Regression window for you.

**Step 3.** Input Sales in Input Y Range Box and Temp in Input X Range Box. (Check on Labels if you have headers in your data range. Select output options, then check on the desired Residuals. Click Ok.

This will give you a summary output as below.

**Step 4: **Analyse the result.

**Multiple R:** Here, the correlation coefficient is 0.877, which is near to 1, which means the Linear relationship is positive.

**R Square:** R Square value is 0.770, which means that 77% of values fit the model

**P-Value: **Here, P-value is 1.86881E-07, which is very less than .1, Which means IQ has significant predictive values.

#### Example #5

Now let us do a Regression Analysis for Multiple independent variables:

You need to predict the sales of a Mobile which is going to launch next year. You have the price and Population of the countries which are affecting the sales of mobiles.

Mobile Version | Sales | Quantity | Population |
---|---|---|---|

US | 63860 | 858 | 823 |

UK | 61841 | 877 | 660 |

KZ | 60876 | 873 | 631 |

CH | 58188 | 726 | 842 |

HN | 52728 | 864 | 573 |

AU | 52388 | 680 | 809 |

NZ | 51075 | 728 | 661 |

RU | 49019 | 689 | 778 |

Follow the below steps to get the regression result.

**Step 1.** First, find out the dependent and independent variables. Here Sales is dependent variable and quantity and population. Both are independent variables as Sales is varying with quantity and population of the country.

**Step 2. **Go to Data Tab – Click on Data Analysis – Select regression – click Ok.

This will Open the Regression window for you.

**Step 3.** Input Sales in Input Y Range Box and select quantity and population in Input X Range Box. (Check on Labels if you have headers in your data range. Select output options, then check on the desired Residuals. Click Ok.

Now Run the regression using data analysis under Data Tab. This will give you the below result.

**Summary Output**

**Multiple R:** Here, the correlation coefficientCorrelation CoefficientCorrelation Coefficient, sometimes known as cross-correlation coefficient, is a statistical measure used to evaluate the strength of a relationship between 2 variables. Its values range from -1.0 (negative correlation) to +1.0 (positive correlation). read more is 0.93, which is very near to 1, which means the Linear relationship is very positive.

**R Square:** R Square value is 0.866, which means that 86.7% of values fit the model.

**Significance F: **Significance F is less than .1, which means that the regression equation has significant predictive value.

**P-Value**: If you look at P-value for Quantity and Population, you can see that values are less than .1, which means quantity and population have significant predictive value. The fewer P values mean that a variable has more significant predictive values.

However, both quantity and population have significant predictive value, but If you look at P-value for quantity and population, than you can see that quantity has a lesser P-value in excelP-value In ExcelP-Value, or Probability Value, is the deciding factor on the null hypothesis for the probability of an assumed result to be true, being accepted or rejected, & acceptance of an alternative result in case of the assumed results rejection. read more than Population. This means quantity has a more significant predictive value than Population.

### Things to Remember

- Always check the Dependent and Independent variables whenever you are selecting any data.
- Linear regression analysis considers the relationship between the Mean of the variables.
- This only model the relationship between the variables that are linear
- Sometimes it is not the best fit for a real-world problem. For Example: (Age and the wages). Most of the time, Wage increase as Age is increasing. However, after retirement, Age increases but wages goes decrease.

### Recommended Articles

This has been a guide to Linear Regression and its definition. Here we discuss how to perform a linear regression analysis in excel with the help of examples and a downloadable excel sheet. You can learn more about excel from the following articles –

- 16 Courses
- 15+ Projects
- 90+ Hours
- Full Lifetime Access
- Certificate of Completion

## Leave a Reply