How to Perform a Regression Analysis: Main Steps
When it comes to data analysis, regression is one of the most important methods. It is generally used to draw conclusions and make predictions. It can be quite confusing for those without a statistical background, so I will try to keep it as simple as possible.
EXAMPLE OF REGRESSION ANALYSIS
Let's consider an example where you are a sales manager who wants to predict next month's sales. Many factors can affect those numbers, such as a competitor's promotion or perhaps a new and improved product, so many variables may influence future sales. Some people in the organization might even have a theory that the more it rains in a month, the higher the sales. In short, there can be hundreds of candidate factors.
Regression analysis is a way of mathematically sorting out which of these variables actually have an impact on sales. It answers questions such as: Which factors matter most? Which factors can we ignore? How do these factors relate to one another? And how confident can we be about each of them?
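One crude way to get a first answer to "which factors matter most?" is to fit a separate one-variable regression for each candidate factor and compare how much of the variation in sales each one explains on its own (its R², from 0 to 1). The sketch below does that with made-up numbers; the factor names and figures are hypothetical. A real analysis would fit all the factors together in a multiple regression, because a one-factor-at-a-time screen can mislead when the factors are related to each other.

```python
def r_squared(xs, ys):
    """Share of the variation in ys explained by a least-squares line on xs (0 to 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a one-variable regression, R² equals the squared correlation coefficient.
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: five months of sales and two candidate factors.
sales = [464, 512, 588, 632, 704]
candidates = {
    "rainfall": [10, 20, 30, 40, 50],
    "competitor_promos": [3, 1, 4, 1, 5],  # competitor promotions per month
}

# Rank the factors by how much of the sales variation each explains on its own.
ranking = sorted(
    ((name, r_squared(xs, sales)) for name, xs in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, r2 in ranking:
    print(f"{name}: R² = {r2:.2f}")
# rainfall: R² = 0.99
# competitor_promos: R² = 0.18
```

Here rainfall explains almost all of the month-to-month variation in these invented numbers, while competitor promotions explain little, so rainfall would be the factor to look at first.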
Here these factors are called variables. The dependent variable is the main factor you are trying to predict; in this example, it is monthly sales. The independent variables are the factors you suspect might have an impact on the dependent variable.
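In data terms, the distinction comes down to which column you are trying to predict. The sketch below uses a hypothetical record type (the field names are illustrative, not from the source) and separates the dependent variable (sales) from the independent variables (the candidate factors), which is the shape most regression tools expect as input.

```python
from dataclasses import dataclass


@dataclass
class MonthRecord:
    """One month of observations; all field names here are hypothetical."""
    month: str
    sales: float            # dependent variable: what we want to predict
    rainfall: float         # independent variable: the "more rain, more sales" theory
    competitor_promos: int  # independent variable: competitor promotions that month


DEPENDENT = "sales"
INDEPENDENT = ["rainfall", "competitor_promos"]


def split_variables(records):
    """Return (y, X): the dependent values, plus one row of independent values per month."""
    y = [getattr(r, DEPENDENT) for r in records]
    X = [[getattr(r, name) for name in INDEPENDENT] for r in records]
    return y, X


records = [
    MonthRecord("Jan", sales=464, rainfall=10, competitor_promos=3),
    MonthRecord("Feb", sales=512, rainfall=20, competitor_promos=1),
]
y, X = split_variables(records)
print(y)  # [464, 512]
print(X)  # [[10, 3], [20, 1]]
```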
HOW TO DO A REGRESSION ANALYSIS?
First, you need to gather data on the variables: all the monthly sales data, i.e. the dependent variable (for example, for the past 2 years), and any data on the independent variables (for example, the average monthly rainfall over the same 2 years). Then you plot that information on a chart (a scatter plot), with one point per month.
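The gathering-and-plotting step can be sketched in plain Python. The example below pairs each month's sales with that month's rainfall and draws a rough text scatter plot, with sales (the dependent variable) on the vertical axis and rainfall on the horizontal axis. The five months of numbers are hypothetical, chosen only for illustration; if matplotlib is installed, `plt.scatter(rainfall, sales)` would draw a proper chart instead.

```python
def text_scatter(xs, ys, width=40, height=10):
    """Draw a crude scatter plot as text: x runs left to right, y runs bottom to top."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero if all x values are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top line, so flip y
    lines = ["|" + "".join(cells) for cells in grid]  # "|" is the y-axis
    lines.append("+" + "-" * width)                   # bottom line is the x-axis
    return "\n".join(lines)


# Hypothetical data: one (month, rainfall, sales) entry per month.
months = [
    ("Jan", 10, 464),
    ("Feb", 20, 512),
    ("Mar", 30, 588),
    ("Apr", 40, 632),
    ("May", 50, 704),
]
rainfall = [rain for _, rain, _ in months]   # independent variable -> x-axis
sales = [amount for _, _, amount in months]  # dependent variable -> y-axis

print(text_scatter(rainfall, sales))
```

Each `*` is one month, so the upward drift from bottom-left to top-right is the "more rain, more sales" pattern in these made-up numbers.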
By convention, the y-axis shows the dependent variable and the x-axis shows the independent variable. Each dot represents one month's data. In our example, sales tend to be higher in months with more rainfall. But how much higher? To answer that, you need an exact relationship. Now imagine drawing a line through the chart that runs roughly through the middle of the data points (more precisely, the line that minimizes the sum of the squared vertical distances from the dots to the line). This line is known as the regression line, and you can compute it with nothing more than Excel. Excel will also give you a formula describing the relationship between the dependent and independent variables, which will look like this:
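Y = 400 + 6X + error term
Here Y is monthly sales and X is monthly rainfall. The 400 is the intercept (the sales you would expect in a month with no rain), and the 6 is the slope (each additional unit of rainfall is associated with about 6 more units of sales). The error term acknowledges that regression isn't perfectly precise: the line runs through the middle of the data, so individual months fall above or below it. For a decent prediction, you can set the error term aside.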
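The regression line Excel draws is the ordinary least-squares line, and Excel's `SLOPE` and `INTERCEPT` worksheet functions return its two coefficients. A minimal sketch of the same calculation in Python follows. The data are hypothetical, constructed so that the fitted formula comes out as Y = 400 + 6X; the sketch also uses the formula to make a prediction and shows the leftover error for each month.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one independent variable: return (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the fitted line passes through the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical monthly data, made up for illustration.
rainfall = [10, 20, 30, 40, 50]        # independent variable (X)
sales = [464, 512, 588, 632, 704]      # dependent variable (Y)

intercept, slope = fit_line(rainfall, sales)
print(f"Y = {intercept:.0f} + {slope:.0f}X + error term")  # Y = 400 + 6X + error term

# Use the formula to predict a month with 45 units of rain (error term ignored).
predicted = intercept + slope * 45
print(predicted)  # 670.0

# The error term in action: how far each actual month sits from the line.
residuals = [y - (intercept + slope * x) for x, y in zip(rainfall, sales)]
print(residuals)  # [4.0, -8.0, 8.0, -8.0, 4.0]
```

On Python 3.10 or later, `statistics.linear_regression(rainfall, sales)` returns the same slope and intercept. Note that the residuals add up to zero: that holds for any least-squares line with an intercept, and it is what "runs through the middle of the points" means in practice.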
Regression is a go-to method in business analytics. Managers use it to answer all sorts of business questions: to explain something that has happened (Why did sales fall last month?), to predict something in the future (What will ticket sales look like over the next 3 months?), or to decide what to do (Should we run ad campaign A, B, or C?).
Source: "HBR Guide to Data Analytics Basics for Managers", pp. 87-102