Econ 2123A Regression in Excel: A Short Guide
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
Regression in Excel: A Short Guide
Econ 2123A
1 How to import data into excel?
● Open a blank workbook.
● Go to File (top left corner) and select Import from the drop-down menu.
● Select the file type (“ .csv” or “ .txt”) and import.
● Based on the data, choose the delimiter (“ .csv” is always comma).
Figure 1: Import-Data
2 How to run a simple and multiple regression?
Note: I am using a particular generated dataset to show the regression. The dataset contains 100 observations. Each observation corresponds to the avg price of a car model’s average price, weight, mileage, and origin (domestic or foreign).
2.1 Simple Regression
In simple regression, there is only one independent variable. In our example we consider weight as the independent variable and avg price as the dependent variable. So in this simple regression, we aim to understand the relationship between the avg price of a car model and its weight.
price = α + β * weight + f, (1)
where price is the avg price of the car model (dependent variable), α is the regression constant,β is the regression coefficient, weight is the weight of the car model (independent variable), and f is the error.
● After importing data into excel. Open the “Data”” tab and click on “Data Anal- ysis”” and select “Regression” from the list. You will get a new dialog box titled “Regression” .
● In the dialog box, select the “Input Y Range”: this the range of independent vari- able. In our example, independent variable is in column “A”, and the range is from A$1:A$101 (A$1 contains label).
● Next in the dialog box, select the “Input X Range”: this the range of dependent variable. In our example, dependent variable is in column “B”, and the range is from B$1:B$101 (B$1 contains label).
● Tick the label if you have labels in the dataset. In our example, we have labels so we will tick it (It is a good practice to have labels).
● Click “ok” and you should have the results in a new sheet.
Figure 2: Data Analysis
Figure 3: Regression Dialog Box-Simple
2.2 Multiple Regression
In multiple regression, there are more than one independent variable. In our example we consider weight, mpg and origin as the independent variables and avg price as the dependent variable. So in this multiple regression, we aim to understand the relationship between the avg price of a car model and its weight, mpg and origin.
price = α + β1 * weight + β2 * mpg + β3 * origin + f, (2)
where price is the avg price of the car model (dependent variable), α is the regression constant, β1 ,β2 and β3 are the regression coefficients, weight, mpg and origin are the independent variables and f is the error. Note: α and f are different than what we had in previous equation.
● After importing data into excel. Open the “Data”” tab and click on “Data Anal- ysis”” and select “Regression” from the list. You will get a new dialog box titled “Regression” .
● In the dialog box, select the “Input Y Range”: this the range of independent vari- able. In our example, independent variable is in column “A”, and the range is from A$1:A$101 (A$1 contains label).
● Next in the dialog box, select the “Input X Range”: this the range of dependent variable. In our example, dependent variable is in column “B”, and the range is from B$1:D$101 (B$1 contains label).
● Tick the label if you have labels in the dataset. In our example, we have labels so we will tick it (It is a good practice to have labels).
● Click “ok” and you should have the results in a new sheet.
Note: The only difference between simple regression and multiple regression in terms of implementation is the change in “Input X Range” in step 3.
Figure 4: Regression Dialog Box-Multiple
3 How to report and interpret results?
Let us check the output we got from previous exercise.
Figure 5: Summary Output-Simple
Figure 6: Summary Output-Multiple
● The first important piece of information to consider from the output table is R- Square from the Regression Statistics. R-Square indicates the goodness of the fit. In the simple regression, R-Square is 0.584, meaning 58.4% of avg price of a car model (Y) is explained by the weight (X). In the multiple regression, the R-Square value is 0.76, meaning 76% of the avg price of a car model (Y) is explained by weight, mpg, and origin.
● Now we look at the coefficients. We see that β from equation 1 is 6.2 and β1 , β2 , and β3 from equation 2 is 6.82, 20.09, and 3536.32.
● The coefficient values are not enough to give the complete picture. We have to look at standard errors+t-stat+p-value to understand whether these coefficients make any sense.
— The standard error tells us if the coefficient is different than zero. If the
standard error is larger than the coefficient, then it is highly likely that your coefficient is zero or insignificant. In our example, we see that the standard error for the coefficient of “mpg” is bigger than the coefficient (coefficient is 20.09 whereas the standard error is 36.45)
— The t-stat is coefficient divided by the standard error (check it). The value
of the t-stat for each coefficient is compared to the Student t-distribution to calculate the p-value. A p-value of 0.05 or less is desired.
— The idea of p-value is very simple. The data you have is considered to come
from a random distribution and with that consideration it is asked how likely the result (coefficient) you see is possible.
4 How to construct new variables with the predicted values and the residuals?
To obtain the predicted values and the residuals we follow the same as before but now tick an extra box containing residuals.
● After importing data into excel. Open the “Data”” tab and click on “Data Anal- ysis”” and select “Regression” from the list. You will get a new dialog box titled “Regression” .
● In the dialog box, select the “Input Y Range”: this the range of independent vari- able. In our example, independent variable is in column “A”, and the range is from A$1:A$101 (A$1 contains label).
● Next in the dialog box, select the “Input X Range”: this the range of dependent variable. In our example, dependent variable is in column “B”, and the range is from B$1:D$101 (B$1 contains label).
● Tick the label if you have labels in the dataset. In our example, we have labels so we will tick it (It is a good practice to have labels).
● Tick “Residuals” to obtain the predicted values and residuals.
● Click “ok” and you should have the results in a new sheet.
Figure 7: Regression Dialog Box- Residuals
Figure 8: Summary-Output with Residuals
2022-06-30