ECON 318 Empirical Project
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
ECON 318
Empirical Project - due December 1 at 11:59pm
In this project, you will apply the data analysis and econometrics skills that you have learned in the class so far. As in the labs, you will be analyzing real data and using it to reach a conclusion on an important economic question. Unlike in the labs, you will not have detailed guidance on how to complete each step of the project: you will have to decide on how to merge, rearrange, and clean the data sets to make them usable. As in the real world, there is no single correct way to do this project. A number of the techniques that you have learned in class could be used to answer this question, and any of these will be fine, as long as you make it clear why your approach is appropriate (but you only need to pick one).
For this project, you will be working as a consultant for a company that has run a pilot project of allowing some of its employees to work from home rather than at a central office. The company was having problems retaining their workforce, where many workers were quitting as a result of the long commutes they faced to get to work. They decided to try out allowing some employees to work from home to address this problem, but were concerned about whether working from home may hurt employee productivity. They were also concerned about how this affects employee satisfaction and retention. They have hired you to analyze data and provide recommendations on whether to expand working from home for their full workforce. They are particularly interested in your findings on the effect of working from home on employee performance and retention.
The company is a leading Chinese travel agency that aggregates information on hotels, flights and tours; makes reservations and obtains tickets for clients; and generates revenue through commissions. Most of their business is done over the phone, and they tested out work from home for their call center employees, who are responsible for answering calls from customers and booking orders or resolving issues. Calls to the company hotline are routed to the next available call center employee, and employees work in shifts. Each employee typically works five shifts a week, and their monthly earnings are the sum of flat wage ($160 per month) plus a bonus based on the number of calls that they answered in that month.
For the pilot program, 249 call center employees were randomly allocated into a treatment group (131 employees) and a control group (118 employees). The treatment group worked from home for four out of their five shifts, and spent the fifth shift in the office. Otherwise their specified work hours (9 am to 5pm) and the system of receiving incoming calls was the same. Employees working from home used the same computer terminals and communications software, so the only differences between the treatment and control groups were the location from which they were working.
The pilot began in December 2010 and lasted for nine months, until August 2011. Employees in the work from home group had to remain in that group for the duration of the pilot, while the control group was not allowed to work from home during the pilot. The company has provided you with a number of datasets from before the pilot began as well as when it was running that you can use in your analysis. You do not need to use all of them, but pick whichever you think are the most useful for your analysis. The data sets are available on the class dropbox folder under “Empirical Project” and the variables are labelled:
- EmployeeStatus.dta – contains the employee ID of each employee in the pilot and a variable denoting whether they were assigned to the treatment group or control group (equal to one if assigned to treatment).
- EmployeeCharacteristics.dta – contains personal information on each employee assigned to the pilot at the start of the pilot (e.g. marital status, age , if completed high school)
- Quits.dta – contains a variable for whether the employee quit during the study period
- QuitDate.dta – contains information on the month during which employees quit during the study period (if the employee did quit). This is a panel data set at the monthly level, where each employee has nine observations: one for each month of the experiment. There is a dummy variable for whether the employee quit in that month and whether the employee was still working in that month.
- Attitudes.dta – surveys of employees on their satisfaction, with one survey from prior to the start of the pilot and another from after. The firm did five rounds of surveys, with the first conducted prior to the start of the pilot and the next four conducted after the pilot had begun. The firm did not survey all their employees, but did survey a random sample of 107 from treatment and 64 from control. The surveys measured three variables: satisfaction with the job (on a scale from 1 to 7), satisfaction in general (on a scale from 0 to 100) and satisfaction with their life trajectory (on a scale from 0 to 40).
- Performance_Panel.dta – monthly performance data on workers for the months before and after the experiment. Data is at the monthly level, where the variable post indicates if the month is in the pre-experiment period (January to November 2010) or in the period during the experiment (December 2010 to August 2011) . The measures of performance are (1) average calls made per hour worked during the month; (2) total calls made in the month; and (3) a performance evaluation rating of their performance in that month (on a scale from 0 and 100).
- Performance.dta – data on average performance of workers over the period before and after the experiment. Each worker has one observation in the pre-experiment period and one in the period during the experiment (denoted by the variable post).
In order to use the data, you will likely have to do some combination of the following: (i) merging data sets; (ii) recoding variables; (iii) cleaning variables (e.g. checking for and removing non-nonsensical values); (iv) creating new variables for analysis; and (v) reshaping the data to be in a format you want to use for analysis. For example, there may be places with clear data entry errors that you will need to replace with missing values in order to carry out the analysis. One good way to spot data entry errors is to tab or plot the variables and look for nonsensical values. You may want to refer back to the week 1 slides introducing you to different Stata functionalities, and you are welcome to explore features in Stata that we have not previously used in the class: there are some references in the week 1 slides on using Stata, as well as a wealth of information available on the internet through searching on google (e.g. entries on websites like StackExchange and Statalist).
You will write up a short report describing your analysis of the data . You must use Stata or a comparable software like R (i.e. not Excel) to produce the figures. Your tables should look professional and should not simply be copied and pasted from Stata output (see the labs in which you made professional looking tables). Similarly, your figures should look professional, with labeled axes and reasonable looking visualizations. Tables and figures will not count against the page limit. The report should be written in language so that someone who has taken this class can understand it (i.e., you don’t have to simplify the technical parts for the client).
When writing this report, it should be broken into three sections. The first section should describe the data that you use and any data cleaning that you did. This should give sufficient detail that someone else could emulate what you did. If you find data entry errors in any variable in the data sets that you use, you should list them here with the name of the data set, information that could be used to identify this observation (e.g. personid, year, month), name of the variable, the potentially erroneous value of the variable, and reason for your data cleaning decision. You should do this as a table, where the table should mention each of the data sets you use, including if there are no problem values in that data set. It should follow the format below, listing each data set you used alphabetically.
To give an example, suppose you had used the datasets EmployeeStatus.dta, EmployeeCharacteristics.dta and Performance_Panel.dta for your report and found two errors in the variable calls_per_hour in Performance_Panel.dta, but none in the other two datasets. Then your table should look like the following. Please make sure tofollow thisformat so you are graded correctly!
Data set |
Observation |
Variable |
Value |
Change and reason for change |
Employee Characteristics.dta |
No problem values |
|
|
|
EmployeeStatus.dta |
No problem values |
|
|
|
Performance_Panel.dta |
personid = 3906 year = 2010 month =3 |
calls_per_hour |
999999 |
This is an outlier and does not seem possible. I changed it to a missing value |
Performance_Panel.dta |
personid = 5993 year = 2012 month =1 |
calls_per_hour |
-3 |
Negative calls per hour is not possible. I changed it to a missing value |
The second section should describe the empirical strategy that you use, why you use it, and what variables you will be analyzing. The third should discuss your findings and include the graphs and tables that you produced.
For this project, you can either work alone or as a pair with another student in the same section of the class. This decision is completely at your discretion. If you choose to work alone, your report should be no more than 4 single spaced pages and must include at least two tables and one figure. If you work in a pair, your report should be no more than 6 single spaced pages and must include a total of at least 5 tables and figures.
In addition to your report, you should create and hand in a single Stata .dofile that performs all of the cleaning and analysis. You must write this yourself or within your pair. You should not copy it from your classmates outside of your pair -- that would be considered plagiarism and treated accordingly, where we will check the code files for similarities. Your grade will be based on the following rubric.
Grading Rubric |
||
|
||
Data Cleaning (25pts) |
||
|
Merge and rearrange data for analysis using Stata |
|
|
Identify and address data quality issues, such as problem values for the variables of interest. Note any issues in your report |
|
|
||
Empirical strategy (25pts) |
||
|
Apply correct empirical strategy to successfully estimate the causal effect of working from home |
|
|
Selection of appropriate dependent variable(s) to answer this question |
|
|
||
Tables and Figures (25pts) |
||
|
Creation of tables that are well-explained in the report. These should be in a professional-looking format (e.g. using esttab in Stata) with the relevant portions of the table well-labeled. |
|
|
Creation of figures that are well-explained in the report. These should be in a professional-looking format with axes and other relevant portions of the graph labelled. |
|
|
||
Discussion/Conclusion (25pts) |
||
|
Accurately and concisely summarize the results of your analysis and take-aways in a write-up of no more than 4 pages. This should answer the assignment from the company on the effect of working from home |
|
|
Correctly interpret signs, magnitudes and statistical significance for any of the variables of interest |
|
2022-11-28