DS UA 9201

Causal Inference

Fall 2022

Exercise 1 Instrumental variables (30 points)

Patients who undergo surgery, for example for orthopaedic reasons, are often advised by their doctors to follow physiotherapy afterwards, that is, a series of exercises that support rehabilitation and a more complete recovery. However, the cost of physiotherapy often deters patients from following it. It is therefore important to demonstrate the potential benefits of physiotherapy, so that more patients can be convinced to follow it.

Over a period of 4 years, three cooperating hospitals randomly assigned each of 537 eligible patients who had undergone an orthopaedic operation to one of two groups: patients in the first group ($Z_i = 1$) were offered the opportunity to get physiotherapy at 50% reduced hospital fees; for patients assigned to the second group ($Z_i = 0$), physiotherapy was available at the standard cost. For each patient, the recorded variables, in addition to the assignment $Z_i$, are: whether or not the patient got physiotherapy, $T_i^{obs} = 1$ for yes, 0 for no; and an assessment of the patient's recovery 3 months after surgery, $Y_i^{obs} = 1$ for satisfactory, 0 for unsatisfactory or poor. The assessment was done by physicians blinded to both the assignment $Z_i$ and whether or not the patient took physiotherapy. The table below gives the counts, $n_{zty}$, of patients assigned $Z_i = z$ with physiotherapy-taking status $T_i^{obs} = t$ and outcome $Y_i^{obs} = y$.

Z	Tobs	Yobs	n
0	0	0	185
0	0	1	123
0	1	0	9
0	1	1	41
1	0	0	37
1	0	1	20
1	1	0	26
1	1	1	96

Question 1 (5 points)

Estimate the intention-to-treat (ITT) effect of offering the discount on the improvement of recovery, $E[Y(Z = 1)] - E[Y(Z = 0)]$, using a difference-in-means estimator. Also estimate its standard error and the asymptotic 95% confidence interval. Explain why the ITT effect can differ from the contrast that compares the outcomes $Y^{obs}$ of patients who take physiotherapy with those of patients who do not.

Be aware that the input data are aggregated, so you should use weighted estimators (for both the mean and the standard error). You can use Python for the computations; in that case, create the input dataframe manually from the given table.
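As an illustration of the weighting step, here is a minimal sketch that treats each row of the table as a group of n identical patients. It uses plain Python tuples rather than a dataframe, the helper name `weighted_mean_and_var` is hypothetical, and the variance uses the n - 1 denominator, which is one common choice and an assumption here.

```python
import math

# (z, t, y, n) rows copied from the table of counts.
counts = [
    (0, 0, 0, 185), (0, 0, 1, 123), (0, 1, 0, 9), (0, 1, 1, 41),
    (1, 0, 0, 37), (1, 0, 1, 20), (1, 1, 0, 26), (1, 1, 1, 96),
]

def weighted_mean_and_var(z_value):
    """Count-weighted mean and sample variance of Y within one assignment arm."""
    rows = [(y, n) for z, t, y, n in counts if z == z_value]
    total = sum(n for _, n in rows)
    mean = sum(y * n for y, n in rows) / total
    var = sum(n * (y - mean) ** 2 for y, n in rows) / (total - 1)
    return mean, var, total

m1, v1, n1 = weighted_mean_and_var(1)
m0, v0, n0 = weighted_mean_and_var(0)

itt = m1 - m0                                # difference in means
se = math.sqrt(v1 / n1 + v0 / n0)            # standard error of the difference
ci = (itt - 1.96 * se, itt + 1.96 * se)      # asymptotic 95% interval

print(f"ITT = {itt:.4f}, SE = {se:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

The same numbers are obtained by expanding the table to one row per patient and using unweighted formulas.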

Question 2 (4 points)

In the plain language of this setting, and using the potential treatment notation, what are the four possible strata defined by the instrument and the treatment values?

Question 3 (6 points)

In the plain language of this setting, and in terms of potential outcomes, state the four assumptions under which the randomizer $Z_i$ is an "instrument" and the local ATE is non-parametrically identified. Discuss their plausibility.

Question 4 (5 points)

Which of the assumptions from question 3 is/are sufficient to estimate the proportion of "never-takers", i.e. patients who would not take physiotherapy whether or not they had been offered the discount in this study? Under this/these assumption(s), report estimates of the proportions of the groups defined in question 2.
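The arithmetic behind such proportions can be sketched as follows, assuming that assignment is randomized and that there are no defiers (monotonicity). Under those assumptions, untreated patients in the discount arm are never-takers and treated patients in the standard-cost arm are always-takers. The helper name `share_treated` is hypothetical.

```python
# (z, t, y, n) rows copied from the table of counts.
counts = [
    (0, 0, 0, 185), (0, 0, 1, 123), (0, 1, 0, 9), (0, 1, 1, 41),
    (1, 0, 0, 37), (1, 0, 1, 20), (1, 1, 0, 26), (1, 1, 1, 96),
]

def share_treated(z_value):
    """Proportion of patients with T_obs = 1 among those assigned Z = z_value."""
    total = sum(n for z, t, y, n in counts if z == z_value)
    treated = sum(n for z, t, y, n in counts if z == z_value and t == 1)
    return treated / total

p_t_z1 = share_treated(1)   # P(T = 1 | Z = 1)
p_t_z0 = share_treated(0)   # P(T = 1 | Z = 0)

# Assumption: randomization plus monotonicity (no defiers).
always_takers = p_t_z0
never_takers = 1 - p_t_z1
compliers = p_t_z1 - p_t_z0

print(f"always-takers = {always_takers:.3f}")
print(f"never-takers  = {never_takers:.3f}")
print(f"compliers     = {compliers:.3f}")
```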

Question 5 (5 points)

Under the assumptions from question 3, estimate the local ATE. For which group defined in question 2 is this treatment effect estimated? You can use the Python function IV2SLS to obtain the standard error and a 95% confidence interval for your estimate.
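For a quick check of the point estimate, the Wald ratio can be computed by hand: the ITT effect on the outcome divided by the ITT effect on treatment uptake. With a single binary instrument and no covariates, the 2SLS point estimate coincides with this ratio. The sketch below gives only the point estimate, not a standard error, and the helper name `arm_mean` is hypothetical.

```python
# (z, t, y, n) rows copied from the table of counts.
counts = [
    (0, 0, 0, 185), (0, 0, 1, 123), (0, 1, 0, 9), (0, 1, 1, 41),
    (1, 0, 0, 37), (1, 0, 1, 20), (1, 1, 0, 26), (1, 1, 1, 96),
]

def arm_mean(z_value, column):
    """Count-weighted mean of T_obs (column=1) or Y_obs (column=2) in one arm."""
    rows = [row for row in counts if row[0] == z_value]
    total = sum(row[3] for row in rows)
    return sum(row[column] * row[3] for row in rows) / total

itt_y = arm_mean(1, 2) - arm_mean(0, 2)   # effect of assignment on the outcome
itt_t = arm_mean(1, 1) - arm_mean(0, 1)   # effect of assignment on uptake

late = itt_y / itt_t                      # Wald estimator of the local ATE
print(f"ITT_Y = {itt_y:.4f}, ITT_T = {itt_t:.4f}, Wald LATE = {late:.4f}")
```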

Question 6 (5 points)

Discuss briefly (i) the clinical and (ii) the health policy implications of the difference between your estimates in question 5 vs. question 1.

Exercise 2 Regression Discontinuity Design (30 points)

In this problem you will be analyzing a dataset from a 2011 paper by Carpenter and Dobkin. The full citation for the paper is:

Carpenter, C., & Dobkin, C. (2011). The minimum legal drinking age and public health. Journal of Economic Perspectives, 25(2), 133-56.

This paper examines evidence linking the minimum legal drinking age in the US (21) to an increased likelihood of accidents, hospitalization, and health hazards in general. The main identification strategy employed by the authors is a sharp Regression Discontinuity Design (RDD), where age is the running variable and 21 is the cutoff.

The dataset contains 80 observations, where each unit is an age group, and values are collected over 4 US states.

The dataset is ER.csv and it contains five variables:

● age – The age of the unit, where the decimal indicates month of the year

● all – The total number of ER admissions

● injury – The total number of ER admissions due to injury

● illness – The total number of ER admissions due to viral illness

● alcohol – An adjusted index of how many ER admissions were linked to alcohol consumption

Question 1 (5 points)

Preprocess the data by creating a centered version of the running variable (age minus the cutoff) and a binary variable indicating the treatment.
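A minimal sketch of this preprocessing step is shown below on a small made-up dataframe, since ER.csv is not reproduced here. The column names `age_c` and `over21` are hypothetical, and treating a unit at exactly age 21 as treated is an assumption; a strict inequality differs only for such a unit.

```python
import pandas as pd

CUTOFF = 21.0

# Made-up ages standing in for the `age` column of ER.csv.
df = pd.DataFrame({"age": [20.0, 20.5, 20.9, 21.0, 21.1, 22.0]})

# Centered running variable: zero at the cutoff, negative below, positive above.
df["age_c"] = df["age"] - CUTOFF

# Treatment indicator: 1 once the unit has reached the legal drinking age.
# Assumption: age == 21 counts as treated.
df["over21"] = (df["age"] >= CUTOFF).astype(int)

print(df)
```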

Question 2 (10 points)

Estimate the effect of being legally able to purchase alcohol (age > 21) on the all, injury, and alcohol variables using an RDD with bandwidth = 1 year. For each of the three outcomes, report point estimates and 95% confidence intervals. Repeat the analysis for bandwidth = 0.5 years and bandwidth = 2 years. Discuss and interpret your results. Which outcome variable seems to be associated with the largest effect? Does bandwidth selection influence the results? You can use the model wls from the package statsmodels.formula.api, as in the recitation.
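One possible shape for this analysis is a local linear regression with separate slopes on each side of the cutoff, fitted only to observations within the bandwidth. The sketch below runs on simulated data with a known jump of 2.0, not on ER.csv. The names `rdd_estimate`, `age_c`, `over21`, and `y` are hypothetical, and the uniform kernel (all weights equal to 1) is an assumption; the recitation may have used different weights.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for ER.csv: 80 age groups between 19 and 23.
rng = np.random.default_rng(0)
df = pd.DataFrame({"age": 19 + (np.arange(80) + 0.5) / 20})
df["age_c"] = df["age"] - 21
df["over21"] = (df["age_c"] >= 0).astype(int)
# Linear trend plus a true discontinuity of 2.0 at the cutoff.
df["y"] = 10 + 0.5 * df["age_c"] + 2.0 * df["over21"] + rng.normal(0, 0.1, len(df))

def rdd_estimate(data, outcome, bandwidth):
    """Jump at the cutoff from a local linear fit inside the bandwidth."""
    window = data[data["age_c"].abs() <= bandwidth]
    weights = np.ones(len(window))  # uniform kernel (assumption)
    fit = smf.wls(f"{outcome} ~ over21 * age_c", data=window, weights=weights).fit()
    lo, hi = fit.conf_int().loc["over21"]  # 95% interval by default
    return fit.params["over21"], lo, hi

for bw in (0.5, 1, 2):
    est, lo, hi = rdd_estimate(df, "y", bw)
    print(f"bandwidth={bw}: estimate={est:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

The coefficient on `over21` is the estimated discontinuity because `age_c` is zero at the cutoff, so the interaction term does not contribute there.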

Question 3 (5 points)

Using the entire dataset (no need to aggregate), create and show RDD plots that visualize the discontinuity for each of the three outcome variables used in Question 2. The plots should display the observed points, the regression lines, and vertical lines indicating the bandwidth.

Question 4 (10 points)

Conduct a placebo RDD analysis using the illness variable as the outcome: since viral illnesses are not caused by alcohol consumption, we have no reason to expect that being legally able to drink will have an effect on this variable. Report both RDD estimates and 95% CIs, and make an RDD plot for this outcome variable. Is there a treatment effect, and is it statistically significant? What does this suggest about the plausibility of the RDD assumptions?

2022-12-08
