
ECON 104

ASSIGNMENT #1

1.

Statistics on the variable persons show that there are 156,000 individuals in the data set.

If the value of the variable treat_real is 1, the individual was assigned to get a call and belongs to the treatment group; if the value is 0, the individual is a member of the control group. Tabulating the two values of treat_real shows that 14,395 individuals were assigned to the treatment group and 83,051 individuals were assigned to the control group.

Tabulating the share of observations with a value of 1 in the three variables vote98, vote00 and vote02 shows that in the elections of 1998, 2000 and 2002, the percentage of individuals who voted was 50.85%, 67.44% and 53.26%, respectively.

2.

To check that randomization worked, the following variables are chosen as fixed (pre-treatment) characteristics: newreg, age, female, state, comp_mi, comp_ia.

Variables    G1(0)    Mean1     G2(1)    Mean2     MeanDiff    p-value
newreg       83051    0.0480    14395    0.0510    -0.00300    0.096*
age          83051    55.80     14395    55.55     0.245       0.150
female       80924    0.566     14048    0.561     0.00500     0.227
state        83051    1         14395    1         0           .
comp_mi      83051    0         14395    0         0           .

Table 1

There are 6 variables in the table that represent the fixed characteristics of an individual. The G1(0) column gives the number of individuals in the control group, and Mean1 is the mean of the variable in the control group. The G2(1) column gives the number of individuals in the treatment group, and Mean2 is the mean of the variable in the treatment group. The MeanDiff column is the difference between the two group means, and the p-value column is the p-value of that difference.

3.

The means of these 6 variables differ very little between the treatment and control groups, which provides some evidence that assignment to the two groups was random. Under randomization, variables that represent individual traits should not depend on whether an individual is in the treatment or control group. The p-value is obtained by testing the difference between the mean values of the two groups. A significant difference would mean that the variable is related to group assignment, that is, that the two groups are not balanced on it.

As shown in the table above, only one of these variables (newreg) has a p-value less than 0.1; the differences in the other variables are not statistically significant at the 10% level. In addition, although the difference in newreg between the two groups is significant at the 10% level, the mean difference is only 0.003, which is very small. The treatment group and the control group were therefore essentially the same on these characteristics before the treatment. Table 1 in question 2 is thus consistent with the randomization being correctly implemented.
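The variable-by-variable comparison above can be complemented by a joint test. The following is only a sketch and is not part of the original analysis: it assumes the assignment data set is already loaded, and it leaves out state, comp_mi and comp_ia on the assumption that they do not vary within this sample (Table 1 shows identical means and no p-value for state and comp_mi).

```stata
* Sketch: joint balance check, not part of the original analysis.
* Regress assignment on the pre-treatment covariates that vary in this sample.
regress treat_real newreg age female

* The F statistic in the regression header tests whether all three slopes
* are zero; testparm repeats that joint test explicitly.
testparm newreg age female
```

If randomization was implemented correctly, the covariates should not jointly predict assignment, so a large p-value on the joint test would be consistent with the conclusion drawn from Table 1.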

4.

The mean voting rates in 2002 for the treatment and control groups were 60.68% and 59.66%, respectively, a difference of 1.02 percentage points. This comparison alone does not establish that the intervention increased voter turnout, because factors other than the intervention could also produce such a difference.

In the same way, the mean voting rates in 1998 for the treatment and control groups were 57.14% and 57.52%, respectively, and the mean voting rates in 2000 for the treatment and control groups were 73.79% and 73.20%, respectively. Compared with the difference in 2002, the differences in 1998 and 2000 are smaller. From this result, a preliminary conclusion can be drawn: the difference in turnout between the two groups in 2002 was not due to a pre-existing difference in voting tendencies, but more likely arose because the intervention actually increased voter turnout.
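The group comparisons described above could also be produced in one pass. This is a minimal sketch, assuming the same data set is loaded; it swaps the tabulate commands in the appendix for the built-in ttest command, which reports both group means, their difference, and a p-value for each year.

```stata
* Sketch: turnout by assignment group for each election year.
* ttest reports the difference as mean(treat_real == 0) - mean(treat_real == 1),
* so the sign is the opposite of "treatment minus control".
foreach v of varlist vote98 vote00 vote02 {
    ttest `v', by(treat_real)
}
```

The 1998 and 2000 comparisons serve as placebo checks, because the calls could not have affected turnout in elections held before the experiment.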

5.

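. reg vote02 treat_real

      Source |       SS           df       MS      Number of obs   =    97,446
-------------+----------------------------------   F(1, 97444)     =      5.33
       Model |  1.28210263         1  1.28210263   Prob > F        =    0.0209
    Residual |  23422.5244    97,444  .240369078   R-squared       =    0.0001
-------------+----------------------------------   Adj R-squared   =    0.0000
       Total |  23423.8065    97,445  .240379768   Root MSE        =    .49027

------------------------------------------------------------------------------
      vote02 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  treat_real |   .0102227   .0044263     2.31   0.021     .0015471    .0188982
       _cons |   .5965852   .0017012   350.68   0.000     .5932508    .5999197
------------------------------------------------------------------------------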

The intervention was being assigned to receive a phone call, so treat_real is used as the independent variable and the 2002 voting variable vote02 as the dependent variable. The regression results show that the coefficient on treat_real is 0.01 and the p-value is 0.021, which is smaller than 0.05. Therefore, this difference is statistically significant at the 5% level. In a practical sense, however, the model's R-squared is very small and close to 0, which means that being assigned to receive a call explains very little of the variation in turnout. The coefficient on treat_real is also small, about 1 percentage point. Therefore, in a practical sense, this difference is not significant.
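As a rough check, the reported statistics can be reproduced by hand from the regression output. The critical value 1.96 is an assumption here (the normal approximation, which is reasonable with 97,444 degrees of freedom); all other numbers are taken from the output and from the group means in question 4.

```latex
\begin{align*}
\hat\beta_{\text{treat\_real}} &= 0.6068 - 0.5966 = 0.0102
  && \text{(difference in 2002 group means)} \\
t &= \frac{\hat\beta}{\operatorname{se}(\hat\beta)}
   = \frac{0.0102227}{0.0044263} \approx 2.31 \\
\text{95\% CI} &= 0.0102227 \pm 1.96 \times 0.0044263
   \approx [\,0.0015,\ 0.0189\,] \\
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
   = \frac{1.28210263}{23423.8065} \approx 0.00005
\end{align*}
```

The constant, 0.5966, is the control-group turnout rate, so with a single binary regressor the regression restates the comparison of means from question 4.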

6.

Regression    1        2        3        4        5        6        7
treat_real    0.010    0.011    0.012    0.012    0.012    0.012    0.012
newreg                 -0.321   -0.227   -0.223   -0.223   -0.223   -0.223
age                             0.005    0.006    0.006    0.006    0.006
female                                   -0.028   -0.028   -0.028   -0.028
state                                             0        0        0
comp_mi                                                    0        0
comp_ia                                                             0

Table 2    Estimated coefficients
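A table like Table 2 could be assembled directly in Stata rather than copied by hand. The sketch below is one possible approach, not the method used for the original table; the stored names m1 to m7 are hypothetical labels chosen for illustration.

```stata
* Sketch: fit the seven nested specifications and store each one.
local controls newreg age female state comp_mi comp_ia
local rhs treat_real

quietly regress vote02 `rhs'
estimates store m1

local i = 1
foreach c of local controls {
    local ++i
    local rhs `rhs' `c'          // add one more control each round
    quietly regress vote02 `rhs'
    estimates store m`i'
}

* Coefficients and p-values side by side, with N and R-squared at the bottom.
estimates table m1 m2 m3 m4 m5 m6 m7, b(%9.3f) p(%9.3f) stats(N r2)
```

If state, comp_mi and comp_ia are constant within this sample, as the means in Table 1 suggest for state and comp_mi, Stata would drop them for collinearity, which would explain the zeros in the last three rows.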

7.

After these control variables are added, the coefficient on treat_real is larger than in the simple regression, by about 0.001 to 0.002. For statistical significance, look at the p-value: the p-value in the simple regression is 0.021, and the p-values for the coefficient on treat_real after adding control variables are 0.01, 0.004, and 0.006, respectively. The p-value decreases and the statistical significance of the coefficient increases, from the 5% level to the 1% level.

The coefficient on treat_real, that is, the estimated treatment effect, increased after several control variables were added. These controls are variables related to characteristics of the individual that are fixed or were determined before treatment. In addition, the statistical significance of the result increased. These results indicate that including these control variables made our estimate of the treatment effect more precise. Without individual covariates such as age and sex, a single explanatory variable is not convincing enough to explain the change in the dependent variable. In the analysis of treatment results, covariates can describe many aspects of the individuals involved in the treatment, and the experimenters must ensure randomness when assigning subjects to the control and treatment groups. That is, as mentioned above, the distribution of the variables related to these individual characteristics was roughly the same between the control group and the treatment group. These covariates can also play an important role in estimating the treatment effect.

8.

Comparing the turnout of the treatment and control groups does not yield an unbiased estimate when only the previous steps have been performed. For the coefficient on treat_real in the multiple regression to be the effect of treatment (being assigned to receive a phone call) on voter turnout, it is not enough to randomly assign individuals to the treatment and control groups and add some control variables. Rigorous and correct causal inference is also required. For example, one must make sure that no other variable is associated with both the independent variable and the dependent variable.

9. Stata Code

use "comp_ia_RCT_Last_Name_P_to_Z.dta", clear

* Question 1: sample size, group sizes, and turnout in each election
sum persons
tabulate treat_real
tabulate vote98
tabulate vote00
tabulate vote02

* Questions 2/3: balance table by assignment group
* (logout and ttable3 are user-written commands and must be installed first)
logout, save(Table1) word replace: ttable3 newreg age female state comp_mi comp_ia, by(treat_real) pvalue

* Question 4: turnout by assignment group in 2002, 1998 and 2000
tabulate vote02
tabulate vote02 if treat_real == 0
tabulate vote02 if treat_real == 1
tabulate vote98 if treat_real == 0
tabulate vote98 if treat_real == 1
tabulate vote00 if treat_real == 0
tabulate vote00 if treat_real == 1

* Question 5: simple regression of 2002 turnout on assignment
reg vote02 treat_real

* Questions 6/7: add the control variables one at a time
reg vote02 treat_real newreg
reg vote02 treat_real newreg age
reg vote02 treat_real newreg age female
reg vote02 treat_real newreg age female state
reg vote02 treat_real newreg age female state comp_mi
reg vote02 treat_real newreg age female state comp_mi comp_ia

clear
exit