ECON 104 ASSIGNMENT #1
1.
Statistics on the variable persons show that there are 156,000 individuals in the data set.
If treat_real equals 1, the individual was assigned to receive a call and belongs to the treatment group; if it equals 0, the individual belongs to the control group. Tabulating treat_real shows that 14,395 individuals were assigned to the treatment group and 83,051 to the control group.
The share of observations equal to 1 in vote98, vote00 and vote02 shows that 50.85%, 67.44% and 53.26% of individuals voted in the 1998, 2000 and 2002 elections, respectively.
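Because vote98, vote00 and vote02 are coded 0/1, the mean of each variable is the share of individuals who voted, so the three turnout rates can also be read from a single table of means. A minimal sketch, assuming the data file named in the Stata code section is in the working directory:

```stata
* Sketch: assumes the assignment data file is in the current working directory
use "comp_ia_RCT_Last_Name_P_to_Z.dta", clear

* Number of observations in the data set
count

* For a 0/1 variable, the mean is the share of ones (here: the turnout rate)
summarize vote98 vote00 vote02

* Group sizes: 0 = control, 1 = treatment
tabulate treat_real
```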
2.
To demonstrate that randomization was effective, the following fixed characteristics are chosen: newreg, age, female, state, comp_mi, comp_ia.
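| Variables | G1(0) | Mean1 | G2(1) | Mean2 | MeanDiff | p-value |
|---|---|---|---|---|---|---|
| newreg | 83051 | 0.0480 | 14395 | 0.0510 | -0.00300 | 0.096* |
| age | 83051 | 55.80 | 14395 | 55.55 | 0.245 | 0.150 |
| female | 80924 | 0.566 | 14048 | 0.561 | 0.00500 | 0.227 |
| state | 83051 | 1 | 14395 | 1 | 0 | . |
| comp_mi | 83051 | 0 | 14395 | 0 | 0 | . |

Table 1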
There are 6 variables in the table that represent the fixed characteristics of an individual. The G1(0) column gives the number of individuals in the control group, and Mean1 is the mean of the variable in the control group. The G2(1) column gives the number of individuals in the treatment group, and Mean2 is the mean of the variable in the treatment group. The MeanDiff column is the difference between the two group means (Mean1 minus Mean2), and the p-value column is the p-value of that difference.
3.
The means of these 6 variables differ very little between the treatment and control groups, which is evidence that assignment to the two groups was random. Under random assignment, fixed individual characteristics cannot be affected by whether a person ends up in the treatment or control group, so their distributions should be similar in both. The p-value comes from a test of the difference between the two group means; a significant difference would indicate that group assignment is related to that variable.
As the table above shows, only one of these variables (newreg) has a p-value below 0.1; none of the others differs significantly between the groups at the 10% level. Moreover, although the difference in newreg is significant at the 10% level, the mean difference is only 0.003, which is very small. Before treatment, the two groups were therefore essentially the same on these characteristics. Table 1 in question 2 is thus consistent with randomization having been implemented correctly.
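The balance check described above can be sketched with Stata's built-in two-sample t test. This is an illustration using official commands only, not necessarily the exact computation behind Table 1; it uses the default equal-variance test and assumes the data file is in the working directory.

```stata
* Sketch: assumes the assignment data file is in the current working directory
use "comp_ia_RCT_Last_Name_P_to_Z.dta", clear

* state and comp_mi show no variation in Table 1, so only the
* covariates that vary are tested here
foreach v of varlist newreg age female {
    quietly ttest `v', by(treat_real)
    * r(mu_1) is the control mean (treat_real == 0), r(mu_2) the treatment mean
    display "`v': diff = " %9.4f (r(mu_1) - r(mu_2)) "   p = " %6.3f r(p)
}
```

A p-value above 0.1 for a covariate means the test finds no significant difference in that covariate between the two groups at the 10% level.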
4.
The mean voting rates in 2002 for the treatment and control groups were 60.68% and 59.66%, respectively, so turnout in the treatment group was 1.02 percentage points higher. This comparison alone does not establish that the intervention increased voter turnout, because the intervention may not be the only factor behind the difference.
In the same way, the mean voting rates in 1998 for the treatment and control groups were 57.14% and 57.52%, respectively, and the mean voting rates in 2000 were 73.79% and 73.20%. Compared with the 2002 difference, the differences in 1998 and 2000 are smaller. This supports a preliminary conclusion: the difference in turnout between the two groups in 2002 was not due to a pre-existing difference in voting tendencies; more likely, the intervention actually increased voter turnout.
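The group comparisons above can be collected in one table rather than six separate tabulations. A minimal sketch, assuming the data file is in the working directory:

```stata
* Sketch: assumes the assignment data file is in the current working directory
use "comp_ia_RCT_Last_Name_P_to_Z.dta", clear

* Mean turnout (share voting) in each election, by assignment group
* (treat_real: 0 = control, 1 = treatment)
tabstat vote98 vote00 vote02, by(treat_real) statistics(mean) format(%6.4f)
```

Subtracting the control row from the treatment row gives the difference for each election. With the rates reported above, this is 1.02 percentage points in 2002 and 0.59 percentage points in 2000.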
5.
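. reg vote02 treat_real

      Source |       SS           df       MS      Number of obs   =    97,446
-------------+----------------------------------   F(1, 97444)     =      5.33
       Model |  1.28210263         1  1.28210263   Prob > F        =    0.0209
    Residual |  23422.5244    97,444  .240369078   R-squared       =    0.0001
-------------+----------------------------------   Adj R-squared   =    0.0000
       Total |  23423.8065    97,445  .240379768   Root MSE        =    .49027

------------------------------------------------------------------------------
      vote02 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  treat_real |   .0102227   .0044263     2.31   0.021     .0015471    .0188982
       _cons |   .5965852   .0017012   350.68   0.000     .5932508    .5999197
------------------------------------------------------------------------------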
The intervention is assignment to receive a phone call, so treat_real is used as the independent variable and the 2002 voting indicator vote02 as the dependent variable. The regression results show that the coefficient on treat_real is 0.01, with a p-value of 0.021, which is smaller than 0.05. The difference is therefore statistically significant at the 5% level. In practical terms, however, the model's R-squared is very close to 0, which means that assignment to receive a call explains very little of the variation in turnout. The coefficient on treat_real is also small: about one percentage point. Therefore, in a practical sense, the difference is not large.
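The quantities discussed above can be recomputed from the stored regression results, which shows where each number in the output comes from. With a single 0/1 regressor, the coefficient equals the difference in group means (0.6068 - 0.5966 = 0.0102) and the constant equals the control-group mean. A minimal sketch, assuming the data file is in the working directory; the scalar name crit is chosen here for illustration:

```stata
* Sketch: assumes the assignment data file is in the current working directory
use "comp_ia_RCT_Last_Name_P_to_Z.dta", clear
quietly regress vote02 treat_real

* t statistic: coefficient divided by its standard error
display "t = " _b[treat_real] / _se[treat_real]

* 95% confidence interval: coefficient +/- critical value * standard error
scalar crit = invttail(e(df_r), 0.025)
display "95% CI: " _b[treat_real] - crit * _se[treat_real] " to " _b[treat_real] + crit * _se[treat_real]

* R-squared: model sum of squares divided by total sum of squares
display "R-squared = " e(mss) / (e(mss) + e(rss))
```

With the reported values, .0102227 / .0044263 is about 2.31. The ratio 1.28210263 / 23423.8065 is about 0.00005, which the output displays as 0.0001.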
6.
| Regression | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| treat_real | 0.010 | 0.011 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 |
| newreg | | -0.321 | -0.227 | -0.223 | -0.223 | -0.223 | -0.223 |
| age | | | 0.005 | 0.006 | 0.006 | 0.006 | 0.006 |
| female | | | | -0.028 | -0.028 | -0.028 | -0.028 |
| state | | | | | 0 | 0 | 0 |
| comp_mi | | | | | | 0 | 0 |
| comp_ia | | | | | | | 0 |

Table 2 Estimated coefficients
7.
Adding these control variables makes the coefficient on the independent variable treat_real larger than in the simple regression, by about 0.001 to 0.002. For statistical significance, look at the p-value: it is 0.021 in the simple regression, and the p-values for the coefficient on treat_real after adding control variables are 0.01, 0.004 and 0.006, respectively. The p-value falls, so the statistical significance of the coefficient increases: from the 5% level to the 1% level.
The coefficient on treat_real, i.e. the estimated treatment effect, increased after several control variables were added, i.e. variables describing individual characteristics that are fixed or were determined before treatment. The regression results also became more significant. These results suggest that including the control variables made the estimate of the treatment effect more precise. Without individual covariates such as age and sex, a single explanatory variable is not convincing enough as an explanation of changes in the dependent variable. In an experiment and in the analysis of its results, covariates can describe many aspects of the individuals involved, and the experimenters must ensure randomness when assigning subjects to the control and treatment groups. That is, as noted above, the distribution of these individual characteristics was roughly the same in the control group and the treatment group. These covariates can also play an important role in estimating the treatment effect.
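The comparison across specifications can be sketched by storing each regression and tabulating only the treat_real row. This is one way to line up the coefficient and p-value side by side using official Stata commands; the stored names m1 to m7 are chosen here for illustration, and the data file is assumed to be in the working directory.

```stata
* Sketch: assumes the assignment data file is in the current working directory
use "comp_ia_RCT_Last_Name_P_to_Z.dta", clear

* Fit the seven specifications, adding one control at a time, and store each
local controls ""
local i = 1
quietly regress vote02 treat_real
estimates store m`i'
foreach v of varlist newreg age female state comp_mi comp_ia {
    local controls "`controls' `v'"
    local ++i
    quietly regress vote02 treat_real `controls'
    estimates store m`i'
}

* Coefficient and p-value of treat_real across the seven models
estimates table m1 m2 m3 m4 m5 m6 m7, keep(treat_real) b(%9.3f) p(%9.3f)
```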
8.
Comparing the turnout of the treatment and control groups does not by itself yield an unbiased estimate when only the previous steps have been performed. For the coefficient on treat_real in the multiple regression to be the effect of treatment (being assigned to receive a phone call) on voter turnout, it is not enough to randomly assign individuals to the treatment and control groups and add some control variables. Rigorous and correct causal inference is also required: for example, making sure that no other variable is associated with both the independent variable and the dependent variable.
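The condition in the example can be written down for the simple regression from question 5. This is a sketch in standard regression notation; the symbols \beta_0, \beta_1 and \varepsilon_i are introduced here for illustration and do not appear in the original answer.

```latex
\begin{align}
\text{vote02}_i &= \beta_0 + \beta_1\,\text{treat\_real}_i + \varepsilon_i, \\
\hat{\beta}_1 &= \overline{\text{vote02}}_{\,\text{treat\_real}=1}
               - \overline{\text{vote02}}_{\,\text{treat\_real}=0}, \\
\operatorname{E}[\hat{\beta}_1] &= \beta_1
  \quad \text{if } \operatorname{E}[\varepsilon_i \mid \text{treat\_real}_i] = 0 .
\end{align}
```

The requirement that no other variable be associated with both treat_real and vote02 corresponds to the last line: whatever is left in \varepsilon_i must be unrelated to treatment assignment.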
9. Stata Code
* Load the data set
use "comp_ia_RCT_Last_Name_P_to_Z.dta"
* 1: sample size, group sizes, and turnout in each election
sum persons
tabulate treat_real
tabulate vote98
tabulate vote00
tabulate vote02
* 2/3: balance table (logout and ttable3 are user-written commands and must be installed first)
logout, save(Table1) word replace: ttable3 newreg age female state comp_mi comp_ia, by(treat_real) pvalue
* 4: turnout in each election, separately for control (0) and treatment (1)
tabulate vote02
tabulate vote02 if treat_real == 0
tabulate vote02 if treat_real == 1
tabulate vote98 if treat_real == 0
tabulate vote98 if treat_real == 1
tabulate vote00 if treat_real == 0
tabulate vote00 if treat_real == 1
* 5: simple regression of 2002 turnout on treatment assignment
reg vote02 treat_real
* 6/7: add the control variables one at a time
reg vote02 treat_real newreg
reg vote02 treat_real newreg age
reg vote02 treat_real newreg age female
reg vote02 treat_real newreg age female state
reg vote02 treat_real newreg age female state comp_mi
reg vote02 treat_real newreg age female state comp_mi comp_ia
clear
exit
2023-02-17