Homework Assignment 3

Models, Interactions, Threats to Validity

(Due March 1, 2024)

This week, you will examine data from most Supreme Court cases in the modern era, and will examine whether the emotion with which justices respond during a case’s oral arguments is a good predictor of their eventual vote. You can find basic information about the US Supreme Court, and what it does, here. Some court observers have noted that the way in which justices react during oral arguments can often reveal just as much as what they actually say. Changes in vocal inflection, like vocal pitch, are often beyond the speaker’s control. In particular, justices may display emotional arousal in their speech when they interact with someone they disagree with. In this assignment, you will assess these ideas empirically. Along the way, you will learn a little about recent applications of data analysis of audio and text, and explore some questions about its strengths and weaknesses.

For this assignment, you will use the final analysis dataset from published peer-reviewed research on this question. We have also included the source article, by Bryce J. Dietrich, Ryan D. Enos, and Maya Sen, as well as the article’s supplemental appendix. As always, write everything up professionally; your overall presentation style is important. You should not have any raw R output or code copied/pasted into your final document. All questions should be answered clearly and concisely in full sentences. Plots, tables, etc., should be clearly labeled and referenced appropriately in your writeup.

1. The Basics (20pts)

Create a new .R script file (“LastName_PID_HW3.R”, e.g., “Garfias_12345678_HW3.R”; no blank spaces in the script name). Your script should generate all of the output, tables, and graphics used in your written submission and needs to run on its own, fully, without errors, to get full credit. You should assume that we will run your .R file in the same directory as the original data files; do not assume that the data is already loaded.

• Script (.R file) named correctly and runs without errors. (10pts)

• Script (.R file) does not overwrite original data, does the requisite analysis, and outputs any figures or tables used in your writeup (labeled correctly, and saved with filenames that include your last name and PID). (10pts)

2. Getting to Know the Data & Descriptive Statistics (15pts)

To analyze thousands of hours of audio from oral arguments at the Supreme Court, this research used both human and computer power. Humans are generally good at making holistic judgements of the verbal responses of others; journalists covering court deliberations, in particular, routinely make assessments about judges’ reactions, casting them, for example, as “pretty skeptical” or “friendly.” These appraisals, however, are hard to measure systematically. Computers can assist in measuring some aspects of judges’ reactions to oral arguments, for instance by rapidly identifying judges’ “positive” and “negative” word choice, or by measuring vocal pitch. Here, I want you to think about the data generating process and how the voice pitch and word choice data you are using came to be. Humans did NOT directly code these variables for each observation based on a subjective assessment.

1. Write a 3-sentence (maximum) description of the data (“justice_results.tab”) as though you were explaining it to a colleague. What is the unit of observation? What does it mean that a Supreme Court justice votes in favor of the petitioner of a case (i.e., the petitioner_vote variable), and why is it policy relevant to be able to predict this event? (2pts)

2. In general, non-technical terms, describe how the variables pitch_diff and petitioner_harvard_pos were generated. What types of classification errors exist for the variable petitioner_harvard_pos? Use the included article and supplemental appendix as a reference but be very careful to write this in your own words. (1-2 paragraphs; 5pts)

3. To begin familiarizing yourself with these data, generate two figures (or one figure with subpanels) that show the proportion of votes in favor of the petitioner under each of the three Chief Justices included (Warren E. Burger, William Rehnquist, and John Roberts), comparing the cases when the Solicitor General (SG) submits an amicus supporting the petitioner and when the SG does not. Make sure to label your axes clearly, and write a concise caption for the figures summarizing what you see. (Figure(s) with captions; 4pts)

4. Create a new categorical variable that identifies the three periods of the court covered by the data, as defined by the tenure of each of the three chief justices. (You can find these periods here: https://www.supremecourt.gov/about/members_text.aspx.) Note that terms begin in October and last until September of the following year, and refer to the year in which they begin (e.g., the 1986 term lasts from 10/1986 to 9/1987). For each of the three court periods, generate a figure that presents the proportion of votes, by all justices, in favor of the petitioner when pitch_diff is above (or below) its average. (Figure(s) with captions; 4pts)
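As one possible starting point for the court-period variable, the minimal sketch below uses base R's cut() on a toy data frame. The column name term, the new name court_period, and the cut points (Burger through the 1985 term, Rehnquist through the 2004 term, Roberts from the 2005 term on) are assumptions for illustration; check them against the page linked above and against the actual column names in the data.

```r
# Toy stand-in for the real data; `term` is an assumed column name.
toy <- data.frame(term = c(1975, 1985, 1986, 2004, 2005, 2012))

# Assumed cut points (verify against the Supreme Court members page):
# terms up to 1985 -> Burger, 1986-2004 -> Rehnquist, 2005 onward -> Roberts.
# cut() uses right-closed intervals by default, so 1985 falls in "Burger".
toy$court_period <- cut(
  toy$term,
  breaks = c(-Inf, 1985, 2004, Inf),
  labels = c("Burger", "Rehnquist", "Roberts")
)

# Quick check of how many toy observations fall in each period.
table(toy$court_period)
```

Because the result is a factor, it can be passed directly to plotting or modeling functions that expect a categorical variable.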

3. Regression Analysis (30pts)

You will now systematically explore the predictiveness of the justices’ pitch and word choice during oral arguments on their eventual vote.

1. Create a new variable (pr_petitioner_pos) that measures the proportion of total words directed at a petitioner that are “positive” minus the proportion of total words directed at a respondent that are “positive,” based on the Harvard dictionary (i.e., petitioner_harvard_pos and respondent_harvard_pos). Then regress petitioner_vote on pr_petitioner_pos and pitch_diff, the average vocal pitch in questions directed to the petitioner minus the average vocal pitch in questions directed to the respondent, and interpret the results substantively. (2pts)

2. Why can’t you simply use justicename in a regression? Explain how you could fix this problem if you were to estimate a regression that includes justicename as an independent variable in R (don’t estimate any regression yet). (1-2 sentences; 3pts)
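To see what R does with a categorical variable before any model is estimated, the sketch below builds a toy data frame and inspects the design matrix that a formula would produce. The justice names are example labels only, and justice_f is a hypothetical name; this is an illustration of the mechanics, not a model answer.

```r
# Toy data; the names are example labels, not the real dataset.
toy <- data.frame(
  justicename = c("Scalia", "Ginsburg", "Breyer", "Scalia", "Breyer", "Ginsburg"),
  stringsAsFactors = FALSE
)

# Storing the text column as a factor makes the categories explicit.
toy$justice_f <- factor(toy$justicename)

# model.matrix() shows the columns a regression formula would generate:
# an intercept plus one indicator per level, with the first level
# (alphabetically, "Breyer") left out as the reference category.
X <- model.matrix(~ justice_f, data = toy)
colnames(X)
```

The reference category can be changed with relevel() if a different baseline justice is easier to interpret.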

3. Add indicator variables for all the justices to the regression in question 1 above. Interpret the coefficients on the justices’ indicators (think carefully about what it means when these are zero). What do you learn about the different justices? How does inclusion of all these variables impact your coefficients on pr_petitioner_pos and pitch_diff? (5pts)

4. Estimate a regression similar to that in question 3, but also add term-specific indicators. How does your adjusted $R^2$ change between the model in question 1 and this one? Do these statistics provide a good reason to prefer one model over the other as you consider the role of voice pitch on the likelihood of a favorable vote to the petitioner? (3pts)

5. Regress petitioner_vote on the interaction of pitch_diff with each of the three court periods, as defined in question 2.4. Provide a few sentences about the theory of what you are testing here, and compare it to the theory that underlies question 1 above. Then provide a professional regression table of these two regressions (the estimations from questions 3.1 and 3.5), and one figure that explains what the interacted regression is telling you. In your visualization, choose a range for pitch_diff that makes sense, given the distribution of this variable in the data. (5pts)
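The mechanics of an interacted regression and a prediction grid can be sketched as follows, using simulated data. The data are random noise, so the estimates mean nothing; court_period is a hypothetical variable name, and using the 5th to 95th percentiles as the plotting range is one assumption among several reasonable choices.

```r
set.seed(1)
n <- 300

# Simulated stand-in for the real data (pure noise, for mechanics only).
toy <- data.frame(
  pitch_diff = rnorm(n),
  court_period = factor(sample(c("Burger", "Rehnquist", "Roberts"), n, replace = TRUE))
)
toy$petitioner_vote <- rbinom(n, 1, 0.5)

# `*` expands to both main effects plus their interaction, so each
# court period gets its own intercept shift and its own pitch_diff slope.
fit <- lm(petitioner_vote ~ pitch_diff * court_period, data = toy)

# Restrict the grid to the central 90% of the observed pitch_diff values
# so predictions are not extrapolated into sparse regions.
rng <- unname(quantile(toy$pitch_diff, c(0.05, 0.95)))
grid <- expand.grid(
  pitch_diff = seq(rng[1], rng[2], length.out = 25),
  court_period = levels(toy$court_period)
)
grid$fitted <- predict(fit, newdata = grid)

head(grid)
```

Plotting fitted against pitch_diff with one line per court_period then shows how the slope differs across periods.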

6. Estimate 6 regressions, with petitioner_vote as the dependent variable:

• A bivariate linear regression on pitch_diff

• A regression on pitch_diff and pr_petitioner_pos

• A regression on pitch_diff, pr_petitioner_pos, and an indicator for whether the Solicitor General submitted an amicus supporting the petitioner

• A regression on pitch_diff, pr_petitioner_pos, a SG amicus indicator, and court period indicators (as defined in question 2.4)

• A regression on pr_petitioner_pos and the full interaction of the SG amicus indicator and court period indicators (as defined in question 2.4)

• A regression on court period, the SG amicus indicator, and the full interaction between pitch_diff and pr_petitioner_pos

Explain what you learn from this set of regressions, with emphasis on the progression of your understanding from the simple model. What do you conclude? Provide some concise text (a few short paragraphs maximum), a professional table, and two supporting figures that help to explain the interactions. As before, choose a range for pitch_diff that makes sense, given the distribution of this variable in the data. (17pts: 7 for text, 4 for table, 6 for figures)

4. Outlier Analysis and Threats to Validity (35pts)

1. Conduct a thorough analysis of potential outliers on the last model in 3.6. (This should include an analysis that makes use of outlier statistics: studentized residuals, leverage, Cook’s distance, DFFIT.) Present your results clearly and concisely, using a small table and a figure. If there are especially troubling data points, explain what they are and explain how including or omitting them changes your results. (15pts)
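The four outlier statistics named above are all available in base R for a fitted lm object. The sketch below computes them on simulated data with one planted influential point; the data, the variable names, and the flagging thresholds (common rules of thumb, not requirements of this assignment) are assumptions for illustration.

```r
set.seed(2)
n <- 100

# Simulated data with one deliberately planted influential observation.
toy <- data.frame(x = rnorm(n))
toy$y <- 0.5 * toy$x + rnorm(n)
toy$x[1] <- 8
toy$y[1] <- -6

fit <- lm(y ~ x, data = toy)

# Collect the diagnostics in one table, one row per observation.
infl <- data.frame(
  obs      = seq_len(n),
  rstudent = rstudent(fit),        # studentized residuals
  leverage = hatvalues(fit),       # diagonal of the hat matrix
  cooks_d  = cooks.distance(fit),  # Cook's distance
  dffits   = dffits(fit)           # DFFITS
)

# Rule-of-thumb cutoffs (assumed here; other conventions exist).
k <- length(coef(fit))
flag <- abs(infl$rstudent) > 2 |
  infl$leverage > 2 * k / n |
  infl$cooks_d > 4 / n

infl[flag, ]
```

Refitting the model without the flagged rows and comparing coefficients is one way to show how much those observations drive the results.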

2. Assume that our measure of differential voice pitch (a measure generated by the research) is related to the true difference in voice pitch in the following way:

$$\text{pitch\_diff}_i = \text{True pitch\_diff}_i + u_i,$$

where $u_i$ is some error that has mean zero and is independent of both the true pitch differential and the measure created for this research. How might this affect your estimates in 3.6? If $u_i$ had a non-zero mean, what would that tell us about the sound-detection algorithm? How would it affect your estimates in 3.6? (5pts)

3. Give two additional examples of measurement error and how they might be affecting your results in 3.6. For each example, make sure to name the type of measurement error, give a very brief explanation of how it might arise, and explain how you think it might affect your results. (10pts)

4. Finally, what are some other limitations to your analysis? Are there omitted variables or other problems with the research as conducted? Think in particular about the initial question: is voice pitch really predictive of justices’ eventual votes? How would you make this analysis better? (5pts)