Statistics 2120: Introduction to Statistical Analysis Homework 9
Instructions:
Be sure to provide your full name and computing ID at the top of your work.
Write out the Honor Pledge under your name and computing ID: “On my honor, I did not give nor receive aid on this assignment beyond the listed collaboration.”
List the names of students with whom you collaborated under the Honor Pledge. If you did not collaborate, write ‘None’.
Respond to each problem below thoroughly, showing all relevant work.
Use Python for all calculations. Include a screen shot showing relevant code and output for each part using Python.
Save your completed work as a PDF and upload it to Gradescope. Be sure to select the appropriate page(s) for each answer. Unselected work will not be graded.
Problems:
1. In July 2018, a UVAToday article headline read “Only Half of Americans Believe Elections are Fair and Open”. The following statement can be found in the article: “In the Ipsos poll, which surveyed more than 1,000 American adults on July 5 and 6, 51 percent of the respondents agreed with the statement that American elections are fair and open.” The poll was based on a sample of 1,006 adults.
For the following questions, refer only to the information provided in this paragraph.
A. Please clearly denote your answer to each of the following questions:
Which inference procedure is appropriate to test if more than half of Americans agree that American elections are open and fair? Explain.
What is the parameter of interest for this test?
What are the appropriate hypotheses for this test?
Conduct the appropriate test with 10% significance.
Is the conclusion made from this test reliable? Explain.
B. Please clearly denote your answer to each of the following questions:
Construct a 90% confidence interval to estimate the proportion of Americans who agree that American elections are open and fair.
Is this interval reliable? Explain.
Does this interval confirm your conclusion?
What is the margin of error of the interval?
2. The file social media.csv contains data from a survey conducted by the Pew Research Center to investigate social media use in the United States in 2018. Sample respondents (who are representative of the United States population) were asked whether their age was over or under 50 years and whether it would be hard for them to give up social media.
A. Please clearly denote your answer to each of the following questions:
Which inference procedure is appropriate to test if there is an association between age and enjoyment of social media use? Explain.
What are the appropriate hypotheses for the test?
Conduct the appropriate test. Display intermediate components of the process. Is the conclusion made from this test reliable? Explain.
Explain why a second inference procedure is appropriate to test if age influences enjoyment of social media use.
B. Please clearly denote your answer to each of the following questions:
What are the appropriate hypotheses for this second test?
Conduct the second appropriate test.
Is the conclusion made from the second test reliable? Explain.
Construct a 95% confidence interval to estimate the difference in enjoyment of social media between the over and under 50 U.S. populations.
Is the interval for this second test reliable? Explain.
C. Please clearly denote your answer to each of the following questions:
What components of the two tests are the same?
Do the conclusions made from these tests agree?
D. Would both procedures be appropriate to test if the U.S. population under 50 enjoys social media more? Explain.
3. Have you ever wondered why the package of M&Ms you just bought never seems to have enough of your favorite color? According to M&Ms, the distribution of colors for their milk chocolate candies in 2008 was: 24% blue, 13% brown, 16% green, 20% orange, 13% red, and 14% yellow. The statistician described in the article obtained two scoops of M&Ms from the breakroom at the office every week, until he had 712 candies. The observed number of candies for each of the colors in this sample collected in 2016-2017 are: 133 blue, 96 brown, 139 green, 133 orange, 108 red, and 103 yellow.
A. Which inference procedure is appropriate to test if the M&Ms color distribution changed from 2008 to 2017? Explain.
B. What are the appropriate hypotheses for the test described in part A.?
C. Conduct the appropriate test. Display intermediate components of the process.
D. Is the conclusion made from the test in part C. reliable? Explain.
HW9
Jessica Xiong (pqf6rd)
On my honor, I did not give nor receive aid on this assignment beyond the listed collaboration.
Problems:
1. In July 2018, a UVAToday article headline read “Only Half of Americans Believe Elections are Fair and Open”. The following statement can be found in the article: “In the Ipsos poll, which surveyed more than 1,000 American adults on July 5 and 6, 51 percent of the respondents agreed with the statement that American elections are fair and open.” The poll was based on a sample of 1,006 adults.
For the following questions, refer only to the information provided in this paragraph.
A. Please clearly denote your answer to each of the following questions:
Which inference procedure is appropriate to test if more than half of Americans agree that American elections are open and fair? Explain.
For this study, a one-sample z test for a proportion is appropriate to test if more than half of Americans agree that American elections are open and fair. The variable of interest is a single binary variable, whether a respondent agrees that American elections are open and fair, and it is measured on one sample drawn from one population, American adults.
What is the parameter of interest for this test?
The parameter of interest is P, the proportion of all American adults who agree that American elections are open and fair.
What are the appropriate hypotheses for this test?
The null hypothesis is that exactly half of the American adults believe elections are open and fair. H0: P = 0.5
The alternative hypothesis is that more than half of the American adults believe elections are open and fair. Ha: P > 0.5
Conduct the appropriate test with 10% significance.
Running the one-sample z test at the 10% significance level gives a test statistic of 0.6344 and a p-value of 0.2629. Since the p-value is larger than the significance level of 0.10, we fail to reject the null hypothesis. The conclusion is that there is insufficient evidence to support that more than half of American adults believe elections are open and fair.
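The test statistic and p-value reported above can be reproduced with a short standard-library sketch. It assumes the sample proportion is exactly the rounded 51% quoted in the article, since the exact count of agreeing respondents is not given; the function name `one_sample_z_test` is illustrative only.

```python
import math
from statistics import NormalDist

def one_sample_z_test(p_hat, n, p0):
    """Upper-tailed one-sample z test for a proportion (Ha: P > p0)."""
    se = math.sqrt(p0 * (1 - p0) / n)    # standard error computed under H0
    z = (p_hat - p0) / se                # standardized distance from p0
    p_value = 1 - NormalDist().cdf(z)    # area to the right of z
    return z, p_value

# Assumption: p_hat is the rounded 51% from the article, n = 1006.
z, p_value = one_sample_z_test(p_hat=0.51, n=1006, p0=0.5)
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.10 else "fail to reject H0")
```

The standard error uses the null value 0.5 rather than the sample proportion, because the test statistic is computed assuming the null hypothesis is true.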
Is the conclusion made from this test reliable? Explain.
Yes, the conclusion made from this test is reliable because np0 = 1006 × 0.5 = 503 ≥ 10 and n(1 − p0) = 1006 × 0.5 = 503 ≥ 10. Since both np0 and n(1 − p0) are at least 10, the normal approximation used by the z test is appropriate.
B. Please clearly denote your answer to each of the following questions:
Construct a 90% confidence interval to estimate the proportion of Americans who agree that American elections are open and fair.
The 90% confidence interval to estimate the proportion of Americans who agree that American elections are open and fair is (0.484, 0.536) .
Is this interval reliable? Explain.
Yes, this interval is reliable because np̂ = 1006 × 0.51 = 513.06 ≥ 10 and n(1 − p̂) = 1006 × 0.49 = 492.94 ≥ 10. Since both np̂ and n(1 − p̂) are at least 10, the normal approximation used by the interval is appropriate.
Does this interval confirm your conclusion?
Yes, this interval confirms my conclusion because 0.5 is included in the interval, so 0.5 remains a plausible value for the population proportion. This agrees with the test: we fail to reject the null hypothesis, and there is insufficient evidence to support that more than half of American adults believe elections are open and fair.
What is the margin of error of the interval?
The margin of error of the interval is 0.0259.
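The interval and margin of error in part B can be checked with a small sketch using only the standard library. As before, it assumes the sample proportion is the rounded 0.51 from the article; `proportion_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence):
    """Large-sample (Wald) confidence interval for one proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    se = math.sqrt(p_hat * (1 - p_hat) / n)                  # uses p_hat, not p0
    margin = z_star * se                                     # margin of error
    return p_hat - margin, p_hat + margin, margin

# Assumption: p_hat = 0.51 (rounded), n = 1006, 90% confidence.
lower, upper, margin = proportion_ci(p_hat=0.51, n=1006, confidence=0.90)
print(f"90% CI: ({lower:.3f}, {upper:.3f}), margin of error: {margin:.4f}")
```

Unlike the test, the interval's standard error is built from the sample proportion, which is why the reliability check for the interval uses 0.51 and 0.49 instead of 0.5.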
2. The file social media.csv contains data from a survey conducted by the Pew Research Center to investigate social media use in the United States in 2018. Sample respondents (who are representative of the United States population) were asked whether their age was over or under 50 years and whether it would be hard for them to give up social media.
A. Please clearly denote your answer to each of the following questions:
Which inference procedure is appropriate to test if there is an association between age and enjoyment of social media use? Explain.
In this case, a chi-square test for two-way tables is appropriate to test if there is an association between age and enjoyment of social media use. We are examining whether two categorical variables, age group (over or under 50 years) and whether it would be hard to give up social media, are associated, and the data can be summarized in a two-way table.
What are the appropriate hypotheses for the test?
P1: the proportion of people over 50 who feel it would be hard for them to give up social media.
P2: the proportion of people under 50 who feel it would be hard for them to give up social media.
The null hypothesis is there is no association between age groups and enjoyment of social media use.
H0: P1 = P2
The alternative hypothesis is that there is an association between age groups and enjoyment of social media use.
Ha: P1 ≠ P2
Conduct the appropriate test. Display intermediate components of the process.
After conducting the chi-square test, we get a test statistic of 7.9718 and a p-value of 0.0048. Since the p-value is smaller than the assumed significance level of 0.05, we reject the null hypothesis. The conclusion is that there is sufficient evidence of an association between age group and enjoyment of social media use.
Is the conclusion made from this test reliable? Explain.
The conclusion made from this test is reliable because the average expected count is n/(rc) = 1953/4 = 488.25, which is well above 5, and every expected count is at least 5, so the approximation to the chi-square distribution is valid.
Explain why a second inference procedure is appropriate to test if age influences enjoyment of social media use.
In this case, a two-sample z procedure for proportions is also appropriate to test if age influences enjoyment of social media use. Age splits the respondents into two independent groups (over 50 and under 50), and the response, whether it would be hard to give up social media, is binary, so testing for an association is equivalent to comparing the two groups' proportions.
B. Please clearly denote your answer to each of the following questions:
What are the appropriate hypotheses for this second test?
P1: the proportion of people over 50 who feel it would be hard for them to give up social media.
P2: the proportion of people under 50 who feel it would be hard for them to give up social media.
The null hypothesis is there is no difference between the proportion of respondents who are over 50 and feel it would be hard for them to give up social media and the proportion of respondents who are under 50 and feel it would be hard for them to give up social media.
H0: P1 − P2 = 0
The alternative hypothesis is there is a difference between the proportion of respondents who are over 50 and feel it would be hard for them to give up social media and the proportion of respondents who are under 50 and feel it would be hard for them to give up social media.
Ha: P1 − P2 ≠ 0
Conduct the second appropriate test.
From the two-sample z test, we get a test statistic of -2.8233 and a p-value of 0.0048. Since the p-value is smaller than the assumed significance level of 0.05, we reject the null hypothesis. There is sufficient evidence that the proportion of people over 50 who feel it would be hard to give up social media differs from the proportion of people under 50 who feel the same. In other words, there is an association between age and enjoyment of social media use.
Is the conclusion made from the second test reliable? Explain.
Yes, the conclusion made from the second test is reliable because the number of successes and the number of failures in both samples are at least 10. The assumption of the normal approximation is generally valid.
Construct a 95% confidence interval to estimate the difference in enjoyment of social media between the over and under 50 U.S. populations.
The 95% confidence interval to estimate the difference in enjoyment of social media between the over and under 50 U.S. populations is (-0.107, -0.019).
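A minimal sketch of how such an interval for a difference in proportions is computed is given here. The counts in social media.csv are not reproduced in this document, so the numbers passed in are hypothetical placeholders that only illustrate the calculation; they will not reproduce (-0.107, -0.019). The helper name `diff_proportion_ci` is also an assumption.

```python
import math
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for P1 - P2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # The interval does not assume P1 = P2, so each group keeps its own variance.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# HYPOTHETICAL counts (not the Pew data): group 1 = over 50, group 2 = under 50.
lower, upper = diff_proportion_ci(x1=30, n1=100, x2=45, n2=100)
print(f"95% CI for P1 - P2: ({lower:.3f}, {upper:.3f})")
```

An interval that lies entirely below zero, as the reported one does, indicates that the over-50 proportion is lower than the under-50 proportion.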
Is the interval for this second test reliable? Explain.
Yes, the interval for the second test is reliable because the number of successes and the number of failures in both samples are at least 10, so the normal approximation is valid.
C. Please clearly denote your answer to each of the following questions:
What components of the two tests are the same?
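The p-values, which are both 0.0048. The test statistics are also linked: the square of the z statistic, (-2.8233)² ≈ 7.97, matches the chi-square statistic of 7.9718 up to rounding.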
Do the conclusions made from these tests agree?
The conclusions made from these tests agree. Both tests conclude that there is an association between age and enjoyment of social media use.
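The agreement between the two procedures is not a coincidence: for a 2×2 table, the chi-square statistic computed without a continuity correction equals the square of the pooled two-proportion z statistic, and the two-sided p-values coincide. The sketch below illustrates this with hypothetical counts, since the actual survey counts are not listed here; both function names are illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided pooled z test for H0: P1 - P2 = 0."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square test for a 2x2 table, no continuity correction."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    rows, total = [n1, n2], n1 + n2
    cols = [x1 + x2, total - (x1 + x2)]
    stat = sum(
        (observed[i][j] - rows[i] * cols[j] / total) ** 2 / (rows[i] * cols[j] / total)
        for i in range(2) for j in range(2)
    )
    # With 1 degree of freedom the upper tail is erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# HYPOTHETICAL counts (not the Pew data), used only to show the identity.
z, p_z = two_proportion_z(30, 100, 45, 100)
chi2, p_chi2 = chi_square_2x2(30, 100, 45, 100)
print(f"z^2 = {z**2:.4f}, chi-square = {chi2:.4f}")
print(f"p-values: {p_z:.4f} and {p_chi2:.4f}")
```

Because squaring discards the sign of z, the chi-square statistic cannot show which group has the larger proportion.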
D. Would both procedures be appropriate to test if the U.S. population under 50 enjoys social media more? Explain.
No, only the two-sample z procedure would be appropriate to test if the U.S. population under 50 enjoys social media more, because that question is one-sided. The chi-square procedure only tests whether there is an association between two categorical variables, and its test statistic is never negative, so it cannot tell us whether it is the U.S. population under 50 or the U.S. population over 50 who enjoys social media more.
3. Have you ever wondered why the package of M&Ms you just bought never seems to have enough of your favorite color? According to M&Ms, the distribution of colors for their milk chocolate candies in 2008 was: 24% blue, 13% brown, 16% green, 20% orange, 13% red, and 14% yellow. The statistician described in the article obtained two scoops of M&Ms from the breakroom at the office every week, until he had 712 candies. The observed number of candies for each of the colors in this sample collected in 2016-2017 are: 133 blue, 96 brown, 139 green, 133 orange, 108 red, and 103 yellow.
A. Which inference procedure is appropriate to test if the M&Ms color distribution changed from 2008 to 2017? Explain.
A chi-square goodness-of-fit test would be appropriate because we are exploring whether the M&Ms color distribution changed from 2008 to 2017. There is one categorical variable, the color of an M&M, and its observed distribution is compared with a fully specified distribution. The candies are treated as a random sample.
B. What are the appropriate hypotheses for the test described in part A.?
The null hypothesis is that the color distribution of M&Ms in 2016-2017 matches the distribution published in 2008.
H0: P(blue) = 0.24, P(brown) = 0.13, P(green) = 0.16, P(orange) = 0.20, P(red) = 0.13, P(yellow) = 0.14
The alternative hypothesis is that the probability of at least one color differs from its 2008 value.
Ha: at least one of the following statements is true:
P(blue) ≠ 0.24, P(brown) ≠ 0.13, P(green) ≠ 0.16, P(orange) ≠ 0.20, P(red) ≠ 0.13, P(yellow) ≠ 0.14
C. Conduct the appropriate test. Display intermediate components of the process.
From the chi-square goodness-of-fit test with 6 − 1 = 5 degrees of freedom, we get a test statistic of 17.35 and a p-value of 0.0039. Since the p-value is smaller than the assumed significance level of 0.05, we reject the null hypothesis. The conclusion is that there is sufficient evidence to support that the M&Ms color distribution changed from 2008 to 2017.
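The intermediate components of this test (expected counts and each color's contribution) can be laid out with the sketch below, which uses the counts and 2008 percentages stated in the problem. To stay dependency-free it computes the chi-square tail probability with a hand-written helper, `chi2_sf`, which is an assumption of this sketch and is valid only for integer degrees of freedom; a statistics library would normally supply this.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 1:
        total, k = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        total, k = math.exp(-half), 2              # closed form for df = 2
    while k < df:                                  # step up two df at a time
        total += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return total

proportions_2008 = {"blue": 0.24, "brown": 0.13, "green": 0.16,
                    "orange": 0.20, "red": 0.13, "yellow": 0.14}
observed = {"blue": 133, "brown": 96, "green": 139,
            "orange": 133, "red": 108, "yellow": 103}

n = sum(observed.values())
expected = {color: n * p for color, p in proportions_2008.items()}
contributions = {color: (observed[color] - expected[color]) ** 2 / expected[color]
                 for color in observed}
statistic = sum(contributions.values())
df = len(observed) - 1
p_value = chi2_sf(statistic, df)

for color in observed:
    print(f"{color:>6}: observed {observed[color]:3d}, "
          f"expected {expected[color]:6.2f}, contribution {contributions[color]:.3f}")
print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The per-color contributions show that blue and green account for most of the statistic: blue appears less often, and green more often, than the 2008 percentages predict.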
D. Is the conclusion made from the test in part C. reliable? Explain.
The conclusion made from the test in part C is reliable because every expected count is at least 5; the smallest expected count, for brown and for red, is 712 × 0.13 = 92.56.
2023-08-15