
ECON 250: Introductory Statistics

Fall 2022

Stata Assignment 2

Question 1: Central Limit Theorem [25 marks]

In this question you will use Stata to demonstrate by simulation that the Central Limit Theorem holds. You will generate random samples of a random variable that has a uniform distribution, U(0, 1).

In the Simulation.do file provided, I have written a Stata command named "samplemeans". It generates a sample of random draws from the standard uniform distribution, U(0, 1), and calculates and outputs the sample mean. The syntax is "samplemeans [sample size]".

a) Using the "samplemeans" command provided and the "simulate" command in Stata, explain how you could obtain the sampling distribution of the sample mean. Carry out this procedure for the sample mean with a sample size of 10. Provide a graph of the sampling distribution. [10 marks]
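The procedure in part a) can be mirrored outside Stata as a cross-check. The Python below is a sketch under stated assumptions, not the provided Simulation.do. The function names `sample_mean`, `simulate_means`, and `text_histogram` are hypothetical. The 1,000 replications and the seed are arbitrary choices, and a text histogram stands in for Stata's graph. The code draws many samples of size 10 and keeps one mean per sample, which is roughly what `simulate` does with the output of `samplemeans`. It then summarizes the resulting distribution.

```python
import random
import statistics


def sample_mean(n, rng):
    """Mean of n independent U(0, 1) draws (a hypothetical stand-in for samplemeans)."""
    return statistics.fmean(rng.random() for _ in range(n))


def simulate_means(n, reps, seed=250):
    """Collect one sample mean per replication, similar in spirit to Stata's simulate."""
    rng = random.Random(seed)
    return [sample_mean(n, rng) for _ in range(reps)]


def text_histogram(values, bins=10, lo=0.0, hi=1.0, width=50):
    """Print a crude histogram over [lo, hi) and return the bin counts."""
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / (hi - lo) * bins), bins - 1)  # clamp the top edge into the last bin
        counts[k] += 1
    peak = max(counts)
    for k, c in enumerate(counts):
        left = lo + k * (hi - lo) / bins
        print(f"{left:4.2f} | {'#' * round(width * c / peak)}")
    return counts


means = simulate_means(n=10, reps=1000)
# Theory for U(0, 1): mean 0.5, sd of the mean of 10 draws = sqrt(1 / 120), about 0.091
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
counts = text_histogram(means)
```

The Stata version depends on how `samplemeans` stores its result, which is defined in Simulation.do. The sketch only reproduces the logic of the simulation.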

b) Explain how you could repeat your procedure in part a), with some modifications, to verify whether the Central Limit Theorem holds. [5 marks]

c) Carry out your proposed procedure in part b). Does this suggest that the Central Limit Theorem holds? [10 marks]
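The following Python sketch shows one way to make the check in part c) concrete. It uses the same assumptions as before: hypothetical function names and an arbitrary replication count and seed. The idea is to repeat the simulation for increasing sample sizes and compare the simulated means with what theory predicts.

For U(0, 1), the mean of n draws has:
- expected value 0.5
- standard deviation √(1/(12n))
- excess kurtosis −1.2/n, which should drift toward the Normal value of 0

In addition, the share of means within 1.96 standard deviations of 0.5 should approach 0.95. These diagnostics are one choice among several. Comparing histograms across n is an equally reasonable check.

```python
import random
import statistics


def simulate_means(n, reps, rng):
    """Sample means of n U(0, 1) draws, repeated reps times."""
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]


def excess_kurtosis(xs):
    """Moment-based sample excess kurtosis: about 0 for Normal data, -1.2 for U(0, 1)."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean((x - m) ** 2 for x in xs)
    m4 = statistics.fmean((x - m) ** 4 for x in xs)
    return m4 / m2 ** 2 - 3.0


def clt_table(sizes=(1, 2, 5, 10, 30, 100), reps=2000, seed=250):
    """For each sample size, compare the simulated sampling distribution with theory."""
    rng = random.Random(seed)
    rows = []
    for n in sizes:
        means = simulate_means(n, reps, rng)
        theory_sd = (1 / (12 * n)) ** 0.5  # sd of the mean of n U(0, 1) draws
        within = sum(abs(m - 0.5) <= 1.96 * theory_sd for m in means) / reps
        rows.append((n, statistics.fmean(means), statistics.stdev(means) / theory_sd,
                     excess_kurtosis(means), within))
    return rows


rows = clt_table()
for n, mean, sd_ratio, kurt, within in rows:
    print(f"n={n:3d} mean={mean:.3f} sd/theory={sd_ratio:.3f} "
          f"excess kurtosis={kurt:+.2f} share within 1.96 sd={within:.3f}")
```

With n = 1 every draw lies within 1.96 standard deviations of 0.5, so the share is 1.0. As n grows, it should settle near 0.95.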

Question 2: 7.94 Giannis Antetokounmpo rebounds [30 marks]

Sports analytics has permeated nearly all sports at all levels. Predictive analytic methods are used to study and monitor a variety of statistics on teams and players. Consider the provided data on the number of rebounds that NBA superstar Giannis Antetokounmpo made in each of the 75 consecutive 2017–2018 regular-season games that he played in. An analysis of the consecutive rebounds would show them to behave randomly over time, with no patterns. These data are provided in the file "ex07-94giannis.csv".

a) Obtain a histogram and Normal quantile plot of the rebounds data. Even though technically the distribution cannot be exactly Normal because the data values are discrete, do the plots suggest that Normality is a reasonable approximation? Explain. [7.5 marks]
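A Normal quantile plot pairs each sorted observation with the standard Normal quantile expected at its rank. Points lying close to a straight line suggest approximate Normality. The sketch below computes those pairs in Python using the (i − 0.5)/n plotting position, which is one common convention; Stata's `qnorm` and other packages may use a slightly different one. The function name is hypothetical.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (standard Normal quantile, observed value) pairs for a Normal quantile plot."""
    xs = sorted(data)
    n = len(xs)
    z = NormalDist()  # standard Normal, mean 0 and sd 1
    # Plotting position (i - 0.5) / n for the i-th smallest value, i = 1..n
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]
```

Here, `rebounds` is a hypothetical list holding the 75 values from ex07-94giannis.csv. Calling `normal_quantile_points(rebounds)` would return 75 (z, rebounds) pairs to scatter-plot. Because the counts are integers, the points form horizontal steps, so the question is whether those steps follow a roughly straight trend.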

b) Use software to find the values of ȳ and the sample standard deviation s. [2.5 marks]

c) Assuming that Giannis's rebounding process continues in the same manner, and using the standard deviation value found in part (b) as σ, what is a 90% prediction interval for the number of rebounds in his next game? Report the prediction interval as computed, in decimal form. [5 marks]
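The sketch below assumes the σ-known form of the prediction interval for a single future observation, ȳ ± z*σ√(1 + 1/n). It plugs in s from part (b) for σ and uses z* ≈ 1.645 for 90%. If the course instead uses t* with n − 1 degrees of freedom, only the critical value changes. The function name is hypothetical, and `data` should be the 75 rebound counts. No results from the actual file are shown here.

```python
import math
import statistics
from statistics import NormalDist


def prediction_interval(data, level=0.90):
    """Prediction interval for one future observation, treating the sample s as sigma."""
    n = len(data)
    ybar = statistics.fmean(data)
    sigma = statistics.stdev(data)  # sample standard deviation s, used as if it were sigma
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 when level = 0.90
    half_width = z_star * sigma * math.sqrt(1 + 1 / n)  # the 1/n term reflects uncertainty in ybar
    return ybar, sigma, (ybar - half_width, ybar + half_width)
```

The factor √(1 + 1/n) makes the interval wider than a confidence interval for the mean. A single game varies by about σ around the true mean, and ȳ itself is only an estimate of that mean.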

d) In general, when predicting integer outcomes such as rebounds, how would you suggest converting the lower and upper limit values to integers so that the resulting interval has at least the specified level C? Be careful when thinking about the conversion of a lower limit value versus an upper limit value. Explain. [5 marks]
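One conservative convention, shown here as a hypothetical helper, is to round the lower limit down and the upper limit up. The integer interval then contains the decimal interval, so under the Normal model its probability is at least C. Rounding both limits to the nearest integer, by contrast, can narrow the interval. Because rebound counts cannot be negative, the optional `min_value` argument clips the lower limit; it is an assumption, not part of the question.

```python
import math


def integer_prediction_interval(lower, upper, min_value=None):
    """Widen a decimal prediction interval to integer endpoints: lower rounded down, upper rounded up."""
    lo_int = math.floor(lower)
    hi_int = math.ceil(upper)
    if min_value is not None:
        lo_int = max(lo_int, min_value)  # e.g. a count of rebounds cannot be negative
    return lo_int, hi_int


print(integer_prediction_interval(3.4, 17.6))  # (3, 18); illustrative limits, not the answer to part (c)
```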

e) Following your approach in part (d), what is the conservative 90% prediction interval with integer endpoints for the prediction interval found in part (c)? [5 marks]

f) If you were to expand your predictive model to narrow the prediction interval width, list at least two factors or pieces of information that you would gather. [5 marks]

Question 3: 8.106 Two-sample t test versus matched pairs t test [20 marks]

Consider the dataset named "ex08-106paired.csv". It contains data collected on two groups of individuals. The data were actually collected in pairs, and each row represents a pair.

a) Suppose that we ignore the fact that the data were collected in pairs and mistakenly treat this as a two-sample problem. Compute the sample mean and variance for each group. Then compute the two-sample t statistic, degrees of freedom, and P-value for the two-sided alternative. Give a conclusion. [7.5 marks]
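The two-sample computation in part a) can be sketched in Python as follows, assuming the unpooled (Welch) statistic rather than the pooled one. The function returns the t statistic with two degrees-of-freedom choices:
- the Satterthwaite approximation that software typically reports
- the conservative min(n₁, n₂) − 1

The function name is hypothetical. The P-value is left to Stata or to a t-distribution routine such as `scipy.stats.t.sf`, because the Python standard library has no t CDF. In Stata, `ttest` with the `unpaired` and `unequal` options should produce the same statistic.

```python
import math
import statistics


def two_sample_t(x, y):
    """Unpooled two-sample t statistic with Satterthwaite and conservative degrees of freedom."""
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.fmean(x), statistics.fmean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)  # sample variances (n - 1 denominator)
    a, b = v1 / n1, v2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(a + b)
    df_satterthwaite = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    df_conservative = min(n1, n2) - 1
    return t, df_satterthwaite, df_conservative
```

For a two-sided alternative, the P-value is 2·P(T ≥ |t|) for T with the chosen degrees of freedom.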

b) Now analyze the data in the proper way. Compute the sample mean and variance of the differences. Then compute the t statistic, degrees of freedom, and P-value. Give a conclusion. [7.5 marks]
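The matched-pairs version works on the within-pair differences. In this hypothetical sketch, `x` and `y` are the two columns of ex08-106paired.csv in row order (their actual names are not given here). The statistic is d̄/(s_d/√n) with n − 1 degrees of freedom. In Stata, `ttest` applied to two variables without the `unpaired` option performs this paired test.

```python
import math
import statistics


def paired_t(x, y):
    """Matched-pairs t test on the differences x - y; returns (mean, variance, t, df)."""
    if len(x) != len(y):
        raise ValueError("paired data need equal-length columns")
    diffs = [a - b for a, b in zip(x, y)]  # one difference per row (pair)
    n = len(diffs)
    d_bar = statistics.fmean(diffs)
    s2_d = statistics.variance(diffs)  # sample variance of the differences
    t = d_bar / math.sqrt(s2_d / n)
    return d_bar, s2_d, t, n - 1
```

Pairing removes the between-pair variation from the standard error. That is why the two analyses can disagree on the same numbers.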

c) Describe the differences between the two test results. [5 marks]

Question 4: Grading Scheme [25 marks]

A professor wants to evaluate whether testing students weekly with a quiz improves their learning of the class material. The professor teaches the same course in two different academic years, with a class size of 30 in both. In the first year, the students completed a quiz each week. In the second year, the students were not given any quizzes. Assume that the teaching was otherwise identical in each semester. The course percentage grade for each student is given in the dataset titled "Test Scores.csv".

a) Analyze whether the class that was given the weekly quizzes scored higher on average. Give a 95% confidence interval as well as an appropriate test result. Be sure to provide evidence on whether each of the two inference methods is valid for these data. [20 marks]
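For the confidence-interval part, here is a hedged Python sketch of the unpooled two-sample interval (x̄₁ − x̄₂) ± t*·SE, where SE = √(s₁²/n₁ + s₂²/n₂). The critical value is passed in rather than computed, because the standard library lacks t quantiles. With two classes of 30, the conservative choice of df = 29 gives t* ≈ 2.045 for 95%. The function name is hypothetical.

```python
import math
import statistics


def two_sample_ci(x, y, t_star):
    """Unpooled confidence interval for mean(x) - mean(y), given a critical value t_star."""
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return diff - t_star * se, diff + t_star * se
```

For example, `two_sample_ci(quiz, no_quiz, t_star=2.045)`, where `quiz` and `no_quiz` are hypothetical lists of the two classes' grades. The validity evidence the question asks for, such as Normal quantile plots of each group and the sample sizes, still has to come from the data.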

b) Suppose that the professor taught the course to two different classes in one semester, rather than in two different years. Ethics aside, would this be a better or worse study design? Why? [5 marks]