
STATS221-22A Statistical Data Analysis

Assignment One

Important Notes:

This assignment is to be solved using the RStudio statistical package. Type your answers in a Word document, paste the RStudio outputs and figures into this document, and print it for your submission.

Make sure that you download ‘STATS221-22A Assignment Cover Page.pdf’ from Moodle, print it, clearly write your name and student ID in the space provided, and use it as the cover page for your assignment submission.

Submit your assignment by dropping it in the STATS221 box located outside the main reception at the FG link – ground floor.

Please ensure that you clearly identify which question and task each of your answers relates to.

Your attention is drawn to the policies regarding plagiarism and late submission which are described in the course outline available on Moodle.

Maximum marks for each question are indicated in square brackets.

Question 1: [17 Marks]

It is well known that many people try to improve their chances of winning a lottery by using lucky numbers or buying their tickets from certain lucky stores. To estimate what proportion of the population would buy from a lucky store, a question was asked in the class survey. Participants were asked to respond Yes/No to the following question: “If you learned that a certain store has sold the winning lotto ticket a few times in the past year, would you be tempted to buy a ticket from this store?”

In the survey, 12 out of 43 respondents said ‘Yes’. These responses are also stored in the file luckylotto.CSV on Moodle.

Task 1:

a. Based on this data, what will be your estimate for the proportion of people in the whole population that would consider buying their tickets this way? [3 Marks]

b. Explain why this estimate is considered to be ‘statistically’ sound. [4 Marks]

Task 2:

a. Find the confidence intervals for your estimate at the 95% and the 99% levels and interpret each. [6 Marks]

b. Explain why the 99% confidence interval is wider than the 95% one. [4 Marks]

Question 2: [25 Marks]

For this question, we will use the data in the file Treesize.CSV. A copy of this can be found on Moodle. This data was collected in the US state of Georgia to test whether tree growth is superior in a warmer climate compared to a cooler one. The data comes from two regions: north (n) and south (s). The northern region is elevated and hence has a much cooler climate than the southern region. Thirty pine trees were randomly selected from each region. Sizes were determined by measuring the diameter at breast height (DBH) of each tree in the sample. The data contains the following variables:

Variable	Description	Variable Type
ns	Region – n or s	Categorical
dbh	Diameter at breast height (DBH) measurements	Numeric
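
Before starting the tasks, it can help to confirm that the file has loaded with the variable types listed above. The following is only a minimal sketch: the small data frame is made-up placeholder data written to a temporary file so that the code runs on its own, and `demo_path` is a hypothetical name to be replaced with the path to the downloaded Treesize.CSV.

```r
# Placeholder data standing in for Treesize.CSV. The values are invented
# purely so this sketch runs; they are NOT the assignment data.
demo <- data.frame(
  ns  = rep(c("n", "s"), each = 3),
  dbh = c(10.1, 12.4, 11.0, 14.2, 15.8, 13.9)
)
demo_path <- tempfile(fileext = ".csv")
write.csv(demo, demo_path, row.names = FALSE)

# Replace demo_path with the location of the real file.
trees <- read.csv(demo_path)
trees$ns <- factor(trees$ns)  # treat region as a categorical variable

str(trees)       # expect one factor column (ns) and one numeric column (dbh)
table(trees$ns)  # number of trees recorded in each region
```

With the real file, `table(trees$ns)` should report 30 trees for each region if the data matches the description above.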

Task 1:

a. Produce a descriptive summary of the two groups of data. Which statistics did you include in this summary? Why? Paste your output. [2 Marks]

b. Use plot/s to examine the two groups of data graphically. Which plot/s did you choose? Why? Paste your output. [2 Marks]

c. Based on the descriptive summary and the plots that you have produced, describe the patterns in the data. [3 Marks]

Task 2:

What are the appropriate null and alternative hypotheses for comparing the two groups of data? Justify your choice. [4 Marks]

Task 3:

a. Explain why it is appropriate to use a two-sample t-test on this data. [2 Marks]

b. How would you decide which version of the two-sample t-test is most appropriate for this data? Describe the process you would follow in making this decision, then perform the required tasks in that process and paste the outputs of those tasks in your submission. [5 Marks]

c. Based on the results of ‘b’ above, which version of the t-test is most appropriate? Why? [2 Marks]

Task 4:

a. Perform the two-sample t-test (the version you have chosen in Task 3) to test the hypotheses you described in Task 2. Paste your output. [2 Marks]

b. What do you conclude at the 5% level? Report your finding in the context of the question posed. [3 Marks]

Question 3: [14 Marks]

Vehicles have built-in computers that calculate various quantities related to performance. One of these is fuel efficiency. A car manufacturer wants to test the accuracy of these measurements. To do this, they arrange a number of test runs in which, in addition to the computer calculating the miles per gallon (mpg), the driver also records the mpg by dividing the miles driven by the number of gallons at fill-up. They want to determine whether the two readings agree. For this question, we will use the data stored in the file MPG_Comparison.CSV. A copy of this can be found on Moodle. The data contains the following variables:

Variable	Description	Variable Type
Test-run	Test run #	Numeric
Computer	mpg reading recorded by the computer	Numeric
Driver	mpg reading recorded by the driver	Numeric
Diff	Difference = Computer reading – driver reading	Numeric
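
The definition of Diff in the table can be checked directly once the data is loaded. The sketch below uses three invented placeholder rows, not the assignment data, and assumes the columns arrive in R under the names Computer, Driver and Diff.

```r
# Placeholder rows, invented for illustration only (NOT the assignment data).
mpg <- data.frame(
  Computer = c(41.5, 50.7, 36.6),
  Driver   = c(36.5, 44.2, 37.2),
  Diff     = c(5.0, 6.5, -0.6)
)

# Diff should equal Computer minus Driver, up to floating-point rounding.
diff_ok <- isTRUE(all.equal(mpg$Diff, mpg$Computer - mpg$Driver))
print(diff_ok)
```

Under this definition, a positive Diff means the computer reported a higher mpg than the driver calculated for that test run.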

Task 1:

What are the appropriate null and alternative hypotheses for this data? Justify your choice. [4 Marks]

Task 2:

Which test is the most appropriate for this data? Explain why. [5 Marks]

Task 3:

a. Perform the test you chose in Task 2 to test the hypotheses you described in Task 1. Paste your output. [2 Marks]

b. What do you conclude at the 1% level? Report your finding in the context of the question posed. [3 Marks]

Question 4: [14 Marks]

For this question, we will use the data stored in the file SexPartners.CSV. A copy of this can be found on Moodle. The data contains information obtained from the STAT121 students on the number of ‘sexual partners’ they have had. This data was collected many years ago. The data contains:

Variable	Description	Variable Type
Gender	Gender: F or M	Categorical
No partner	Count of the students with no sexual partners	Numeric
1 partner	Count of the students with one sexual partner	Numeric
2 partners	Count of the students with two sexual partners	Numeric
3 partners	Count of the students with three sexual partners	Numeric
4 partners	Count of the students with four sexual partners	Numeric
5 partners	Count of the students with five sexual partners	Numeric
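
Several of the variable names above contain spaces or begin with a digit, and by default `read.csv` rewrites such names into syntactically valid R names. The sketch below illustrates this with a hypothetical three-column extract in which every count is a placeholder zero. It assumes the CSV header uses the names exactly as listed in the table, which should be checked against the real file.

```r
# Hypothetical extract: header names copied from the table above,
# counts set to placeholder zeros (NOT the assignment data).
csv_text <- "Gender,No partner,1 partner\nF,0,0\nM,0,0"

# Default behaviour (check.names = TRUE): names are made syntactically valid.
mangled <- names(read.csv(text = csv_text))

# With check.names = FALSE the original header names are preserved.
kept <- names(read.csv(text = csv_text, check.names = FALSE))

print(mangled)
print(kept)
```

If the original names are kept, columns have to be referenced with backticks or quotes, for example `partners[["1 partner"]]`, where `partners` is a hypothetical name for the loaded data frame.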