ASSIGNMENT

Semester 1, 2023

STAT7055 Introductory Statistics for Business and Finance

INSTRUCTIONS TO STUDENTS

Due Date

• The assignment is due at 6:00pm on Thursday May 18.

• Late submission of the assignment is not permitted. An assignment submitted without an extension after the due date will receive a mark of 0.

Obtaining your Assignment

• There are different versions of the assignment and each student will be assigned a particular version of the assignment.

• Therefore, you must log in to Wattle with your own ANU credentials and download your assignment questions and data directly from Wattle.

Writing your Assignment

• The assignment is an individual piece of assessment and must be completed on your own.

• You are not permitted to use any form of tutoring services (e.g., online, in-person, etc.) or any AI tools (e.g., ChatGPT, etc.).

• You will be required to write a report in an R Markdown document that contains R code (with R code comments), R output and written text. An example of an R Markdown document, which you can use as a template, has been provided on Wattle.

• All R code must have accompanying R code comments that briefly describe what the code is doing.

• When answering the assignment questions in your report, you will need to include all your R code and R output that you used to calculate any answers and you must also write your answers in proper sentences. For example, if you are required to calculate a sample mean, then you would include your R code for calculating the sample mean and the R output of the sample mean value and you would also write a proper sentence in the report such as “The sample mean is equal to ...”.

• Make sure to be clear and concise in your answers.

• A good way to approach writing your report is to imagine that you are a statistical consultant and that a client has asked you to do some statistical analyses. When presenting the results of your analyses to the client, you wouldn’t just give them pages of R code or pages of R output. Rather, you should give them a proper report which clearly outlines and explains the results of the analyses and which also includes the R code and R output used to produce the results.

• Therefore, presentation is very important. Marks will be deducted for poorly presented reports.

• Once you have finished writing your report in your R Markdown document, you will need to render the document by pressing the Knit button in RStudio to create an HTML file of your report.

• Further to the above point, it is good practice to regularly Knit your R Markdown document as you write your report. This is useful for checking that it’s rendering properly.

Submitting your Assignment

• Submission of the assignment will be through Wattle and further details regarding assignment submission will be provided on Wattle.

• For submission you will need to submit two files: the R Markdown file of your report (i.e., a “.Rmd” file) and the rendered HTML file of your report produced by pressing the Knit button in RStudio (i.e., a “.html” file).

• Please name your two files as “uNNNNNNN.Rmd” and “uNNNNNNN.html”, where uNNNNNNN is your student number.

• No other file types will be accepted or marked, e.g., “.R”, “.docx”, “.RData”, “.zip”, etc. In particular, do not submit any compressed files.

Other Important Details

• You may only use built-in functions available in the default installation of R and you are not permitted to use functions in any additional R packages (e.g., ggplot2).

• You must use the appropriate R functions (and not the statistical tables) to calculate critical values or p-values for the normal, t and F distributions.

• You must use R for all calculations.

• Round all final numeric answers to 4 decimal places. However, as you will be using R, keep all decimals during all intermediate steps to ensure the accuracy of your final numeric answer.

• Please use the help function if you want to learn more about a particular R function, e.g., enter help(mean) in the R console to learn more about the mean function.

• For questions that require writing mathematical symbols, you are welcome to use shorthand notation, provided you make the meaning clear (e.g., using “Mu” for µ, or “!=” for ≠).

• Answers (including hypotheses, explanations, conclusions, etc.) need to be written in the text of the R Markdown document and not in the R code comments.

• Do not print out the entire data sets in your R Markdown document or HTML file, as this will only take up unnecessary space.
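As a rough illustration of the points above about using R's distribution functions instead of statistical tables and about rounding only at the end, the following sketch uses base R only. The significance level, degrees of freedom and test statistic are made-up values for demonstration and do not correspond to any question in this assignment.

```r
# Illustrative values only: the significance level, degrees of freedom and
# test statistic below are assumptions, not taken from any assignment question.
alpha <- 0.05

# Upper-tail critical values from the normal, t and F distributions
z_crit <- qnorm(1 - alpha)
t_crit <- qt(1 - alpha, df = 29)
f_crit <- qf(1 - alpha, df1 = 3, df2 = 116)

# Upper-tail p-value for a hypothetical t statistic
t_stat <- 2.1
p_value <- pt(t_stat, df = 29, lower.tail = FALSE)

# Keep full precision in intermediate steps; round only when reporting
round(c(z = z_crit, t = t_crit, F = f_crit, p = p_value), 4)
```

Entering help(qt) or help(pt) in the R console describes the arguments of these functions in more detail.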

Question 1 [22 marks]

A research institute has hired you to perform some statistical analyses and provide statistical advice for various studies that they are conducting. The first study is one that aims to better understand a person’s fitness level and logical reasoning ability. As part of this study, the institute has developed two tests. The first is a fitness test that involves recording the time it takes a person to complete a series of endurance exercises and the second is a written test that is designed to measure a person’s logical reasoning ability. A random sample of 400 people was selected and the following were recorded for each person: the time in seconds it took them to complete the fitness test (Time), their score on the logical reasoning test (Score), their age in years (Age), their IQ (IQ) and their weight in kilograms (Weight). The data are stored in the file AssignmentData.RData in the data frame Q1.df.

(a) [6 marks] The institute would like detailed information regarding the overall distribution of the fitness test times for everyone in the sample. However, they also want information about the distribution of fitness test times for people who are 45 years or younger, and for people who are over 45 years old. Based on what was covered in the course, create some appropriate plots that will help the institute with their requests. Make sure to give all your plots proper descriptive titles and appropriate labels for the axes (do not just use the default titles or labels). Provide clear descriptions of the distribution of the fitness test times for the various scenarios. Be specific in your descriptions, making sure to mention any interesting and/or important aspects of the distributions.

(b) [4 marks] The institute believes that a person’s age, IQ and weight are likely to be useful variables in predicting their fitness test time and logical reasoning test score. The institute also theorises that variables which have more variability are more important for prediction purposes. Based on the institute’s theory, order the variables age, IQ and weight in terms of importance (from most important to least important), making sure to provide clear justifications for your ordering.

(c) [2 marks] Based on the definition given in the course, calculate the difference between the 83rd percentile and the 15th percentile of people’s weights in the sample.

(d) [4 marks] The institute has categorised people into six levels (level 1 through to level 6) based on their age and IQ. Specifically, anyone younger than 39.55 years with an IQ less than 102.15 is categorised as level 1, anyone younger than 39.55 years with an IQ greater than 102.15 is categorised as level 2, anyone between 39.55 and 51.25 years old with an IQ less than 102.15 is categorised as level 3, anyone between 39.55 and 51.25 years old with an IQ greater than 102.15 is categorised as level 4, anyone older than 51.25 years with an IQ less than 102.15 is categorised as level 5 and anyone older than 51.25 years with an IQ greater than 102.15 is categorised as level 6. Based on what was covered in the course, create the most appropriate plot for describing the categorisation of the people in the sample into these six levels. Make sure to give your plot a proper descriptive title and appropriate labels for the axes (do not just use the default title or labels). Determine the second most frequently occurring level.

(e) [3 marks] Test whether the population mean logical reasoning test score for people in levels 1 or 2, as defined in part (d), is less than 152. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 3%. Do not use any R functions that are designed to perform hypothesis tests.

(f) [3 marks] Among people in levels 5 or 6, as defined in part (d), test whether the population proportion of people whose fitness test times are longer than 229.75 seconds is greater than 0.59. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 1%. Do not use any R functions that are designed to perform hypothesis tests.

Question 2 [19 marks]

The second study the research institute is conducting is one that investigates the food spending habits of people who live in one of four suburbs (Allentown, Bridgeport, Charleston and Davenport) and who shop at one of three stores (the Everyday store, the Farmers store and the Grocery store). For each combination of suburb and store, a sample of 30 people who live in that suburb and shop only at that store was randomly selected. For each person, the following were recorded: the suburb in which they live (Suburb), the store at which they shop (Store), their weekly food expenditure in January in dollars (ExpJan) and their weekly food expenditure in February in dollars (ExpFeb). The data are stored in the file AssignmentData.RData in the data frame Q2.df.

(a) [7 marks] Test whether the population mean weekly food expenditure in January for people in Bridgeport is greater than the population mean weekly food expenditure in January for people in Allentown by more than 1.5 dollars. Based on what was covered in the course, make sure to formally test any assumptions that can be tested. Clearly state all hypotheses that you test, making sure to define any parameters, and use a significance level of α = 5% for any tests that you perform. Do not use any R functions that are designed to perform hypothesis tests.

(b) [4 marks] For people who only shop at the Farmers store, test whether the population mean weekly food expenditure in February is greater than the population mean weekly food expenditure in January by more than 3 dollars. Based on what was covered in the course, make sure to formally test any assumptions that can be tested. Clearly state all hypotheses that you test, making sure to define any parameters, and use a significance level of α = 2.5% for any tests that you perform. Do not use any R functions that are designed to perform hypothesis tests.

(c) [4 marks] Perform a one-way ANOVA to test whether the population mean weekly food expenditure in January is the same across all four suburbs. You can assume that the assumptions for a one-way ANOVA are satisfied. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 10%. Do not use any R functions that are designed to perform hypothesis tests or to perform, analyse or interpret an ANOVA.

(d) [4 marks] The institute is considering performing a two-way ANOVA on the data with weekly food expenditure in January as the response variable and suburb and store as the two factors. Without performing this two-way ANOVA and only using values from any F-distribution and information available in any one-way ANOVA performed on this data (with weekly food expenditure in January as the response variable), determine the value that the interaction sum of squares (in this two-way ANOVA) needs to be greater than in order to conclude that there is a significant interaction between suburb and store at a significance level of α = 5%. Make sure to clearly explain all your steps in deriving your answer. Do not perform a two-way ANOVA when answering this question and do not use any R functions that are designed to perform hypothesis tests or to perform, analyse or interpret an ANOVA.

Question 3 [25 marks]

The third study the research institute is conducting investigates how the type of light source and the amount of fertiliser might affect the full height of a particular species of flower. A sample of 200 flower seeds was randomly selected and each seed was grown using a certain amount of fertiliser under either natural light or artificial light. The following were recorded for each seed: the amount of fertiliser that was given in grams (Fertiliser), the light source that the seed was grown under (Light) and the full height in centimetres of the resulting flower (Height). The data are stored in the file AssignmentData.RData in the data frame Q3.df.

(a) [5 marks] Create a scatter plot of flower height against fertiliser amount. In the scatter plot, colour all points corresponding to seeds that were grown under natural light in blue and colour all points corresponding to seeds that were grown under artificial light in red. Make sure to give your plot a proper descriptive title and appropriate labels for the x and y axes. Describe the overall relationship between these two variables for all seeds and also describe the relationship between the two variables for seeds grown under natural light, and for seeds grown under artificial light.

(b) [3 marks] Considering only seeds grown under natural light, test whether or not the correlation between flower height and fertiliser amount is equal to zero. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 1%. Do not use any R functions that are designed to perform hypothesis tests.

For the remaining parts of this question, do not use the lm function or any other R function designed to fit, analyse or interpret regression models.

(c) [4 marks] Considering only seeds grown under artificial light, fit a simple linear regression model with flower height as the dependent variable and fertiliser amount as the independent variable without using the lm function or any other R function designed to fit, analyse or interpret regression models. Write down the estimated regression model.

(d) [5 marks] Discuss whether the assumptions for a simple linear regression model hold for the model you fitted in part (c), making sure to provide clear justifications for your answer.

(e) [4 marks] For the model you fitted in part (c), test whether the intercept is greater than 130. Clearly state your hypotheses and use a significance level of α = 10%. Do not use any R functions that are designed to perform hypothesis tests.

(f) [4 marks] Using the model fitted in part (c), calculate an 80% prediction interval for the flower height of a seed grown under artificial light that was given 15 grams of fertiliser. The institute is interested in ways they could increase the accuracy of this 80% prediction interval. Based on what was covered in the course, advise the institute on any steps they can take in the future to increase the accuracy of an 80% prediction interval for the flower height of a seed grown under artificial light that was given 15 grams of fertiliser. Do not use any R functions that are designed to calculate any predictions, confidence intervals or prediction intervals.

Presentation [4 marks]

Marks will be allocated for how well presented your report is, e.g., clear and distinct headings, concise answers with information clearly communicated, all R code appropriately commented, etc.

2023-05-14
