STATS 201/8 Data Analysis Assignment 1, Summer Semester, 2023
Due: 3pm Tuesday 10th January
Instructions concerning this assignment:
A major purpose of this assignment is to ease you into the assignment procedures and the use of the statistical package R. We will be doing this through RStudio and R Markdown. We are providing you with an R Markdown document called STATS20x_2023_SS_A1.rmd (available on Canvas), which has some answers already filled in. You will need to fill in and complete the rest of the document. The data files you will be using for the assignment are described in the questions and are available from Canvas. Make sure you put these data files in the same folder as the R Markdown document, because it will look for them there. The first change you need to make to the R Markdown document is to put your name and ID number at the top.
Notes:
• This assignment is worth 3% of your final mark.
• Assignments must be submitted online to the Canvas dropbox PRIOR to the deadline. You will need to submit your knitted document as either a Word, HTML or PDF file. Assignments more than 30 minutes late are not accepted unless there is a good reason for an extension being granted (usually medical, requiring a medical certificate). Canvas automatically closes submissions after the 30-minute grace period, so you will be unable to submit after that.
• The total marks for this assignment will be 10.
• It is your responsibility to back up your computer files. If you are using your own computer, it is your responsibility to ensure that you can access the data and run R and RStudio well ahead of the assignment due date. Technical problems outside our control are not accepted as excuses for submitting coursework late.
• We encourage working together. Working together means discussing assignments with other students or getting help in understanding from staff and tutors. You must write up your final assignment individually, in your own words. We view cheating on assignment work seriously! Cheating includes copying all or part of another student’s assignment, or allowing another student to copy all or part of your assignment. A student who allows someone else to copy their work is treated identically to the student who did the copying. Penalties include: the student’s name being entered on the university-wide Register of Academic Misconduct; loss of some or all marks for the assignment; and the student(s) involved being taken to the University Discipline Committee.
NOTE: This is a very short assignment compared to future assignments. The main goal is to get you started using R with RStudio and refresh your knowledge of simple linear regression.
Attending the Week 1 Workshops and completing the Linear Regression Quiz on Canvas will help you get started.
Even though this assignment is due on the 4th day of lectures, you should aim to finish it earlier and then move straight on to Assignment 2.
Question 1. [10 Marks]
A researcher was interested in determining the relationship between students' intelligence quotient (IQ) and their score on a language exam. In particular, what was the effect of IQ on exam score? A study was carried out where a large sample of students were recruited and the results of an IQ test and their score on a language exam were recorded.
For the purposes of this assignment, you will each be using a randomly generated data set based on the relationship found from this study. Treat this as a random sample of 50 students. The data file created for you includes the following variables:
IQ: the student's IQ score
Lang: the student's score on the language test
Make sure you change your name and ID number at the top of the assignment.
Enter your name where requested at the start of the analysis. Your name is used to create a random seed that generates your own unique set of data for the assignment.
Comment on the plot of the data.
A simple linear regression model has been fitted and the output provided. Stick with the simple linear regression model (DO NOT CHANGE THE FITTED MODEL).
Create a scatter plot with the fitted line from the fitted model superimposed over it.
Complete the equation of the fitted model as part of the Method and Assumption Checks.
Complete the “% of the variability” statement as part of the Method and Assumption Checks.
Complete the Executive Summary by adding two more sentences. One sentence should interpret the relevant strength of evidence in context, and the other should estimate the effect of a 10-unit increase in IQ on Lang.
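The "% of the variability" statement and the 10-unit effect both rest on short calculations. The sketch below assumes the standard simple linear regression model with slope β₁ for IQ; R² is the value R reports as "Multiple R-squared", and (L, U) is a placeholder for the 95% confidence interval for the IQ slope from your own output.

```latex
\begin{align*}
R^2 &= 1 - \frac{\sum_{i=1}^{50} \left(Lang_i - \widehat{Lang}_i\right)^2}
                 {\sum_{i=1}^{50} \left(Lang_i - \overline{Lang}\right)^2} \\[6pt]
E(Lang \mid IQ = x + 10) - E(Lang \mid IQ = x)
   &= \left[\beta_0 + \beta_1 (x + 10)\right] - \left[\beta_0 + \beta_1 x\right] = 10\beta_1 \\[6pt]
\text{95\% CI for } 10\beta_1 &= (10L,\ 10U),
   \quad \text{where } (L,\ U) \text{ is the 95\% CI for } \beta_1
\end{align*}
```

Multiplying R² by 100 gives the percentage of variability explained. Because the effect of a 10-point increase in IQ is 10β₁, both the estimate and the two ends of its confidence interval are simply multiplied by 10.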
Assignment Notes
For a lot of assignment questions, we will simply be giving you descriptions of how and why data was collected and minimal guidance. You will see how to analyse the data in the case studies in class, but here is a general approach to answering open data questions:
• Comment on the question(s) of interest or the goal(s) of the analysis.
• Look at the data (plot it, get summary statistics) and comment on it.
• Fit a model to the data.
• Check the model assumptions.
• Change model and repeat checks as needed. You may have to do this more than once.
• Generate inference output from your final model.
• Write a Method and Assumption Checks section.
    • This will detail the steps you took in building the model and why you took them.
    • It will include brief descriptions of the model assumption checks.
    • It will include a mathematical statement of the final model you fitted.
    • It will include a comment on the percentage of variation explained (where appropriate).
• Write an Executive Summary.
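The fitting and checking steps above are done in R in this course (with `lm()`, `summary()` and `confint()`). To show what those functions compute for a simple linear regression, here is a minimal, language-neutral sketch in Python using only the standard library. It is an illustration, not the course's method. The names `fit_simple_lr`, `effect_ci` and `SimpleLRFit` are hypothetical. The default critical value 2.0106 is approximately R's `qt(0.975, df = 48)`, which is appropriate only for n = 50.

```python
import math
from dataclasses import dataclass


@dataclass
class SimpleLRFit:
    intercept: float  # estimate of beta_0
    slope: float      # estimate of beta_1 (change in mean y per 1-unit increase in x)
    slope_se: float   # standard error of the slope estimate
    r_squared: float  # proportion of the variability in y explained by x
    residuals: list   # y_i minus fitted value, used for assumption checks (plots)


def fit_simple_lr(x, y):
    """Least-squares fit of y = b0 + b1 * x (hypothetical helper, not a course file)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired x and y with at least 3 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    if sxx == 0 or sst == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)      # residual (unexplained) sum of squares
    sigma_hat = math.sqrt(sse / (n - 2))      # residual standard error
    return SimpleLRFit(
        intercept=intercept,
        slope=slope,
        slope_se=sigma_hat / math.sqrt(sxx),
        r_squared=1 - sse / sst,
        residuals=residuals,
    )


def effect_ci(fit, change=10.0, t_crit=2.0106):
    """95% CI for the change in mean y when x increases by `change` units.

    t_crit defaults to roughly R's qt(0.975, df = 48), i.e. n = 50;
    pass a different value for other sample sizes.
    """
    estimate = change * fit.slope
    half_width = change * t_crit * fit.slope_se
    return estimate - half_width, estimate + half_width
```

For example, `effect_ci(fit_simple_lr(iq, lang))` would give the interval for a 10-point increase in IQ, where `iq` and `lang` are hypothetical lists holding your data. In R, the corresponding numbers are in the IQ row of `confint(fit) * 10`, assuming your template's fitted model object is called `fit` (it may use another name).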
Make sure you read the notes below!

Some very important notes:
- When using case studies as guides, DO NOT blindly follow one case study. All data sets have their own individual attributes and are not likely to perfectly match a case study you find. Instead INTELLIGENTLY use the case studies to guide you.
- When commenting on plots, keep the comments brief and relevant. When commenting on assumptions, you do not need to go into great detail. If the plot shows no problems, it is ok to say that. If the plot shows problems, briefly describe the problems and then say what can be done about them.
- When writing Executive Summaries:
• We want the main conclusions in terms of the original questions asked.
• If there is a key question or goal for the data, make sure you answer it directly, then go into details as needed. For example, if a study is asking whether a tablet affected blood pressure, you could have a sentence along the lines of “we have evidence that the tablet increased blood pressure”, then back this up with appropriate quantification (for example, giving a range of values for the increase in mean blood pressure after taking the tablet). Don’t leave us to infer the results from the quantification.
• Point out any unusual steps or changes made to your model in easy to understand terms.
• Use easy-to-understand, natural language. Avoid variable names, lots of decimal places and unnecessary detail.
• State units when known.
• This should be a brief, easy to read summary of your analysis that someone with little understanding of statistics can comprehend without having read through the analysis.
- If you want to check your Executive Summary, get a friend who hasn’t done statistics to read it and tell you what they think it means.
Obtaining a Copy of R and RStudio for Use at Home
Refer to the documentation on Canvas under:
Computer Workshop and R/RStudio Help -> R & RStudio Installation Information
Equations
Equations take a little more practice. You will note we have provided you with some of the equations for Assignment 1 to make your life easier.
Below we have provided some information on how you can change the formatting of your document, such as adding headers, lists, bold and italics. It is all fairly easy with a little practice.
Some formatting options:
To start a new paragraph, leave a blank line. To start a new line within a paragraph, end the line with two spaces.
*italics* produces italic text
**bold** produces bold text
# Header, ## Header, ### Header, …, ###### Header produce headers from largest to smallest
To make an unordered list:
* Item 1
* Item 2
    + sub item 1
    + sub item 2
To make an ordered list:
1. Item 1
2. Item 2
    + sub item 1
    + sub item 2
To write equations, enclose them between $ signs.
$\beta_0=55$ gives β₀ = 55
To get Greek letters: \sigma produces σ, etc.
superscripts^2 gives superscripts² and subscripts_1 gives subscripts₁
\ne gives ≠ and \times gives ×

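Want to put a hat on a symbol? Use \hat{}; for example, \hat{\mu} gives μ̂.
Want to take a square root? \sqrt{n} gives √n.
Want to make a fraction? \frac{this}{that} gives "this" over "that".
Want to make a silly complicated formula? Combine them: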
$\hat{\mu} \ne \sqrt{\frac{\beta_0^2}{\sigma \times \bar{x}}}$
gives μ̂ ≠ √(β₀² / (σ × x̄))
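Putting these commands together, one way to type the simple linear regression model from Question 1 in your R Markdown document is shown below. This is a sketch that assumes the usual form with independent, normally distributed errors; the symbols in your template may differ. It uses two commands not listed above: \varepsilon gives ε and \sim gives ∼.

```latex
$Lang_i = \beta_0 + \beta_1 \times IQ_i + \varepsilon_i$, where $\varepsilon_i \sim \text{iid } N(0, \sigma^2)$
```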
2023-01-10
