
MATH1324 Introduction to Statistics Assignment 1

Analysis of Climate data

Overview

The goal of this assignment is to learn to work with real-world data. You will work with climate data selected from weather stations in Sydney and Melbourne, and you will apply any relevant statistical analysis taught in this course. For example, you will compute descriptive statistics and explain whether each of your selected variables (chosen from Maximum temperature, Solar exposure and Maximum wind speed) has a normal distribution. To do this, you will use the Climate Dataset, which is located on Canvas in week 5: Learning Materials/Activities.

This assignment is worth 15% and the due date is 28/08/2022.

Climate Dataset Description

It contains 3 variables (or columns), namely Maximum temperature, Solar exposure and Maximum wind speed, for two cities: Melbourne and Sydney. Only 3 months of the year are considered: December 2021, January 2022 and February 2022.

Data Source: http://www.bom.gov.au/climate/data/stations/

http://www.bom.gov.au/climate/data/index.shtml?bookmark=136

Variables in the dataset:

1- Daily wind speed: the maximum speed of the wind.

2- Solar exposure: the daily global solar exposure is the total solar energy for a day falling on a horizontal surface. It is measured from midnight to midnight. The values are usually highest in clear sun conditions during the summer and lowest during winter or on very cloudy days.

3- Maximum temperature: the highest temperature recorded in the 24 hours from 9 am.
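As a rough illustration of how these variables might look once imported into R, the sketch below reads a few made-up rows and prepares the columns. The column names (`City`, `Date`, `MaxTemp`, `SolarExposure`, `MaxWindSpeed`) and every value are assumptions for illustration only; the actual file on Canvas may be laid out differently.

```r
# Hypothetical layout with invented values: the real file name, column names
# and measurements in the Canvas dataset may differ.
csv_text <- "City,Date,MaxTemp,SolarExposure,MaxWindSpeed
Melbourne,2021-12-01,24.1,28.3,39
Melbourne,2021-12-02,19.8,17.5,46
Sydney,2021-12-01,26.4,25.9,31
Sydney,2021-12-02,27.0,NA,35"

# read.csv() normally takes a file path; text = lets this sketch run without a file.
climate <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Prepare the columns: City as a factor, Date as a Date.
climate$City <- factor(climate$City)
climate$Date <- as.Date(climate$Date)

# Inspect the structure and count missing values per column
# before computing any statistics.
str(climate)
colSums(is.na(climate))
```

With a real file, the `text =` argument would be replaced by the path to the downloaded dataset.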

Assignment Instructions

1- As you can see from the explanations above and the dataset, we have 3 variables (wind speed, solar exposure and temperature), but for this assessment you are required to select ONLY two of the variables for your investigation. You must decide which variables to work with; you do not need to include all of them.

2- Since Melbourne and Sydney tend to have different weather, you are required to investigate the normality of each selected variable separately for Melbourne and for Sydney.

3- You need to give summary statistics (i.e., mean, median, standard deviation, first and third quartile, interquartile range, minimum and maximum values) for each variable of interest, separately for Melbourne and Sydney, using R functions.

4- Then you will use R to summarise the distribution of each variable separately in Melbourne and Sydney and compare it to a normal distribution. You need to do this visually by plotting the histogram with a normal distribution overlay, and then make recommendations regarding the modelling of these variables. The "empirical distribution" in this assignment means the histogram of the data. Some references that can help you to complete this part of the assignment are

https://www.youtube.com/watch?v=6eiwcd4Z3ic

https://bicorner.com/2015/04/09/data-fitting-with-r/

and https://users.dimi.uniud.it/~massimo.franceschet/R/fit.html
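The summary statistics listed in instruction 3 can be computed per city in several ways. The following is a minimal base R sketch, not a prescribed method; it uses simulated values, and the names `climate`, `City`, `MaxTemp` and the helper `describe` are hypothetical.

```r
set.seed(1)  # simulated values only, so the sketch is reproducible
climate <- data.frame(
  City = rep(c("Melbourne", "Sydney"), each = 90),
  MaxTemp = c(rnorm(90, mean = 26, sd = 5), rnorm(90, mean = 27, sd = 3))
)

# Hypothetical helper that returns every statistic the brief lists.
describe <- function(x) {
  x <- x[!is.na(x)]  # drop missing values before summarising
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  c(Min = min(x), Q1 = q[1], Median = median(x), Mean = mean(x),
    Q3 = q[2], Max = max(x), SD = sd(x), IQR = q[2] - q[1], n = length(x))
}

# Apply the helper to each city separately; the result has one row per city.
stats_by_city <- t(sapply(split(climate$MaxTemp, climate$City), describe))
round(stats_by_city, 2)
```

The same table could also be produced with `dplyr::group_by()` and `dplyr::summarise()` if the template loads that package; base R is used here only to keep the sketch free of dependencies.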

Submission

Assignment 1 must be completed using the R Notebook template available under week 5: Learning Materials/Activities.

Reports are limited to 12 pages maximum (this includes code). Information for using R Notebooks can be found here. The R Notebook template must be updated with your student ID details and your responses and code for the following sections. You must use the headings and chunks provided in the template. You can add more chunks to explain your approach if required.

Report Section Descriptions

The report will be in a reproducible R Notebook format with written sections, R code and output. The report will be composed of the following sections (see Template above).

1. Problem Statement [Plain text]: Write a clear and concise problem statement that guides your investigation. Explain the variables and outline the approaches taken for normal distribution fitting.

2. Load Packages [R Chunk]: This section is not marked.

3. Data [R Chunk]: Import the climate data and prepare it for analysis. Show your code.

4. Summary Statistics [R Chunk]: Calculate descriptive statistics (i.e., mean, median, standard deviation, first and third quartile, interquartile range, minimum and maximum values) of each variable grouped by city.

5. Distribution Fitting [Plain text and R Chunk]: Compare the empirical distribution of each variable to a normal distribution separately in Melbourne and in Sydney. You need to do this visually by plotting the histogram with a normal distribution overlay. Show your code.

6. Interpretation [Plain text]: Going back to your problem statement, what insight has been gained from the investigation? Discuss the extent to which the normal curve (distribution) fits the data. Some references that may help with this part are https://www.youtube.com/watch?v=6eiwcd4Z3ic

https://bicorner.com/2015/04/09/data-fitting-with-r/

and https://users.dimi.uniud.it/~massimo.franceschet/R/fit.html
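The visual comparison described in section 5 could be sketched as follows. This is one possible approach in base R graphics under stated assumptions: the data are simulated, and the names `climate`, `City` and `MaxTemp` are hypothetical. The fitted normal curve uses the sample mean and standard deviation of each city.

```r
set.seed(2)  # simulated stand-in for one variable in the two cities
climate <- data.frame(
  City = rep(c("Melbourne", "Sydney"), each = 90),
  MaxTemp = c(rnorm(90, mean = 26, sd = 5), rnorm(90, mean = 27, sd = 3))
)

old_par <- par(mfrow = c(1, 2))  # one panel per city
for (city in unique(climate$City)) {
  values <- climate$MaxTemp[climate$City == city]
  # Compute the parameters first: inside curve(), x is the plotting grid,
  # so mean(x) there would not be the sample mean.
  m <- mean(values)
  s <- sd(values)

  # freq = FALSE puts the histogram on the density scale, so the bar areas
  # sum to 1 and the normal curve is directly comparable.
  h <- hist(values, plot = FALSE)
  plot(h, freq = FALSE, main = city, xlab = "MaxTemp (simulated)",
       ylim = c(0, max(h$density, dnorm(m, mean = m, sd = s))))

  # Overlay the normal density with the sample mean and standard deviation.
  curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)
}
par(old_par)
```

When reading such a plot, the points to look for are whether the bars are roughly symmetric around the peak of the curve and whether either tail of the histogram extends noticeably beyond it.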

The report must be uploaded as a PDF with your code chunks showing. The easiest way to achieve this is to Preview your notebook -> Open in Browser (Chrome) -> Right click on the report in Chrome -> Click Print and select the Destination option Save as PDF.

Extensions will only be granted in accordance with the RMIT University Extension and Special Consideration Policy. No exceptions. Assignments submitted late will be penalised (see Course Information for further details).

Groups

Students are permitted to work individually or in groups of up to 3 for Assignment 1. One of the group members must fill out the following form to register their group details. Submit the details of your group here.

Group Registration Form

If you work in a group, then only one of the group members must submit a copy of the report, not all members. Group members who are not registered or do not submit an assignment will not be acknowledged. One group member's submission will be marked and given feedback. It will be the responsibility of the marked group member to share the group's feedback with the other group members. The other group members will receive a mark only.

Collaboration

You are permitted to discuss and collaborate on the assignment with your classmates. However, the write-up of the report must be an individual effort. Assignments will be submitted through Turnitin, so if you've copied from a classmate, it will be detected. It is your responsibility to ensure you do not copy or do not allow another classmate to copy your work. If plagiarism is detected, both the copier and the student copied from will be responsible. It is good practice to never share assignment files with other students. You should ensure you understand your responsibilities by reading the RMIT University website on academic integrity. Ignorance is no excuse.

Assignment 1 MATH 1324

Criteria, Ratings and Pts

Problem statement (2 pts)

Excellent (2 to >1.0 Pts): A clear and accurate problem statement was provided.

Needs Improvement (1 to >0.0 Pts): A problem statement was provided, but it was not clear or complete.

Not acceptable (0 Pts): No problem statement was provided.

Read/Import and save data (1 pt)

Excellent (1 to >0.5 Pts): Data import and management was appropriate.

Needs Improvement (0.5 to >0.0 Pts): The attempt to read/import the data set was successful, but the data was not saved in the correct format.

Not acceptable (0 Pts): The data import and management was insufficient or inappropriate.

Summary Statistics (3 pts)

Excellent (3 to >1.5 Pts): Summary statistics were accurate and given grouped by city.

Needs Improvement (1.5 to >0.0 Pts): Summary statistics were provided for the variable of interest but not separately given by city OR some of the summary statistics were missing.

Not acceptable (0 Pts): No summary statistic was provided OR the summary statistics were inaccurate.

Fitting and Appearance of plots (4 pts)

Excellent (4 to >2.0 Pts): The attempt to compare the data (empirical) distribution to the normal distribution was appropriate and well explained.

Needs Improvement (2 to >0.0 Pts): The attempt to compare the empirical distribution to the normal distribution needed improvement. For example:

-Investigation was not provided separately for Melbourne and Sydney.

-Empirical distribution was given but the normal distribution was missing.

Not acceptable (0 Pts): The attempt to compare the data (empirical) distribution to the normal distribution is inappropriate or missing.

Interpretation (5 pts)

Excellent (5 to >3.0 Pts): The interpretation of the extent to which the normal distribution fits the empirical data was appropriate and well explained.

Needs Improvement (3 to >0.0 Pts): The interpretation of the extent to which the normal distribution fits the empirical data was provided but needed improvement. For example:

-Interpretation was not provided separately for Melbourne and Sydney.

-Explanation and/or the approach taken needed improvement.

Not acceptable (0 Pts): The interpretation of the extent to which the normal distribution fits the empirical data was missing.

Total points: 15