Assignment 4

Motivation

This assignment is a chance for you to showcase the skills you have learnt in this unit. We want you to answer a question of your choosing using an open data set and write a blog post about your analysis.

The most important thing Iʼve ever done for learning R is to paradoxically stop “learning” (e.g. classes and problem sets) and start doing. Take a problem you have at work or school or a data set you find interesting and get to work. Then write it up and post on github or a blog.

— We are R-Ladies (@WeAreRLadies) March 25, 2019

Learning Objectives

We want you to:

be creative and have fun!

show you know how to be curious about data

show that you know how to find open data to answer a question

show you can turn open data into information and then knowledge

demonstrate a variety of skills that youʼve developed in this unit.

Checklist for Getting Started

Before you get started:

Think about the scope of your assignment. Can you answer the question well in the time available?

Does the question youʼve posed provide you sufficient scope to show off some of the skills youʼve learnt in this unit?

Is the data youʼve selected suitable to answer your question? Things you should consider include:

does the data contain the right variables?

what will you need to do to process the data?

are any wrangling steps needed?

can you handle the size of the data?

If you write pseudo-code, does your approach make sense?

Is your proposed analysis original? Ensure your analysis is NOT a reproduction of an analysis from another unit or from online. Marks will be heavily deducted if this is found to be the case, and we will check.

Lastly, is this question unique to you and your interests? We want you to choose something you are passionate about to practice your learning.

Important

Get your question and data choice checked during your tutorial or during consultation. You can also post to the discussion forum to seek advice from the teaching staff and other students on whether your question meets the brief.

Ideas

If you are stuck on what type of question to ask or what data to use, ask the teaching team for help. We will also share some ideas on the discussion forum and some examples of assignments from last yearʼs cohort on Moodle.

Caution

Be careful about the data you choose given the Moodle upload limits (500 MB per .zip file). If you want to use a data set that will exceed these limits, you must get approval from the teaching team prior to SWOTVAC.

You also cannot use data from previous assignments. A zero mark may be awarded if you do. If you really want to use this data, chat to the teaching team and get approval prior to SWOTVAC.
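To check your data against the upload limit mentioned above, a rough sketch like the following may help. It assumes your data sits in a folder called `data` inside your project; `folder_size_mb` is a hypothetical helper name, not something provided in the unit. It measures uncompressed size, so the zipped file will usually be somewhat smaller.

```r
# Sketch: total size of everything in a folder, in megabytes.
# `folder_size_mb` is a hypothetical helper; "data" is an assumed folder name.
folder_size_mb <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE)
  # file.size() returns bytes; a missing or empty folder gives a total of 0
  sum(file.size(files), na.rm = TRUE) / 1024^2
}

limit_mb <- 500  # per-.zip upload limit stated in this brief
size_mb <- folder_size_mb("data")

message(sprintf("data/ holds %.1f MB (limit: %.0f MB)", size_mb, limit_mb))
if (size_mb > limit_mb) {
  warning("Data is larger than the upload limit before zipping; ",
          "consider a subset or seek approval from the teaching team.")
}
```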

Task 1: Downloading and Documenting your Data Process (15 marks)

1. What is the question you are aiming to answer in your blog post? (You may have multiple sub-questions.) Share why you chose this question.

2. (3 marks) What data will you be using to answer that question? Explain why this data is suitable for the task.

3. (6 marks) In your submission, include comprehensive details about your data. You may like to include a README and a data dictionary. Also be sure to make clear the type of data (census, sample, experimental), any data limitations, data privacy or any ethical considerations. The relevant details here will depend on the data you choose and the analysis you perform.

4. (6 marks) Describe the steps to download your data and any steps to process your data for your analysis.

Note: You must describe all data sets you use.
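If you decide to include a data dictionary, one possible starting point is to generate a skeleton from the data frame and then write the descriptions by hand. The sketch below is only an illustration: `make_data_dictionary` is a hypothetical helper, and the built-in `airquality` data stands in for your own data set.

```r
# Sketch: a skeleton data dictionary with one row per variable.
# `make_data_dictionary` is a hypothetical helper, not part of the unit materials.
make_data_dictionary <- function(df) {
  data.frame(
    variable    = names(df),
    type        = vapply(df, function(col) class(col)[1], character(1)),
    n_missing   = vapply(df, function(col) sum(is.na(col)), integer(1)),
    description = "",  # fill in by hand: meaning, units, allowed values
    row.names   = NULL,
    stringsAsFactors = FALSE
  )
}

# Demonstrated on a built-in data set; swap in your own data frame
dictionary <- make_data_dictionary(airquality)
print(dictionary)

# Optionally save it alongside your data (the file name is an assumption)
# write.csv(dictionary, file.path("data", "data_dictionary.csv"), row.names = FALSE)
```

The generated table only records what R can see. Details such as the type of data (census, sample, experimental), limitations and ethical considerations still need to be written by you.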

Task 2: Writing a blog post (15 marks)

Write a blog post on your chosen topic.

Content: In your blog post, explore analytically at least 2 to 3 different aspects of your data. Show evidence of critical thinking by interpreting your analysis. Show you are able to write up your results and conclusions from your work. You must clearly demonstrate a range of skills developed in this unit in answering your question.

Template: An example outline is provided in the assignment template folder. You can also use your own.

Audience: Your audience for this assignment is your fellow students in ETC5512.

Length: Your blog post should take roughly 3 to 5 minutes to read (about 700 to 1500 words). Be kind to your markers! They do not want to read novels.

Finally: Given the word count, only include the most important, interesting and relevant parts about your data and your analysis. Other details can be shared in an Appendix or in Tasks 1 and 3 where relevant.

Task 3: Behind the Scenes (12 marks)

Warning

Do not use AI to answer these questions.

These should be written reflections of your own work. We want you to share your personal insights about your analysis.

As we have discussed, much of what goes into an analysis happens behind the scenes. This includes the detective work and the less sexy parts of your analysis.

5. (3 marks) Tell us about parts of your data processing or analysis that werenʼt “sexy” and wouldnʼt typically be included in a blog post. (e.g. Was there any data drudgery or time-intensive wrangling? Were there any repetitive or manual tasks? If it was easy, describe what made it easy.)

6. (3 marks) Were there any challenges that you faced in conducting this analysis? These may take the form of data limitations or coding challenges. (e.g. Was there anything in your analysis that you were not anticipating when you started? Did you have to change your intended scope? Did you need to master a new skill? Were there any problems you were proud of solving?)

7. (3 marks) Tell us about any imperfect parts of your work and how you would like to expand or improve this analysis in future. Be clear about any limitations or aspects of your analysis that fell beyond scope.

8. (3 marks) Also submit 4 earlier versions of your assignment to show your iterative process. These should be your messy versions and include exploratory code. We recommend you save these files as you progress through your assignment. Provide a short overview for markers of what you fixed/learnt/improved/changed between each file. (If you are comfortable with GitHub you may submit your GitHub repo, but please refer to individual commits.)

For these 4 questions we expect answers no longer than a paragraph each. You can also use dot points where appropriate. (Absolute max 250 words per answer, although if you can get your point across in fewer words that is fine too.)

Submission

Answer the questions and submit your report about the data using the Quarto template provided. Render your .qmd file to produce a .html file for markers.

Include your data set in the data folder, so the markers can reproduce your work.

Ensure your code will run, and that your report will build on another computer (your markers!). This is important so we can reproduce your report. Use an R Project and do not submit direct file paths!

Ensure your code is neat and tidy.

Ensure your writing is clear and concise. For this task consider factors of organisation, structure, length and grammar.

Include an AI acknowledgement and include links to all queries (except for Task 3, where AI is not to be used, as stated above).

Include all relevant citations and be sure to cite all R packages used. The function citation() will be helpful here (see ?citation for its help page).
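As a rough illustration of the citation step, the sketch below prints the citation for R itself and for one package, then collects BibTeX entries for a list of packages. The package list is an assumption (both ship with R); replace it with the packages you actually load. The generated entries may come without keys, so you may need to add or adjust them before using them in a references file.

```r
# Cite R itself
print(citation())

# Cite a single package; "stats" ships with R, so this runs anywhere
print(citation("stats"))

# Collect BibTeX entries for every package used in the report.
# The vector below is an assumed example; list your own packages here.
pkgs <- c("base", "stats")
bib <- unlist(lapply(pkgs, function(p) as.character(toBibtex(citation(p)))))
writeLines(bib)

# Optionally save to a file (the name "references.bib" is an assumption)
# writeLines(bib, "references.bib")
```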

The above is assumed as standard. Marks will be deducted if the above standards are not met as per the marking scheme from assignment 1 with possible additional penalties for inappropriate use of AI.
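For the reproducibility requirement above (an R Project with no direct file paths), here is a minimal sketch of what a portable path can look like. The file name `my_data.csv` is hypothetical, and the sketch assumes your .qmd file sits in the project's top-level folder next to the `data` folder.

```r
# Fragile: a path that exists only on one computer (do not submit this)
# raw <- read.csv("C:/Users/me/Documents/assignment4/data/my_data.csv")

# Portable: a path relative to the project folder.
# "my_data.csv" is a hypothetical file name.
data_path <- file.path("data", "my_data.csv")

if (file.exists(data_path)) {
  raw <- read.csv(data_path)
} else {
  # Helps diagnose where R is looking when the file is not found
  message("Expected to find ", data_path, " relative to ", getwd())
}
```

If your .qmd lives in a subfolder, `here::here("data", "my_data.csv")` from the here package is a common alternative, since it resolves paths from the project root rather than the current working directory.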