QBUS6860 – Individual Assignment 1

Rationale

This assignment has been designed to help students develop basic skills in data visualization and to allow students to practice techniques learned in lectures and tutorials.

Key Admin Information

1. Required submissions:

a. ONE written report (Word or PDF format, through Canvas - Assignment 1 Report Submission).

b. SEVERAL Python “.py” or Jupyter Notebook “.ipynb” files (through Canvas - Assignment 1 - Upload Your Program Code Files).

2. The late penalty for the assignment is 5% of the assigned mark per calendar day, starting after 4pm on the due date. The closing date, Monday 11 April 2022 at 4:00 pm, is the last date on which an assessment will be accepted for marking.

3. Length: The main text of your report (everything except any appendices) should be at most 10 pages in a normal 12-point font with single line spacing. For each Task, write a sufficient and complete report, with the necessary plots, covering your visualization, methodology, analysis, insights and limitations where applicable.

4. Numbers with decimals should be reported to four decimal places in the report.

5. If you wish to include additional material, you can do so by creating an appendix. There is no page limit for the appendix. Keep in mind that making good use of your audience’s time is an essential business skill. Every sentence, table or figure has to count. Extraneous and/or wrong material will potentially affect your mark.

6. Anonymous marking: Given the anonymous marking policy of the University, please only include your student ID (SID) in the submitted report, and do NOT include your name. The file name of your report should use the following format, with "XXXXX" replaced by your SID: for example, QBUS6860_2021S1_SIDXXXXX.pdf or QBUS6860_2021S1_SIDXXXXX.doc.

7. Presentation of the assignment is part of the assessment. Markers will assign up to 10% marks for clarity of writing and presentation.

8. For Turnitin to check your code, please copy and paste your code into the Appendix. Code should be formatted in a fixed-width font such as Courier New or Consolas.

If your programs are in “.py” files, simply copy and paste them into the report Appendix. If you are using Jupyter Notebook, please follow InstructionPY to convert the notebooks to “.py” files first, then copy the created “.py” files into the Appendix of the report.
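As an illustration of the notebook-to-script step, the sketch below pulls the code cells out of a notebook using only the standard library. It is not the InstructionPY procedure (which is not reproduced in this document); it assumes the notebook is stored in the usual nbformat 4 JSON layout, and the function name `notebook_to_script` and the demo notebook are hypothetical.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Return the code cells of a notebook (nbformat 4 JSON text) as one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # "source" may be a single string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip("\n"))
    # Separate cells with a blank line and end the script with a newline.
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook, used only for demonstration.
demo_notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Task A"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": ["total = 1 + 2\n", "print(total)"]},
    ],
})
script_text = notebook_to_script(demo_notebook)
print(script_text)
```

For a real notebook, the JSON text would be read from the “.ipynb” file first. The command `jupyter nbconvert --to script` performs a similar conversion from the command line.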

Key Rules

• Carefully read the requirements for each part of the assignment.

• Please follow any further instructions announced on Canvas.

• You must use Python for the assignment.

• Reproducibility is fundamental in data analysis, so make sure you submit the correct Python “.py” or Jupyter Notebook “.ipynb” files that generate the results in your report. Markers will run your programs to check.

• The University of Sydney takes plagiarism very SERIOUSLY. Please be warned that plagiarism between individuals/groups is always obvious to the markers and can be easily detected by Turnitin.

• Not submitting your code will lead to a loss of 50% of the assignment marks.

• Failure to read information and follow instructions may lead to a loss of marks. Furthermore, note that it is your responsibility to be informed of the University of Sydney and Business School rules and guidelines, and follow them.

• Referencing: Business School recommends APA Referencing System. (You may find the details at: https://libguides.library.usyd.edu.au/citation/apa7 )

• Feedback will be provided on the marked submission.

Task A (40 Marks)

This task is designed for you to practice your skills in conducting basic Visual Data Analytics (VDA) and Exploratory Data Analysis (EDA).

Background

The COVID-19 pandemic in Australia is part of the ongoing worldwide pandemic of the coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first confirmed case in Australia was identified on 25 January 2020, in Victoria (from https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Australia). Since then the Australian Federal Government has collected data on the COVID-19 pandemic in Australia. The data is useful to agencies of all types in making decisions on public policy.

Resources

https://www.covid19data.com.au/ is a place to get up-to-date COVID-19 data for Australia, sourced from the Australian Federal Government (https://www.health.gov.au/health-alerts/covid-19) and State Government health agencies. You can download the dataset as described in https://www.covid19data.com.au/data-notes, or from Matt Bolton’s GitHub repository https://github.com/M3IT/COVID-19_Data/tree/master/Data

A copy of the dataset has been placed on Canvas for your convenience, but you are encouraged to download the most recently updated data directly from the above GitHub site. The data files are all in CSV format, and the meaning of each column in each file is easy to identify.
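A first look at a CSV file might go as in the sketch below, which parses the header row and lists the column names. The sample text and its column names (`date`, `confirmed`, `deaths`) are hypothetical stand-ins; the real files in the repository may be laid out differently.

```python
import csv
import io

# Hypothetical excerpt; real file names and columns may differ.
SAMPLE_CSV = """date,confirmed,deaths
2020-01-25,1,0
2020-01-26,4,0
2020-01-27,5,0
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def read_rows_from_file(path):
    """Same as read_rows, but for a CSV file on disk."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


rows = read_rows(SAMPLE_CSV)
columns = list(rows[0].keys())
print("columns:", columns)
print("number of rows:", len(rows))
```

All values arrive as strings, so numeric columns need converting (for example with `int` or `float`) before any statistics are computed.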

Tasks

You are assigned 2 visualisation types at random (e.g., your randomly selected types could be violin and scatterplot, or histogram and bubble plot, etc.). Please check the list file QBUS6860_Assignment01_RandomTask.xlsx for your assigned visualisation types, using your Student ID. This file is on Canvas along with this document.

1. [8 Marks] Explore all the dataset files, then report and explain the relevant statistics, such as the total number of positive COVID-19 cases so far.

2. [12 Marks] Use your two randomly assigned visualisation types to analyse the data (you may use other types in addition to the types you are assigned, but you must use your assigned types). For example, suppose you were assigned histogram and bubble plot but think that the data could be better represented using a stream graph. You may use a stream graph in addition to histogram and bubble plot, but you must use at least histogram and bubble plot in your analysis. If an assigned type is not appropriate for this set of data, please explain the reason.

Always keep in mind the visual presentation should be meaningful and visually pleasing.

3. [10 Marks] Conduct appropriate analysis and report your insights. You should consider this task challenging.

4. [5 Marks] Summarise your conclusions, for example whether the data is of good quality and what other information could be collected, and put forward your suggestions.

Note: The other 5 marks are allocated for presentation quality
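The statistics in Task 1 and the data-quality check in Task 4 could start from something like the sketch below, which totals a numeric column, counts its missing entries, and formats the mean to four decimal places as the report requires. The column name `new_cases` and the sample values are hypothetical, not taken from the actual dataset.

```python
import csv
import io
import statistics

# Hypothetical sample with one missing value; not real COVID-19 figures.
SAMPLE_CSV = """date,new_cases
2022-03-01,10
2022-03-02,
2022-03-03,25
2022-03-04,12
"""


def summarise(csv_text, column):
    """Return count, missing count, total and mean for one numeric column."""
    values = []
    missing = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = (row.get(column) or "").strip()
        if cell == "":
            missing += 1  # empty cell: treat as a missing observation
        else:
            values.append(float(cell))
    return {
        "count": len(values),
        "missing": missing,
        "total": sum(values),
        "mean": statistics.mean(values) if values else None,
    }


summary = summarise(SAMPLE_CSV, "new_cases")
# Four decimal places, as required for numbers in the report.
report_line = f"mean new cases: {summary['mean']:.4f}"
print(summary)
print(report_line)
```

Counting missing entries separately, rather than silently treating them as zero, keeps the totals honest and gives a concrete figure to cite when discussing data quality.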

Task B (60 Marks)

Find the affiliation(s) and email address(es) of ICLR2022 (https://iclr.cc/Conferences/2022/) authors from the OpenReview site https://openreview.net/group?id=ICLR.cc/2022/Conference. This task is designed for you to apply techniques in data management and EDA.

Background

The International Conference on Learning Representations (ICLR) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence called representation learning, but generally referred to as deep learning. ICLR is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning used in the fields of artificial intelligence, statistics and data science, as well as important application areas such as machine/computer vision, computational biology, speech recognition, text understanding, gaming, and robotics.

Resources

You may reuse parts of the tutorial code and revise them for your purpose here.

Tasks

1. [5 Marks] Acquire the ICLR2022 author IDs, each of which is either an OpenReview ID or an author's email address. You may rely on some code snippets from Tutorial 3.

2. [12 Marks: Challenging] Write Python code to extract all the author profiles. As shown in Tutorial 3, each author has an ID (or email address) on the OpenReview site. You need to get the IDs of all ICLR2022 authors. An author profile can then be accessed at a URL like https://openreview.net/profile?id=~Junbin_Gao1 where ~Junbin_Gao1 is the author ID (username). On a sample page, locate where the author's affiliation and email address appear, then write your own web crawler to get this information for all ICLR2022 authors.

Warning: be prepared to wait while your crawler collects all the information after you deploy it.

3. [12 Marks: Challenging] Explore and report some statistics, such as the total number of authors, how many affiliations or emails are missing, how many different affiliations there are, where the authors are from, etc.

Note 1: Generally speaking, each appearance of an author ID means a paper submission. It is possible to tell how many papers an author submitted and how many papers from a particular organisation.

Note 2: OpenReview captures all the emails of the organisations with which an author is and/or was associated. I suggest you use the first email address in an author’s email list to identify their current affiliation.

Note 3: As no country information is collected in the author profile, you may need to rely on the email domain to map an author to a country. For example, from sydney.edu.au we know that au is the code for Australia. But people may use common email domains such as ['gmail.com', 'qq.com', '126.com', '163.com', 'outlook.com', 'hotmail.com', 'yahoo.com', 'foxmail.com', 'aol.com', 'msn.com', 'ymail.com', 'googlemail.com', 'live.com']. In this case, please use the following strategy: (1) if such a common email address appears as an author’s first email address, then check the second email address to identify the country; (2) if such a common email address is the only email address for the author, you may aggregate these authors into a group labelled “unidentifiable”.

4. [8 Marks] Visually present the statistical information you have discovered in Task 3.

5. [10 Marks] Identify or discuss whether there is any missing information in Task 2. What is your suggestion regarding this?

6. [8 Marks] (Challenging!) Segment authors into three major groups: University, IT Company (e.g., Google, Tencent, etc.), and Others.
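For the crawler in Task 2, one possible skeleton is sketched below: it builds the profile URL from an OpenReview ID and visits each distinct author once, pausing between requests. The `fetch` argument is a hypothetical callable that downloads one page (for instance, a wrapper around the Tutorial 3 code); extracting the affiliation and email from the returned page is deliberately left out, because the page structure is not described here. Author IDs given as email addresses are not handled.

```python
import time
from urllib.parse import quote

PROFILE_URL = "https://openreview.net/profile?id="


def profile_url(author_id):
    """Build the profile URL for an OpenReview ID such as ~Junbin_Gao1."""
    # Keep the leading "~" literal, as in the example URL in Task 2.
    return PROFILE_URL + quote(author_id, safe="~")


def crawl_profiles(author_ids, fetch, delay_seconds=1.0):
    """Fetch each distinct author once, pausing between requests.

    `fetch` is a caller-supplied function: URL in, page content out.
    Failed downloads are recorded as None so the crawl can continue.
    """
    results = {}
    for author_id in dict.fromkeys(author_ids):  # drop repeats, keep order
        if results:
            time.sleep(delay_seconds)  # be polite to the server
        try:
            results[author_id] = fetch(profile_url(author_id))
        except Exception:
            results[author_id] = None
    return results


def fake_fetch(url):
    """Stand-in downloader for demonstration; makes no network request."""
    return f"<html>{url}</html>"


pages = crawl_profiles(["~Junbin_Gao1", "~Junbin_Gao1"], fake_fetch,
                       delay_seconds=0)
print(pages)
```

Because an author ID appears once per submission, de-duplicating before crawling avoids fetching the same profile repeatedly, which shortens the wait mentioned in the warning under Task 2.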

Note: The other 5 marks are allocated for presentation quality
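The email strategy in Note 3 and the grouping in Task 6 can be sketched as follows. Several parts are assumptions rather than requirements of the brief: the country table is a tiny illustrative subset, a domain with no mapped country code is labelled `unmapped`, two common addresses in a row are treated as “unidentifiable”, and the keyword hints used for segmentation are a simple heuristic that would need extending.

```python
from collections import Counter

# Common email domains listed in Note 3.
COMMON_DOMAINS = {
    "gmail.com", "qq.com", "126.com", "163.com", "outlook.com",
    "hotmail.com", "yahoo.com", "foxmail.com", "aol.com", "msn.com",
    "ymail.com", "googlemail.com", "live.com",
}

# Illustrative subset only (assumption); a real analysis needs a full table.
COUNTRY_CODES = {"au": "Australia", "cn": "China", "uk": "United Kingdom"}

# Heuristic hints (assumption); Google and Tencent come from Task 6.
IT_COMPANY_HINTS = ("google", "tencent")
UNIVERSITY_HINTS = ("university", "college", ".edu", ".ac.")


def domain_of(email):
    """Return the lower-cased part of an email address after the last '@'."""
    return email.rsplit("@", 1)[-1].strip().lower()


def country_of(emails):
    """Apply Note 3: use the first address, or the second if the first is common."""
    for email in emails[:2]:
        domain = domain_of(email)
        if domain in COMMON_DOMAINS:
            continue  # try the next address
        suffix = domain.rsplit(".", 1)[-1]
        # "unmapped" is this sketch's own label for generic or unknown suffixes.
        return COUNTRY_CODES.get(suffix, "unmapped")
    return "unidentifiable"


def segment(affiliation, email_domain=""):
    """Classify an author as 'IT Company', 'University' or 'Others'."""
    text = f"{affiliation} {email_domain}".lower()
    if any(hint in text for hint in IT_COMPANY_HINTS):
        return "IT Company"
    if any(hint in text for hint in UNIVERSITY_HINTS):
        return "University"
    return "Others"


# Hypothetical authors, used only for demonstration.
authors = [
    ("University of Sydney", ["someone@sydney.edu.au"]),
    ("Tencent AI Lab", ["someone@gmail.com", "someone@tencent.com"]),
    ("Independent Researcher", ["someone@gmail.com"]),
]
country_counts = Counter(country_of(emails) for _, emails in authors)
group_counts = Counter(
    segment(name, domain_of(emails[0])) for name, emails in authors
)
print(country_counts)
print(group_counts)
```

Keeping `unmapped` separate from `unidentifiable` makes it visible how many authors fall outside the country table, which is useful when discussing missing information in Task 5.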

Marking Criteria

1. The content in your report in general should focus on the appropriateness of the chosen methods and provide full explanation and interpretation of any results you obtain in your report. Output without explanation will receive zero marks in the relevant part.

2. Describe your data analysis procedure in detail: how the Exploratory Data Analysis (EDA) step is done, which models/methods are used and why, how your approaches are chosen, etc., with sufficient justification. The description should be detailed enough that other data scientists, who are assumed to have a background in your field, can understand and implement the task.

3. Clearly and appropriately present any relevant graphs and tables.

4. Presentation of the assignment is part of the assignment. Markers will assign a certain percentage of the mark for clarity of writing and presentation. It is recommended that you include your Python code as an appendix to your report; however, you may insert small sections of your code into the report for better interpretation when necessary.

5. All your analysis must be done with Python. The Python implementation must be well presented and commented in a professional way. The main program file should be named QBUS6860_2022S1_SIDXXXX.py (or .ipynb for a Jupyter Notebook), and other files should be given meaningful names. Low quality code will attract a penalty of up to 10% of the overall marks. If the marker cannot get your program to run, partial marks (maximum 15%) will be deducted from the overall marks.