
QBUS6860 – Individual Assignment 2

2022

Rationale

This assignment has been designed to help students develop data analytics and visualisation skills, and to allow students to practise state-of-the-art approaches to storytelling based on Visual Data Analytics (VDA) with real-world datasets.

Key Admin Information

1. Required submissions:

a. ONE written report (Word or PDF format, submitted through Canvas: Assignment 2 - Report Submission) of no more than 15 pages (excluding appendices), in a normal 12-point font with single line spacing. This is the full report, including all graphs and any additional materials or outputs of your analysis. It should follow a typical research paper format, with sections for Introduction, Question Description, Analysis Methods/Process, Results Presentation/Analysis, Summary/Conclusion and References, plus an Appendix if needed.

b. A full set of Python (.py) or Jupyter Notebook (.ipynb) files (through Canvas: Assignment 2 - Upload Your Program Code Files), plus any datasets of your own. Important: if you made significant changes to the provided data files (e.g. merged several tsv files or mined additional data), you must also upload your datasets along with your program files so that markers can verify your code and check the correctness of your calculations.

2. The late penalty for the assignment is 5% of the assigned mark per day, starting after 4 pm on the due date. The closing date, Monday 30 May 2022, 4:00 pm, is the last date on which an assessment will be accepted for marking.

3. Numbers with decimals should be reported to four decimal places in the report.

4. If you wish to include additional material, you can do so by creating an appendix. There is no page limit for the appendix. Keep in mind that making good use of your audience's time is an essential business skill. Every sentence, table and figure has to count. Extraneous and/or incorrect material may affect your mark.

5. Anonymous marking: In line with the University's anonymous marking policy, please include only your student ID in the submitted report, and do NOT include your name. Name your report file in the following format, replacing "XXXX" with your SID. Example: QBUS6860_2022S1_SIDXXXX.

6. Presentation of the assignment is part of the assessment. Markers will deduct up to 10 marks for poor clarity of writing and presentation.

7. For Turnitin to check your code, please copy and paste your code into the Appendix. Code should be formatted in a fixed-width (monospaced) font such as Courier New or Consolas.

If your programs are .py files, simply copy and paste them into the report Appendix. If you are using Jupyter Notebook, please follow InstructionPY to convert your notebooks to .py files first, then copy the resulting .py files into the Appendix of the report.
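InstructionPY on Canvas is the official conversion procedure. As an unofficial alternative, Jupyter's own `jupyter nbconvert --to script` command produces a similar .py file. The sketch below shows the same idea using only the Python standard library. It assumes the notebook uses the standard nbformat v4 JSON layout (a top-level "cells" list whose entries have "cell_type" and "source"), and the function name `notebook_to_script` is hypothetical.

```python
import json
from pathlib import Path


def notebook_to_script(ipynb_path: str, py_path: str) -> int:
    """Copy the code cells of a Jupyter notebook into a plain .py file.

    Assumes the nbformat v4 JSON layout: a top-level "cells" list in which
    each cell has a "cell_type" and a "source". Returns the number of code
    cells written.
    """
    notebook = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat may store "source" as a list of lines or as one string
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip() + "\n")
    # separate consecutive cells with a blank line
    Path(py_path).write_text("\n".join(chunks), encoding="utf-8")
    return len(chunks)
```

For example, `notebook_to_script("analysis.ipynb", "analysis.py")` (hypothetical file names) writes the code cells in order and returns how many were copied. Markdown cells are skipped, so explanations kept only in markdown will not appear in the Appendix copy.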

Key Rules

• Carefully read the requirements for the assignment.

• Please follow any further instructions announced on Canvas and ED.

• You may do your data manipulation outside Python (e.g. using Excel), although we believe using Python is more convenient. However, you MUST use Python to produce all of your visualisations, and you must submit your Python code with your processed data for verification.

• Reproducibility is fundamental in data analysis, so make sure you submit the correct Python .py or Jupyter Notebook .ipynb files that generate the results in your report. Markers will run your programs to check them.

• The University of Sydney takes plagiarism very seriously. Please be warned that plagiarism between individuals/groups is always obvious to the markers and can be easily detected by Turnitin.

• Not submitting your code will lead to a loss of 50% of the assignment marks.

• Failure to read information and follow instructions may lead to a loss of marks. Furthermore, note that it is your responsibility to be informed of the University of Sydney and Business School rules and guidelines, and follow them.

• Referencing: The Business School recommends the APA referencing system. (You may find the details at: https://libguides.library.usyd.edu.au/citation/apa7 )

• Feedback will be provided on the marked submission.

Background

The International Conference on Learning Representations (ICLR) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence called representation learning, but generally referred to as deep learning. ICLR is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning used in the fields of artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, text understanding, gaming, and robotics.

In recent years, the conference has attracted more than 2,000 paper submissions each year, covering a wide range of modern machine learning research.

Project Description and Requirement

This project is designed for you to practice your skills across the entire Exploratory Data Analysis/Visual Data Analytics (EDA/VDA) process including storytelling.

It is better to regard this as a new research project in which you have sufficient flexibility to conduct your research.

The goal, though not limited to this, is to describe the current status of deep learning/machine learning/AI research: its past, its current patterns and its future trends, through the following steps:

1. Acquire data from the ICLR 2017 to 2021 conferences, hosted on the OpenReview website https://openreview.net. Note: most of the data have been collected for you; please download the copy from Canvas. By the time this project starts, data for ICLR 2022 is very likely to be available. You are encouraged to include the 2022 data in your project, but this is NOT a must.

2. Explore the data (both the provided data and any you gather yourself) to find a story and ask questions. For example, what are the major topics of each conference? How have they changed in recent years? Who are the most productive authors? Which universities/organisations take the lead? There are many questions you can ask.

3. Assess and explain the fitness of the data for answering your questions. For example, if you want to study the research collaboration network among some researchers/authors, you may need to acquire relation/network information that the OpenReview server can provide (you will need to find a way to get it).

4. Create the visualisation(s) needed to tell the story of the conference data. These visualisations should reveal the patterns and insights you have discovered and present the story you are telling, underpinned by solid data analysis.

5. Carefully explain your conclusions: what insight does your analysis bring?
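As one hedged illustration of step 2, the sketch below counts the most frequent keywords per year as a rough proxy for the major topics of each conference. It assumes the paperlist files have been combined into one DataFrame with a 'year' column you add yourself (this column is not in the provided files), and that every non-missing keywords cell is a well-formed list string such as "['GAN', 'deep learning']". The function name is hypothetical, and the values in the example are toy data, not real ICLR figures.

```python
import ast

import pandas as pd


def top_keywords_by_year(papers: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n most frequent keywords for each year as a tidy table."""
    kw = papers[["year", "keywords"]].dropna(subset=["keywords"]).copy()
    # each cell is a string such as "['GAN', 'deep learning']"; parse it
    kw["keywords"] = kw["keywords"].apply(ast.literal_eval)
    # one row per (paper, keyword); empty lists become NaN and are dropped
    kw = kw.explode("keywords").dropna(subset=["keywords"]).reset_index(drop=True)
    # normalise case and whitespace so 'GAN' and 'gan ' are counted together
    kw["keywords"] = kw["keywords"].astype(str).str.strip().str.lower()
    counts = (
        kw.groupby(["year", "keywords"]).size()
        .reset_index(name="count")
        .sort_values(["year", "count"], ascending=[True, False])
    )
    return counts.groupby("year").head(n).reset_index(drop=True)


# toy data for illustration only (not real ICLR figures)
toy = pd.DataFrame({
    "year": [2020, 2020, 2021],
    "keywords": ["['GAN', 'deep learning']", "['gan ']", "['transformers']"],
})
print(top_keywords_by_year(toy, n=2))
```

Lower-casing merges variants such as 'GAN' and 'gan', but synonyms (e.g. 'GAN' and 'generative adversarial networks') still need manual grouping before the counts are charted.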
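For step 3, co-authorship is one kind of network information that can already be approximated from the provided authorids column, before fetching anything further from OpenReview. The sketch below, assuming each cell is a list string such as "['~Alice_Smith1', '~Bob_Lee1']", counts how often each pair of author IDs appears on the same paper; the pairs and counts can then serve as weighted edges for a graph library such as networkx. Because earlier years often use email addresses as IDs, the same person may appear under different IDs in different years, so treat the counts as approximate. The function name and the IDs in the example are hypothetical.

```python
import ast
from collections import Counter
from itertools import combinations


def coauthor_pair_counts(authorid_cells) -> Counter:
    """Count how often each pair of author IDs appears on the same paper.

    `authorid_cells` is an iterable of strings in the paperlist format,
    e.g. "['~Alice_Smith1', '~Bob_Lee1']". Each pair is stored in sorted
    order so that (A, B) and (B, A) are counted as the same edge.
    """
    pairs = Counter()
    for cell in authorid_cells:
        if not isinstance(cell, str):
            continue  # skip missing values, which pandas reads as NaN
        ids = sorted(set(ast.literal_eval(cell)))
        pairs.update(combinations(ids, 2))
    return pairs


example = coauthor_pair_counts([
    "['~Alice_Smith1', '~Bob_Lee1', '~Cara_Wu1']",
    "['~Bob_Lee1', '~Alice_Smith1']",
])
print(example.most_common(1))
```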

Resources

1. You may borrow some ideas from previous research at:

https://github.com/shaohua0116/ICLR2020-OpenReviewData

https://github.com/sharonzhou/ICLR2021-Stats

2. If you want to include ICLR2022 data, you may read the following. Note: using ICLR2022 data is NOT A MUST for this project.

https://openreview-py.readthedocs.io/en/latest/getting_data.html#getting-iclr-2019-data

https://www.browserstack.com/guide/python-selenium-to-run-web-automation-test

3. Datasets that we have already obtained are:

ICLR 2017: ICLR2017_paperlist.tsv, ICLR2017_affiliations.tsv

ICLR 2018: ICLR2018_paperlist.tsv, ICLR2018_affiliations.tsv

ICLR 2019: ICLR2019_paperlist.tsv, ICLR2019_affiliations.tsv

ICLR 2020: ICLR2020_paperlist.tsv, ICLR2020_affiliations.tsv

ICLR 2021: ICLR2021_paperlist.tsv, ICLR2021_affiliations.tsv

4. How to load each of the above data files into your program?

From the file names, we know these files are in the so-called Tab-Separated Values (TSV) format, which is similar to the CSV format. A CSV file uses the comma to separate values. However, some of our data values themselves contain commas, so the Tab is used as the separator instead. TSV files are loaded in the same way as CSV files, e.g.,

import pandas as pd
df = pd.read_csv('ICLR2021_paperlist.tsv', sep='\t', index_col=0)

where sep='\t' tells pandas that the Tab is the separator. The first column of these tsv files is an index (i.e. row numbers), so index_col=0 tells pandas to use that column as the DataFrame's row index rather than reading it in as an extra data column.

If you wish to look into a tsv file with MS Excel, please follow the instruction in Instruction_to_TSV.pdf on Canvas.

5. Data dictionary and value formats: All the data values in these files are strings. The paperlist files have the following data dictionary:

forum: A unique, system-generated ID for each paper submission, such as B1e9Y2NYvS. It is a string value.

id: Same as forum

title: The paper title in Text

authors: All author names of a paper, in the form ['firstname lastname1', 'firstname lastname2', 'firstname lastname3'], with one entry per author. When the data is read into Python, the characters [, ' (single quote) and ] all become part of the string value.

authorids: Similar to the authors column, but with author IDs as values. An ID could be the author's email address (particularly before 2021) or an ID such as '~Junbin_Gao1'. In this data file, an author ID may appear in multiple rows of the table, which means that this author has multiple paper submissions.

abstract: The paper abstract in Text

keywords: Several keywords about the paper, in the form ['keyword1', 'keyword2'] etc. From this information, you can tell what topic the paper is likely to be about.

one-sentence_summary: Summary of the paper in Text

The final decision about the paper submission, such as accepted as an oral presentation, accepted as a poster, or rejected. The wording of these values may differ from year to year.

The marks given by the different reviewers, in the form ['3: Weak Reject', '6: Weak Accept', '6: Weak Accept'] in 2021. The format may differ in other years.

The affiliation files have the following dictionary:

authorid: The author's ID, in a form such as '~Junbin_Gao1'. Note that author IDs may be duplicated. This is an important piece of information, from which you can count how many papers a particular author has submitted. For example, if an ID is repeated three times, this author submitted three papers to the conference.

author_name: The author name in the form 'firstname lastname'.

affiliation: The author's affiliation name. It may contain both the department and the institution, separated by a comma. Sometimes the department comes after the institution name, as authors may enter their address in a different order.

emails: The author's email address. Some authors have multiple email addresses, separated by commas.

Note: Since 2021, for privacy reasons, OpenReview no longer releases authors' actual email addresses, only their email domains. For example, my email address junbin.gao@sydney.edu.au will be released as ****@sydney.edu.au. However, this is not a concern for your tasks in this assignment, as the email domain is what you are likely to need.
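The loading approach in item 4 extends naturally to all five years. A minimal sketch, assuming the files keep the names listed in item 3; the helper name `load_paperlists` and the added 'year' column are our own illustration, not part of the provided data.

```python
import pandas as pd


def load_paperlists(sources: dict) -> pd.DataFrame:
    """Load several paperlist TSV files into one DataFrame with a 'year' column.

    `sources` maps a year to a file path (or file-like object), for example
    {2017: "ICLR2017_paperlist.tsv", 2018: "ICLR2018_paperlist.tsv"}.
    """
    frames = []
    for year, src in sorted(sources.items()):
        # same call as for a single file: Tab separator, first column as index
        df = pd.read_csv(src, sep="\t", index_col=0)
        df["year"] = year  # remember which conference each row came from
        frames.append(df)
    # ignore_index=True renumbers the rows, since each file restarts at 0
    return pd.concat(frames, ignore_index=True)
```

For example, `load_paperlists({y: f"ICLR{y}_paperlist.tsv" for y in range(2017, 2022)})` loads the five provided years, and calling `.groupby("year").size()` on the result gives the number of rows (submissions) per year. If the column sets differ between years, `pd.concat` fills the missing columns with NaN, so check each year's columns before comparing them.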
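Because the list-like columns (authors, authorids, keywords) and the reviewer marks arrive as plain strings, they need parsing before analysis. A hedged sketch using the standard library's `ast.literal_eval`, which safely evaluates strings containing Python literals. It assumes the straight-quote format shown for 2021 and that each reviewer entry starts with its numeric score. The column holding the reviewer marks is not named in this data dictionary, so the helpers take the cell value directly; the function names are hypothetical.

```python
import ast
import re
from statistics import mean


def parse_list_field(cell) -> list:
    """Convert a list-like string such as "['a', 'b']" into a real list."""
    if not isinstance(cell, str) or not cell.strip():
        return []  # missing values are read by pandas as NaN (a float)
    value = ast.literal_eval(cell)
    return list(value) if isinstance(value, (list, tuple)) else [value]


def mean_rating(cell):
    """Average the leading numeric scores in a reviewer-marks string.

    Assumes each entry starts with its score, as in the 2021 format
    "['3: Weak Reject', '6: Weak Accept']". Entries without a leading
    number are skipped; returns None if no score can be found.
    """
    scores = []
    for entry in parse_list_field(cell):
        match = re.match(r"\s*(\d+)", str(entry))
        if match:
            scores.append(int(match.group(1)))
    return mean(scores) if scores else None
```

For example, `df["keywords"].apply(parse_list_field)` turns the keywords column into real lists. Since the format of the marks may differ between years, inspect a few raw values for each year before applying `mean_rating`.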
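Following the notes on the affiliation files, submissions per author can be counted from repeated author IDs, and organisations can be approximated from email domains. A minimal sketch, assuming an affiliations DataFrame with the 'authorid' and 'emails' columns described above; the function names are hypothetical. Generic domains such as gmail.com do not identify an organisation, and a paper with several authors from the same organisation is counted once per author.

```python
import pandas as pd


def submissions_per_author(affil: pd.DataFrame) -> pd.Series:
    """Count rows per author ID; each repeat of an ID is one more submission."""
    return affil["authorid"].value_counts()


def submissions_per_domain(affil: pd.DataFrame) -> pd.Series:
    """Count author rows per email domain (e.g. 'sydney.edu.au').

    Authors with several addresses are split on commas, and each author row
    is credited once per distinct domain it lists.
    """
    domains = (
        affil["emails"].dropna().astype(str)
        .str.split(",")
        .apply(lambda addrs: sorted({a.strip().split("@")[-1].lower()
                                     for a in addrs if "@" in a}))
    )
    # one row per (author row, domain); rows with no usable domain are dropped
    return domains.explode().dropna().value_counts()
```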

Marking Criteria (total 60 Marks)

Component	Excellent = 12mks	Satisfactory = 8mks	Poor = 4mks	Weight
Data Question	An interesting question (i.e., one without an immediately obvious answer) is posed. The visualisation provides a clear answer.	A reasonable question is posed, but it is unclear whether the visualisation provides an answer to it.	Missing or unclear question posed of the data.	12mks
Mark, Encoding, and Design Choices	All design choices are effective. The visualisation can be read and understood effortlessly.	Design choices are largely effective, but minor errors hinder comprehension.	Ineffective mark, encoding, or design choices are distracting or potentially misleading.	12mks
Titles & Labels	Titles and labels helpfully describe and contextualise the visualisation.	Most necessary titles and labels are present, but they could provide more context.	Many titles or labels are missing, or do not provide human-understandable information.	12mks
Design Rationale	Well-crafted write-up provides reasoned justification for all design choices.	Most design decisions are described, but rationale could be explained at a greater level of detail.	Missing or incomplete. Several design choices are left unexplained.	12mks
Creativity & Originality	You exceeded the parameters of the assignment, with original insights or a particularly engaging design.	You met all the parameters of the assignment.	You met most of the parameters of the assignment.	12mks