
Department of Computer Science

Summative Coursework Set Front Page

Module Title: Programming in Python for Data Science

Module Code: CS2PP22

Lecturer responsible: Dr Todd Jones

Type of Assignment (coursework/online test): Coursework

Individual / Group Assignment: Individual

Weighting of the Assignment: 100%

Page limit/Word count: Approximately 1,500 words, excluding captions and tables

Expected hours spent for this assignment: 20 hours

Items to be submitted online (Blackboard):

3 Items:

•    A single .zip (preferred) or .tar.gz archive

o Containing files and directories as outlined below in the Assignment Submission Requirements

•    A copy of the completed CS2PP22_Assessment_Task1.ipynb in .pdf format, which displays all content (code, markdown text, figures, images, etc.).

•    A copy of the completed CS2PP22_Assessment_Task2.ipynb in .pdf format, which displays all content (code, markdown text, figures, images, etc.).

Work to be submitted on-line via Blackboard Learn by: 2023 March 13th (Monday) 12:00 noon

Work will be marked and returned by: 2023 April 4th (Tuesday)

NOTES

By submitting this work, you are certifying that all sentences, figures, tables, equations, code snippets, artworks, and illustrations in this report are original and have not been taken from any other person's work, except where the works of others have been explicitly acknowledged, quoted, and referenced. You understand that failing to do so will be considered a case of plagiarism. Plagiarism is a form of academic misconduct and will be penalized accordingly. The University’s Statement of Academic Misconduct is available on the University web pages.

If your work is submitted after the deadline, 10% of the maximum possible mark will be deducted for each working day (or part of a working day) it is late. A mark of zero will be awarded if your work is submitted more than 5 working days late. You are strongly recommended to hand work in by the deadline, as a late submission on one piece of work can impact other work.

If you believe that you have a valid reason for failing to meet a deadline, then you should make an Exceptional Circumstances request and submit it before the deadline, or as soon as is practicable afterwards, explaining why. To make such a request, log on to RISIS and, on the Actions tab, select Exceptional Circumstance, as explained at https://www.reading.ac.uk/essentials/The-Important-Stuff/Rules-and-regulations/Exceptional-Circumstances

ASSESSMENT CLASSIFICATIONS

This coursework assesses your ability to:

•    understand and use appropriate Python syntax and ecosystem;

•    implement common computer science algorithms and functional programming in Python;

•    understand statistical and machine learning methods for data analytics and mining in Python;

•    apply appropriate statistical and machine learning techniques for data science tasks.

In general, you will gain credit for:

•    preparing and submitting required files as requested;

•    successful implementation of the specified coding tasks;

•    writing efficient, functional code;

•    providing thoughtful, clear, well-structured written analysis.

Your assignment will be marked according to the marking scheme provided below. The scheme is designed so that the collectively weighted assignment mark will correspond to the following qualitative degree classification descriptions:

The table below shows what is typically expected of the work to obtain a given mark.

Classification Range

Typically, the work should meet these requirements:

First Class (>=70%)

Outstanding/excellent work with correct code and results. Outstanding work should demonstrate coding proficiency, with highly efficient code based on advanced techniques. Evidence of independent research into the methods used and a thorough justification of how these methods are applied.

Upper Second (60-69%)

Good work with few mistakes. Some minor tasks have not been carried out or are not completely correct. Coding with good efficiency. Evidence of good knowledge of the core concepts, with good explanations and justifications.

Lower Second (50-59%)

Demonstrates knowledge of core concepts but with some mistakes. Explanations and justifications of methods used are logical but limited in depth. Coding with average efficiency. Most tasks have been carried out with sufficient accuracy.

Third (40-49%)

Some parts of the assignment are missing and/or have partially correct results. Most tasks have not been carried out with sufficient accuracy. Results may not be correct or technically sound. Mistakes in the application of knowledge, showing some misunderstandings. Explanations and justifications of methods used are not clear or logical. Coding might be inefficient.

Pass (35-39%)

Some significant part of the assignment is missing and/or has partially correct results. Gaps in knowledge and many mistakes, little evidence of understanding. Methods used are not well explained or justified. Coding is notably inefficient.

Fail (0-34%)

Many aspects of the assignment are missing, or there are large gaps in knowledge and significant mistakes, also showing limited understanding. Lack of logical explanations behind the methods used.

ASSIGNMENT DESCRIPTION

Major Coursework (100% of module assessment)

This assignment consists of two tasks. Both of these will be used to assess your implementation of elements of the Data Science process, using Python as the main tool.

A detailed breakdown of the Marking Scheme is provided later in this document.

Task 1 Data Preprocessing, Exploratory Data Analysis, and Python Classes

Using the cardata.csv file within the CS2PP22_Assessment_Task1.ipynb Jupyter notebook, you will execute several components of the data science process, and you will design and implement a class structure that controls and compiles data about a fictional sporting event. You will do this by writing Python code to perform the sub-tasks detailed in the notebook. Working through this notebook, you will read, write, and manipulate data to extract specific features, design and implement functional routines, and design and implement an algorithm to select an optimal subset from a larger dataset.

Some sub-tasks will ask you to provide a written explanation of the justification behind your coding choices. Code and written responses should be presented in a set of well-formatted code and Markdown cells at appropriate points in your Jupyter notebook. This work will require the production and submission of additional files; details about these files and how they should be submitted are provided in the notebook and the Assignment Submission Requirements.

Task 2 Twitter Data Analysis

Using the CS2PP22_Assessment_Task2.ipynb Jupyter notebook, you will extract data from the social media platform Twitter and use the data as the basis for implementing components of the data science process to build and test a regression model. You will need to extract at least 300 tweets (for example, the 300 most recent tweets) from at least 3 Twitter accounts.

Visualise the results concisely and discuss the reasons why one might prefer the use of one of your tested methods over another. As in Task 1, written responses should be provided in