CS989: Big Data Fundamentals

RESIT – COURSEWORK

DEADLINE

DUE: 12:00 noon, Wednesday July 12th, 2023

AIM OF THE ASSIGNMENT

To provide a deeper understanding of appropriate methodological approaches to processing and analysing noisy data, and to encourage appreciation of the challenges involved in data analysis.

LEARNING OUTCOMES

• Understanding of the fundamentals of Python to enable the use of various big data technologies.

• Understanding of how classical statistical techniques are applied in modern data analysis.

• Understanding of the potential application of data analysis tools to various problems, and appreciation of their limitations.

• Understanding of the challenges and complexity of data analysis.

THE BRIEF

Provide a brief report on the analysis of an open dataset. There are some restrictions on the dataset that can be selected (see “DATASET RULES” below). You can focus your report on one aspect of the dataset or on multiple aspects; the main objective is to find some interesting questions or problems to answer.

The following criteria will be used when marking your assignment:

• Identification and description of key challenge(s) or problem(s) to be addressed 10%

• Introduction to the dataset 10%

• The challenge(s) or problem(s) are to be addressed using the following:

o Summary statistics (including figures) for data being analysed 20%

o Description, rationale, application and findings from only one unsupervised analysis method covered in the module 20%

o Description, rationale, application and findings from only one supervised analysis method covered in the module 20%

• Reflection on methods used for analysis 10%

• Structure, presentation, and proper citation of references 10%

SUBMISSION

The report should be 2500 words (+/- 10%), excluding the front cover, table of contents, list of figures/tables, references and appendices. The document must be in PDF format. All code used for the analysis must also be submitted; if it is not, the submission will be considered incomplete and the resit will receive a mark of zero. More details will be available on the submission page on MyPlace. Both the code and the report must be submitted using MyPlace; no submission will be accepted by any other route. Assessments submitted after the deadline will receive a mark of zero.
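As a worked illustration of the word limit above, the following minimal Python sketch computes the band implied by 2500 words +/- 10%. The function names are hypothetical, and treating both ends of the band as inclusive is an assumption; the brief does not say how boundary counts are handled.

```python
# Word-count band implied by the brief: 2500 words, plus or minus 10%.
TARGET_WORDS = 2500
TOLERANCE = 0.10


def word_count_bounds(target=TARGET_WORDS, tolerance=TOLERANCE):
    """Return the (minimum, maximum) word counts for the given tolerance."""
    margin = round(target * tolerance)
    return target - margin, target + margin


def within_limit(count, target=TARGET_WORDS, tolerance=TOLERANCE):
    """True if `count` lies inside the band (inclusive ends are an assumption)."""
    low, high = word_count_bounds(target, tolerance)
    return low <= count <= high


low, high = word_count_bounds()
print(low, high)  # 2250 2750
```

Under these assumptions, a report body of between 2250 and 2750 words falls inside the limit.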

DATASET RULES

Example datasets are available on:

➢ The UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets.php

➢ Kaggle website: https://www.kaggle.com/datasets

You can also select a dataset from other sources, but make sure that the dataset is public and that you have the right to access and analyse the dataset and to share the results.

However, you cannot select a dataset that:

➢ comes packaged with Scikit-Learn

❖ Boston house-prices dataset

❖ Iris dataset

❖ Diabetes dataset

❖ Digits dataset

❖ Linnerud dataset

❖ Wine dataset

❖ Breast cancer Wisconsin dataset

For more information: https://scikit-learn.org/stable/datasets/index.html

➢ comes packaged with Seaborn

❖ anscombe.csv: Anscombe dataset