
COMP47670

Data Science in Python

2023 Autumn Practical Test


Overview

The notebook Aut2023Test_Core.ipynb contains code to load the file sheep.csv, which holds data on 300 sheep. The sheep belong to three breeds, come from three locations, and were recorded over three years. In addition to the location, breed and year, the data give the fleece weight and the overall (body) weight of each sheep.

Tasks

1.   Provide summary statistics of the numeric features and counts for the ‘breed’ feature. (10 marks)
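A minimal pandas sketch of Task 1. It builds a small made-up DataFrame in place of sheep.csv. Only fleece_w and breed are named in this sheet, so the column names 'location', 'year' and 'weight' and every breed name except Blackface are assumptions.

```python
import pandas as pd

# Hypothetical toy data standing in for sheep.csv. The column names
# 'location', 'year' and 'weight' and the breeds Merino/Suffolk are assumptions.
df = pd.DataFrame({
    "location": ["North", "North", "South", "South", "East", "East"],
    "breed": ["Blackface", "Blackface", "Merino", "Blackface", "Suffolk", "Suffolk"],
    "year": [2021, 2022, 2021, 2023, 2022, 2023],
    "fleece_w": [2.1, 2.4, 4.0, 2.2, 3.1, 3.3],
    "weight": [55.0, 60.0, 70.0, 58.0, 80.0, 82.0],
})

# describe() reports count, mean, std, min, quartiles and max
# for the numeric columns only (here year, fleece_w and weight).
summary = df.describe()
print(summary)

# value_counts() gives the number of rows (sheep) per breed.
breed_counts = df["breed"].value_counts()
print(breed_counts)
```

Because year is stored as an integer, describe() treats it as numeric; drop it first if you regard year as a category.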

2.   The column fleece_w gives the fleece weight in kilograms. Add a new column fleece_g that gives the fleece weight in grams. (10 marks)
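A short sketch of the unit conversion, on hypothetical fleece weights. Since 1 kg = 1000 g, a vectorised multiplication on the column is all that is needed.

```python
import pandas as pd

# Hypothetical fleece weights in kilograms.
df = pd.DataFrame({"fleece_w": [2.5, 3.25, 4.0]})

# 1 kg = 1000 g, so multiply the whole column at once (no loop needed).
df["fleece_g"] = df["fleece_w"] * 1000
print(df)
```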

3.   Produce a bar-chart showing the mean fleece weight in kg of the three breeds. (10 marks)
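One possible approach: group by breed, average fleece_w, and plot the resulting Series with pandas' matplotlib-backed plotting. The data and the breed names other than Blackface are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; Merino and Suffolk are illustrative breed names.
df = pd.DataFrame({
    "breed": ["Blackface", "Blackface", "Merino", "Merino", "Suffolk", "Suffolk"],
    "fleece_w": [2.0, 2.4, 4.0, 4.4, 3.0, 3.2],
})

# Mean fleece weight (kg) per breed; groupby sorts breeds alphabetically.
mean_fleece = df.groupby("breed")["fleece_w"].mean()

# One bar per breed. In Jupyter the chart is shown when the cell finishes;
# in a plain script call matplotlib.pyplot.show() afterwards.
ax = mean_fleece.plot(kind="bar")
ax.set_xlabel("Breed")
ax.set_ylabel("Mean fleece weight (kg)")
ax.set_title("Mean fleece weight by breed")
ax.figure.tight_layout()
```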

4.   Which breeds are found in only one location? (15 marks)
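A sketch of one way to answer Task 4: count the distinct locations per breed with nunique() and keep breeds whose count is 1. The 'location' column name, the location labels and the non-Blackface breed names are assumptions.

```python
import pandas as pd

# Hypothetical toy data; column and category names are assumptions.
df = pd.DataFrame({
    "location": ["North", "South", "East", "North", "South", "East"],
    "breed": ["Blackface", "Blackface", "Blackface", "Merino", "Merino", "Suffolk"],
})

# Number of distinct locations in which each breed appears.
locations_per_breed = df.groupby("breed")["location"].nunique()

# Breeds recorded at exactly one location.
single_location = locations_per_breed[locations_per_breed == 1].index.tolist()
print(single_location)  # ['Suffolk'] for this toy data
```

pd.crosstab(df["breed"], df["location"]) gives the full breed-by-location table if you also want to show where each breed occurs.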

5.   Produce a grouped bar-chart showing the count of each breed in each of the three years. (15 marks)
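A minimal sketch, assuming a 'year' column: pd.crosstab builds a year-by-breed count table, and plotting a DataFrame as a bar chart draws one group per row (year) with one bar per column (breed). The data are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data; 'year' and the non-Blackface breeds are assumptions.
df = pd.DataFrame({
    "breed": ["Blackface", "Blackface", "Merino", "Merino", "Suffolk", "Blackface"],
    "year": [2021, 2022, 2021, 2023, 2022, 2021],
})

# Rows = year, columns = breed, cells = number of sheep (0 where absent).
counts = pd.crosstab(df["year"], df["breed"])
print(counts)

# Grouped bar chart: one cluster of bars per year, one bar per breed.
ax = counts.plot(kind="bar")
ax.set_xlabel("Year")
ax.set_ylabel("Number of sheep")
ax.legend(title="Breed")
ax.figure.tight_layout()
```

An equivalent table comes from df.groupby(["year", "breed"]).size().unstack(fill_value=0); crosstab is simply the more direct spelling.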

6.   Develop and test a regression model to estimate fleece weight from body weight.

Evaluate the model using hold-out testing. For this you can use the model's .score method, which has the form <model>.score(X_test, y_test). (20 marks)
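A sketch of hold-out testing with scikit-learn's LinearRegression, one reasonable choice of regression model (the sheet does not prescribe one). The data are synthetic, generated so that fleece weight is roughly proportional to body weight; the column name 'weight' and the 70/30 split are assumptions. For regressors, .score returns the coefficient of determination R^2 on the data passed in.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data for 300 sheep: fleece weight is about 5% of
# body weight plus noise. 'weight' as the body-weight column is an assumption.
rng = np.random.default_rng(0)
weight = rng.uniform(40, 90, size=300)
df = pd.DataFrame({
    "weight": weight,
    "fleece_w": 0.05 * weight + rng.normal(0, 0.3, size=300),
})

# scikit-learn expects a 2-D feature matrix, hence the double brackets.
X = df[["weight"]]
y = df["fleece_w"]

# Hold-out testing: fit on 70% of the rows, evaluate on the unseen 30%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out rows (1.0 would be a perfect fit).
r2 = model.score(X_test, y_test)
print(f"Hold-out R^2: {r2:.3f}")
print(f"Slope: {model.coef_[0]:.4f} kg of fleece per kg of body weight")
```

Fixing random_state makes the split reproducible, so repeated runs give the same score.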

7.   Repeat Task 6 using only the Blackface sheep. Which model is more accurate, and why might this be the case? (20 marks)
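A sketch of the comparison, wrapping the hold-out procedure in a hypothetical helper holdout_r2 and applying it to all sheep and then to the Blackface subset only. The synthetic data deliberately give each breed its own fleece-to-body-weight ratio (the ratios and the other breed names are made up), which is one plausible reason a single-breed model can score higher: one straight line cannot fit several breeds with different ratios. On the real sheep.csv the outcome may differ, and with only about 100 Blackface sheep the test R^2 is estimated from fewer rows and so varies more between splits.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def holdout_r2(frame, seed=0):
    """Hypothetical helper: fit fleece_w ~ weight on a 70/30 split, return test R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        frame[["weight"]], frame["fleece_w"], test_size=0.3, random_state=seed)
    model = LinearRegression().fit(X_train, y_train)
    return model.score(X_test, y_test)


# Hypothetical synthetic data: 100 sheep per breed, each breed with its own
# (made-up) ratio of fleece weight to body weight.
rng = np.random.default_rng(1)
ratios = {"Blackface": 0.03, "Merino": 0.08, "Suffolk": 0.05}
parts = []
for breed, ratio in ratios.items():
    w = rng.uniform(40, 90, size=100)
    parts.append(pd.DataFrame({
        "breed": breed,
        "weight": w,
        "fleece_w": ratio * w + rng.normal(0, 0.2, size=100),
    }))
df = pd.concat(parts, ignore_index=True)

r2_all = holdout_r2(df)
r2_blackface = holdout_r2(df[df["breed"] == "Blackface"])
print(f"All breeds R^2: {r2_all:.3f}, Blackface only R^2: {r2_blackface:.3f}")
```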