
COMP7230 – Introduction to Programming for Data Scientists

Assignment 1

2022

Overview and Objectives

The main idea of this assignment is to showcase how we can use Python for basic data manipulation, storage and analysis. In this assignment you will write some short pieces of code to process and display data related to urban statistics in the Australian Capital Territory. The data were obtained from three sources: the ACT Population Projections by Suburb (2015 - 2020), the ACT School Location dataset, and the Census Data for all ACT Schools.

Important

• Make sure that your student ID is included as a comment at the start of your submission.

• Do NOT include your name anywhere in your submission. All marking will be done anonymously.

• Submit one Python file only, named COMP7230_Assignment_1_Submission.py.

• Make sure you submit a final version of the assignment before the submission deadline.

Submission

Submission will be done using Wattle. Click on the link Assignment 1 submission (https://wattlecourses.anu.edu.au/mod/assign/view.php?id=2572722) to upload your file. You may submit as many draft versions of the assignment as you wish. However, you must make sure you submit a final version before the due date. We will mark the final version present at the due date, or the first one submitted following the due date, with penalties in accordance with the late submission policy (see below). This means that if you intend to submit late, do not submit an early final version, since we will assume it is your actual submission and it will be marked accordingly.

Deadlines, Extensions and Late Submissions

The assignment is due by 16:00 on Friday, September 9, 2022.

Students will only be granted an extension on the submission deadline in extenuating circumstances, as defined by official ANU policy. If you think you have grounds for an extension, you should notify the course convener as soon as possible and provide written evidence in support of your case (such as a medical certificate). The course convener will then decide whether to grant an extension and inform you as soon as practical. In accordance with the ANU late submission policy, except where an extension has been approved by the course convener, late submissions will be penalised by 5% of the total marks for the assignment for each business day or part thereof late, up to a maximum of 10 business days, after which you will receive a mark of 0 (zero).

Please also note that if your submission is not received on time, we may be unable to give you feedback prior to the submission deadline of Assignment 2.

Plagiarism

No group work is permitted for the assignment. We do encourage you to discuss your work, but we expect you to do the assignment work by yourself. If you are unsure about what constitutes plagiarism, please read through the ANU Academic Honesty Policy.

If you do include ideas or material from other sources, then you must clearly attribute them, for example by providing a reference to the material or source as a comment in your code. We do not require a specific referencing format, as long as you are consistent and your references allow us to find the source, should we need to while we are marking your assignment.
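As an illustration only, an attribution comment might look like the sketch below. The function name read_rows is hypothetical and is not one of the assignment functions; the comment simply records where the idea came from, in enough detail that a marker could find it.

```python
import csv
import io


def read_rows(csv_text):
    """Return the rows of a CSV string as a list of dictionaries.

    Hypothetical helper, used here only to show an attribution comment.
    """
    # Source: adapted from the csv.DictReader example in the Python
    # Standard Library documentation for the csv module,
    # https://docs.python.org/3.9/library/csv.html
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]
```

Any format is acceptable as long as it is used consistently and identifies the source.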

Once marks are released, you will have two weeks in which to question your mark. After this period has elapsed, your mark will be considered final and no further changes will be made. If you ask for a re-mark, your assignment will be re-marked entirely, and your mark may go UP or DOWN as a result, or remain the same.

Assignment Structure

The first assignment consists of three Python files:

• COMP7230_Assignment_1_Submission.py

• COMP7230_Assignment_1_Test.py

• COMP7230_Assignment_1_Visualiser.py

and the following data files:

• ACT_Population_Projections_by_Suburb__2015_-_2020_.csv

• ACT_School_Locations_2017_-_archived.csv

• Census_Data_for_all_ACT_Schools.csv

You should download all these files to the same location, before starting to answer the assignment questions.

Assignment Tasks

You only need to modify and submit COMP7230_Assignment_1_Submission.py. Your task consists of implementing functions in COMP7230_Assignment_1_Submission.py. The specifications for each of these functions are included as comments in COMP7230_Assignment_1_Submission.py. The function parameters and return types are all listed inside the docstrings for the functions you need to write. Do not modify the function signatures (such as by renaming the input parameters, or adding extra ones). You should also not add any code outside the seven functions provided, with the optional exceptions of import statements, additional functions for your own internal use, and the code noted as bonus tasks. You should include comments in your code.

In addition to the file COMP7230_Assignment_1_Submission.py, we have also provided a suite of unit tests, COMP7230_Assignment_1_Test.py, which will help you to test your work. These tests work in an identical fashion to the examples we use in the labs, so please familiarise yourself with those if you are not sure how to make use of them. Please note that these tests are there to assist you, but passing the tests is NOT a guarantee that your solution is correct.

• You can test each completed task by running the test file. This can be done either on the command line:

% pytest COMP7230_Assignment_1_Test.py

or using the PyCharm IDE (if needed, we will demonstrate this to you either in the labs, or at the lectures on “Tools”). Initially, the test run will report “15 failed, 3 passed”, but as you gradually implement the functions in the tasks, more tests will pass, with the final goal of having all of them pass.

• Once you have completed all tasks 1–7, you should be able to run COMP7230_Assignment_1_Visualiser.py and produce the visualisation of population versus enrolment distributions across all suburbs. This plot should resemble the one in Fig. 1.
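For readers unfamiliar with pytest, the following is a minimal sketch of the kind of test it collects and runs. The function total_enrolment and both tests are hypothetical; they are not taken from the provided test file, which may be organised differently. pytest treats functions whose names start with test_ as tests, and a test fails if one of its assert statements is false.

```python
def total_enrolment(enrolments):
    """Hypothetical function under test: sum a list of enrolment counts."""
    return sum(enrolments)


def test_total_enrolment_empty():
    # An empty list of schools should give a total of zero.
    assert total_enrolment([]) == 0


def test_total_enrolment_typical():
    # 120 + 80 + 45 = 245
    assert total_enrolment([120, 80, 45]) == 245
```

A test that passes on a few inputs like these says nothing about other inputs, which is why passing the provided tests does not guarantee a correct solution.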

Bonus tasks are also detailed in COMP7230_Assignment_1_Submission.py as tasks 8–10. Additionally, you will be working with real data, and as such, it suffers from a number of data quality issues. You are encouraged to spend some time exploring the data to better understand the limitations imposed by the prescribed methodology. You should feel free to consider and incorporate any additional data preparation, cleaning, or manipulation, which would improve the end result. Please note and justify any such choices within your code documentation.

Marking

The assignment will be marked out of 20 and count for 20% of your final grade for COMP7230. The correctness of your solution for each of the 7 tasks will contribute 14 marks. The distribution of these marks is detailed in COMP7230_Assignment_1_Submission.py. Please note that not all tasks are equally weighted, and the suite of tests we will use during marking is more extensive than that provided in COMP7230_Assignment_1_Test.py. Partial marks may be awarded, even for solutions that do not pass all the tests, provided the code is making progress towards a correct solution.

In addition to the 14 marks for correctness, 6 marks are allocated to code quality. Up to 2 bonus marks may also be awarded for completion of the bonus tasks and any improvements on the prescribed methodology. Please note that your overall mark may not exceed the assignment total of 20 marks.

You may refer to the documentation within COMP7230_Assignment_1_Submission.py for further details on the marking breakdown.

 

Figure 1: Each “bubble” represents a suburb characterised by two numbers: the population in the schooling age range, and the total enrolment in all schools located in the suburb. The bubble colour indicates the difference between the two numbers (deficit or surplus of school enrolments over population in the corresponding age range). The bubble size shows the extent of this difference.
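The quantities described in the caption can be sketched as follows. This is an illustration under stated assumptions, not the visualiser's actual code: the function name, the input layout, the sign convention (enrolment minus population) and the sample figures are all made up for the example.

```python
def enrolment_gaps(suburbs):
    """Compute the gap between enrolment and school-age population.

    suburbs: dict mapping a suburb name to a (population, enrolment) pair.
    Returns a dict mapping each suburb to (gap, extent, label), where
    gap = enrolment - population (assumed sign convention),
    extent = abs(gap), which would drive the bubble size, and
    label is "surplus", "deficit" or "balanced".
    """
    result = {}
    for name, (population, enrolment) in suburbs.items():
        gap = enrolment - population
        if gap > 0:
            label = "surplus"
        elif gap < 0:
            label = "deficit"
        else:
            label = "balanced"
        result[name] = (gap, abs(gap), label)
    return result


# Made-up figures for two placeholder suburbs.
sample = {"Suburb A": (900, 1200), "Suburb B": (700, 0)}
gaps = enrolment_gaps(sample)
```

Under these assumptions, a suburb with no schools (Suburb B) shows a deficit equal to its whole school-age population, while a suburb whose schools draw students from elsewhere (Suburb A) shows a surplus.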

Compatibility

• The version of Python we have adopted for this course is 3.9.12. Please ensure your code is compatible with Python 3.9. The official documentation for Python is an excellent resource, and allows you to select the relevant version.

• In completing the tasks, you are encouraged to use only the Python Standard Library, and you may use anything from the standard library.

• The data preparation, cleaning and manipulation can be completed using only the standard built-in types and their methods. You may, however, wish to explore additional modules from the standard library. For example, the datetime module for date parsing, or the re module for regular expression text parsing.

• If you are already comfortable with standard Python, then you may optionally use the numpy and/or pandas packages. These packages are purpose-built for scientific and data science applications, but you are cautioned against using them without first developing some confidence with standard Python.

• No packages other than the aforementioned may be used.
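As a rough sketch of what standard-library-only data cleaning can look like, the example below reads a small CSV string with the csv module, tidies a numeric field with re, and parses a date with datetime. The column names, the date format and the sample rows are assumptions made for illustration; they do not describe the actual assignment data files.

```python
import csv
import io
import re
from datetime import datetime

# Made-up sample data; the real files have different columns.
SAMPLE = """Suburb,Population,Updated
 Suburb A ,"1,250",09/09/2022
Suburb B,n/a,01/07/2021
"""


def parse_count(text):
    """Return an int from text such as '1,250', or None if it has no digits.

    Simplification: every non-digit character is dropped, so signs and
    decimal points are not handled.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def load_rows(csv_text):
    """Parse the CSV text into a list of cleaned dictionaries."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "suburb": row["Suburb"].strip(),
            "population": parse_count(row["Population"]),
            # Assumed day/month/year format.
            "updated": datetime.strptime(row["Updated"], "%d/%m/%Y").date(),
        })
    return rows


rows = load_rows(SAMPLE)
```

Returning None for a missing count, rather than 0, keeps missing values distinguishable from genuine zeros; whichever choice you make in your own code should be noted and justified in its documentation.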