L1006 Data Coding & Visualisation
Project
Project assignment
For this project, you are expected to carry out data analysis using Python and write a report that summarises the results of your data analysis.
The topic chosen for this project is “The Determinants of Students’ Learning” and the dataset provided is derived from the OECD's Programme for International Student Assessment (PISA) which measures 15-year-olds’ ability to use their reading, mathematics and science knowledge and skills to meet real-life challenges.
The file “PISA-2018-UK” contains data on students’ scores on Maths, Reading, and Science tests, as well as additional information on students’ individual characteristics, family background, and school characteristics. More detailed information on data and variables can be found in “PISA-DataDescription”.
Your task is to outline a research question of interest and perform data analysis to support your hypotheses and findings. Possible research questions are
• Does a pupil’s immigrant status affect their learning outcomes?
• Do private school pupils outperform public school pupils in Maths, Reading, or Science tests?
• Does access to computers and other educational resources at home affect pupils’ learning outcomes?
You may choose one of the questions above or propose a new one of your choice.
Your data analysis should use the numerical and visualisation tools covered in lectures and workshops. You are encouraged to explore and use additional tools and methods, as long as they are appropriate to the research question.
Submission details
You are required to submit two files:
• Report (2000 words, Word docx or pdf) outlining the research question, and summarising and commenting on the results of your data analysis. This should include any tables, graphs, equations used, and references.
(The 2000-word count includes the main text but excludes tables, graphs, equations, references, and appendices.)
• Jupyter Notebook (.ipynb) containing the code used to produce descriptive statistics, tables, graphs, and any other output of your data analysis.
Report structure
The suggested structure for the report is as follows
1. Introduction
Clearly state the research question and give an outline of the report and the data analysis performed. If relevant, briefly explain why the research question matters and mention related existing studies.
2. Descriptive statistics
Introduce the variables used in your analysis and explain why these are relevant to the research question. Descriptive statistics should be presented via tables and/or graphs and commented on.
3. Data analysis and results
This is the main section of the report. The aim is to construct evidence from data and to answer the research question. A variety of numerical and visualisation tools and methods may be used here, including tables, graphs, hypothesis testing, and linear models.
Use your judgement to select the most appropriate methods here. This should not be an exercise where you mechanically apply all methods, but rather you should think about what tools and results are most convincing.
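As an illustration of two of the methods mentioned above (hypothesis testing and linear models), the sketch below compares two groups with a Welch t-test and then fits the equivalent linear model. It is only a sketch under stated assumptions: the data are simulated, and the column names `school_type` and `math_score` are hypothetical placeholders rather than the actual variable names listed in “PISA-DataDescription”.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Simulated data for illustration only; column names are hypothetical.
rng = np.random.default_rng(seed=0)
n_per_group = 200
students = pd.DataFrame({
    "school_type": ["public"] * n_per_group + ["private"] * n_per_group,
    "math_score": np.concatenate([
        rng.normal(loc=490, scale=80, size=n_per_group),
        rng.normal(loc=520, scale=80, size=n_per_group),
    ]),
})

# Hypothesis test: do mean scores differ between the two groups?
# equal_var=False requests Welch's t-test (no equal-variance assumption).
public_scores = students.loc[students["school_type"] == "public", "math_score"]
private_scores = students.loc[students["school_type"] == "private", "math_score"]
t_stat, p_value = stats.ttest_ind(private_scores, public_scores, equal_var=False)

# Linear model: the same comparison written as a regression.
# C(...) marks school_type as categorical; the first level alphabetically
# ("private") is the reference group, so the slope is public minus private.
model = smf.ols("math_score ~ C(school_type)", data=students).fit()
group_difference = model.params.iloc[1]
```

Calling `model.summary()` in a notebook cell prints the full regression table. A regression is convenient when you later want to add further explanatory variables to the right-hand side of the formula.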
4. Conclusions
Critically summarise the results of your analysis.
Please note that the structure above is only a suggestion, and departures from it are allowed. For instance, you may decide to merge sections 2 and 3 and/or add or rename sections if you think your analysis could be better described otherwise.
Data analysis plan
The questions and tips below may help you organise your work and give you some ideas for your data analysis.
• Have you identified the main variables of your analysis? Which other variables may also be important for your research question?
• Which of your variables are numerical? Which are categorical?
• Is this variable best described using a numerical or a visualisation tool? Or both?
• Are there any significant differences between different groups/categories?
• Is there an association among these variables? What is the best way to show this?
• What factors are likely to contribute the most to the main variables of interest?
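Several of the questions above map onto short pandas operations. The following is a minimal sketch, assuming a tiny made-up table: the values are invented and the column names (`math_score`, `reading_score`, `school_type`, `has_computer`) are hypothetical placeholders, not the real variable names from “PISA-DataDescription”.

```python
import pandas as pd

# Tiny made-up sample for illustration; all column names are hypothetical.
students = pd.DataFrame({
    "math_score": [480.0, 510.0, 455.0, 530.0, 495.0, 470.0],
    "reading_score": [500.0, 520.0, 450.0, 540.0, 505.0, 460.0],
    "school_type": ["public", "private", "public", "private", "public", "public"],
    "has_computer": ["yes", "yes", "no", "yes", "yes", "no"],
})

# Which variables are numerical and which are categorical?
numeric_columns = students.select_dtypes(include="number").columns.tolist()
categorical_columns = students.select_dtypes(exclude="number").columns.tolist()

# Numerical summary (count, mean, std, quartiles) of the numerical variables.
numeric_summary = students[numeric_columns].describe()

# Frequency table for a categorical variable.
school_type_counts = students["school_type"].value_counts()

# Differences between groups: mean score within each category.
mean_math_by_school = students.groupby("school_type")["math_score"].mean()

# Association between two numerical variables (Pearson correlation).
math_reading_corr = students["math_score"].corr(students["reading_score"])
```

Summaries such as `numeric_summary` and `mean_math_by_school` can go straight into report tables, while a plot is usually the clearer way to show a distribution or an association.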
Marking guide
This project accounts for 50% of the final mark for this module.
Both the report (Word or pdf) and the code (Jupyter notebook) will be considered in the marking of this project.
Learning outcomes assessed include
• Ability to identify the appropriate visualisation and numerical tools given the research question chosen and dataset provided
• Ability to use the visualisation and numerical tools correctly
- Graphs and tables should be self-explanatory with clear titles, labels, etc.
• Ability to correctly interpret and comment on the output generated by the visualisation and numerical tools of choice
- For instance, a correct interpretation of graphs and numerical values
• Ability to use and work with the given dataset in order to uncover meaningful patterns in the data
• Correct, well-organised, and clear exposition of results in the report
For the code (Jupyter notebook), the following criteria will be considered
• Jupyter notebook should be well-organised with correct use of both text (Markdown) cells and code cells
• Code should be readable and well commented
- Variable names should be informative
- Lines/portions of code should be accompanied by brief comments that explain what they do (unless it is obvious)
• Code should avoid repetitions where possible
• Code should be maintainable, i.e. new features or bug fixes can be added with minimal modification to the code
• Demonstration of good use of Python libraries (e.g. Pandas, Statsmodels, Matplotlib)
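One possible way to meet several of these criteria at once (informative names, brief comments, no repetition, self-explanatory graphs) is to wrap a repeated plotting step in a small function and call it in a loop. The sketch below is an assumption-laden illustration, not a required template: the helper `plot_score_by_group`, the made-up data, and the column names are all hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Tiny made-up sample for illustration; all column names are hypothetical.
students = pd.DataFrame({
    "math_score": [480.0, 510.0, 455.0, 530.0, 495.0, 470.0],
    "reading_score": [500.0, 520.0, 450.0, 540.0, 505.0, 460.0],
    "school_type": ["public", "private", "public", "private", "public", "public"],
})


def plot_score_by_group(data, score_column, group_column, ax):
    """Draw a labelled box plot of score_column for each level of group_column."""
    group_labels = sorted(data[group_column].unique())
    scores_per_group = [
        data.loc[data[group_column] == label, score_column].to_numpy()
        for label in group_labels
    ]
    ax.boxplot(scores_per_group)
    # Boxes are drawn at positions 1, 2, ...; label each one with its group.
    ax.set_xticks(range(1, len(group_labels) + 1))
    ax.set_xticklabels(group_labels)
    ax.set_title(f"{score_column} by {group_column}")
    ax.set_xlabel(group_column)
    ax.set_ylabel(score_column)
    return ax


# One call per score column instead of copy-pasting the plotting code.
score_columns = ["math_score", "reading_score"]
fig, axes = plt.subplots(1, len(score_columns), figsize=(10, 4))
for ax, score_column in zip(axes, score_columns):
    plot_score_by_group(students, score_column, "school_type", ax)
fig.tight_layout()
```

Adding another test score then only requires extending `score_columns`, which is the kind of minimal modification the maintainability criterion describes. In a report, the raw column names in titles and axis labels would normally be replaced with readable descriptions.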
2022-03-25