
Pre-Foundations Data Analytics Assignment

Instructions:

This assignment gives you practice with R before the foundations course in the summer. Use only R for this assignment; Python deliverables will not be accepted. Submit your R code and results as an R Markdown document. Screenshots pasted into Word documents will not be accepted.

Alcohol Dataset

Context:

The data were obtained from a survey of secondary-school students enrolled in math and Portuguese language courses, and include information about the students' alcohol consumption. To study their academic preferences, family backgrounds, education, and other habits, we want to explore this dataset and discover relationships between features that help us understand the students' behavior and performance. The data contain a wealth of social, gender, and study information about the students. The math and Portuguese responses are stored in two separate files, so you must merge them on the attributes the two files share to create a master dataset. This master dataset will then contain all the information needed to determine key findings about the students and their characteristics.
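A minimal sketch of the merge in base R, assuming the shared key column is called `StudentID` (check the exact header in your files). The data frames `math` and `port` and their columns are hypothetical toy stand-ins; in practice you would load the two real files with `read.csv()`, whose file names are not given here. `merge()` performs an inner join by default, so only students present in both files are kept.

```r
# Hypothetical toy data standing in for the two course files; the real
# files have many more columns (see the Data Dictionary).
math <- data.frame(
  StudentID = c(1, 2, 3, 4),
  school    = c("GP", "GP", "MS", "GP"),
  G3        = c(10, 15, 8, 12)
)
port <- data.frame(
  StudentID = c(2, 3, 4, 5),
  school    = c("GP", "MS", "GP", "MS"),
  G3        = c(14, 11, 13, 9)
)

# Inner join on StudentID keeps only students who appear in both files.
# Non-key columns present in both files get the suffixes below.
master <- merge(math, port, by = "StudentID", suffixes = c("_math", "_port"))
print(master)

# Sanity check: attributes shared by both files should agree per student.
stopifnot(all(master$school_math == master$school_port))
```

Applied to the real files, `nrow(master)` should equal the number of students who took both subjects.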

Data Dictionary

EDA:

1. A total of 378 students appear in both datasets. Use the StudentID column to merge the two datasets into a single dataset containing the details of the students who took both subjects.

2. What is the distribution of students by school, sex, and age?

3. What is the proportion of students living in urban and rural areas?

4. Is there a relationship between family size and parents' cohabitation status?

5. What is the distribution of mothers' and fathers' education levels?

6. What are the most common occupations of mothers and fathers?

7. Who are the most common guardians of students?

8. How many students have had past class failures, and what is the distribution of these failures?

9. What is the distribution of school absences?

10. What is the distribution of alcohol consumption and health status?

11. Is there a relationship between students' free time and going out with friends?

12. Are there significant differences in academic performance between male and female students across age groups? Similarly, examine how these differences vary by school, family background, and study habits.

13. Do students from urban and rural areas have different levels of access to educational resources, and how does this impact their grades?

14. Are there any significant patterns or trends in the relationship between parental education levels, occupations, and income levels, and their children's academic performance and educational attainment?

15. Is there a relationship between students' health status, lifestyle choices, and academic performance?

16. Summarize your findings from the questions above.
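Several of the descriptive questions above (3, 4, and 11) map directly onto base R functions. The sketch below is illustrative only: `students` is hypothetical toy data, and the column names (`address`, `famsize`, `Pstatus`, `freetime`, `goout`) are assumed from the public UCI student-performance files, so check them against the Data Dictionary before reusing the code.

```r
# Hypothetical toy data; column names are assumptions borrowed from the
# public UCI student-performance files.
students <- data.frame(
  address  = c("U", "U", "R", "U", "R", "U", "U", "R"),
  famsize  = c("GT3", "LE3", "GT3", "GT3", "LE3", "GT3", "LE3", "GT3"),
  Pstatus  = c("T", "A", "T", "T", "T", "A", "T", "T"),
  freetime = c(3, 4, 2, 5, 3, 4, 1, 3),
  goout    = c(2, 4, 2, 5, 3, 4, 1, 2)
)

# Q3: proportion of students living in urban (U) vs rural (R) areas.
addr_prop <- prop.table(table(students$address))
print(round(addr_prop, 3))

# Q4: cross-tabulate family size against parents' cohabitation status and
# test whether the two variables are independent.
fam_tab <- table(students$famsize, students$Pstatus)
print(fam_tab)
print(fisher.test(fam_tab)$p.value)

# Q11: free time and going out are ordinal (1-5) ratings, so a rank-based
# Spearman correlation is a reasonable first look.
rho <- cor(students$freetime, students$goout, method = "spearman")
print(round(rho, 3))
```

`fisher.test()` is used here only because the toy counts are tiny; on the full merged dataset, `chisq.test()` on the same table is the more common choice when expected cell counts are large enough.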

Tidying Questions:

1. In the current dataset, the columns "G1", "G2", and "G3" hold students' grades in the Math and Portuguese courses. Derive two metric columns (Math Grade and Port Grade) and two corresponding value columns (Math Grade Value and Port Grade Value). For example, Math Grade contains G1, G2, G3, G1, ... and Math Grade Value contains 5, 6, 7, 15, ...

2. Consider the "travel time" column for Math students, which has levels 1, 2, 3, and 4. Create a derived dataset that shows the "absences" for each of the four levels; in other words, there should be four columns, one for each level of travel time, each containing the count of absences. Determine the relationship among the absence counts for these four levels.
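Both tidying tasks can be sketched in base R. Everything here is hypothetical: the merged grade columns are assumed to carry `_math`/`_port` suffixes, `to_long()` is an illustrative helper rather than a library function, and the travel-time layout shown is only one reading of question 2 (rows are absence counts, columns are travel-time levels, cells are numbers of students).

```r
# Tidying question 1: hypothetical merged data with one column per grade
# period and subject (the *_math / *_port names are assumptions).
master <- data.frame(
  StudentID = c(2, 3),
  G1_math = c(5, 15), G2_math = c(6, 14), G3_math = c(7, 16),
  G1_port = c(11, 12), G2_port = c(12, 12), G3_port = c(13, 11)
)

# Hypothetical helper: read the chosen columns row by row, so each student
# contributes three consecutive values (G1, G2, G3).
to_long <- function(df, cols) {
  as.vector(t(as.matrix(df[cols])))
}
grades_long <- data.frame(
  StudentID          = rep(master$StudentID, each = 3),
  `Math Grade`       = rep(c("G1", "G2", "G3"), times = nrow(master)),
  `Math Grade Value` = to_long(master, c("G1_math", "G2_math", "G3_math")),
  `Port Grade`       = rep(c("G1", "G2", "G3"), times = nrow(master)),
  `Port Grade Value` = to_long(master, c("G1_port", "G2_port", "G3_port")),
  check.names = FALSE  # keep the spaces in the column names
)
print(grades_long)

# Tidying question 2: hypothetical Math-student data (column names assumed).
math_students <- data.frame(
  traveltime = c(1, 1, 2, 3, 1, 2, 4, 2, 1, 3),
  absences   = c(0, 2, 0, 4, 2, 0, 6, 2, 0, 4)
)

# One row per observed absence count, one column per travel-time level;
# each cell counts the students with that many absences at that level.
abs_tab <- table(
  absences   = math_students$absences,
  traveltime = factor(math_students$traveltime, levels = 1:4)
)
abs_wide <- as.data.frame.matrix(abs_tab)
print(abs_wide)

# Compare absences across the four levels: per-level means, plus a
# Kruskal-Wallis test (rank-based, so it tolerates skewed absence counts).
print(tapply(math_students$absences, math_students$traveltime, mean))
print(kruskal.test(absences ~ factor(traveltime), data = math_students))
```

If the tidyverse is permitted, `tidyr::pivot_longer()` (with `names_to` and `values_to`) and `tidyr::pivot_wider()` express the same reshaping more concisely.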