CS 210: Data Management for Data Science Fall 2023
Prerequisite: CS 111 (Intro to CS) or CS 142 (Data 101: Data Literacy)
Course details, including a list of teaching assistants (TAs), office hours, and recitations, will be posted on Canvas. There will be no office hours or recitations the first week.
Overview
This course provides the knowledge and skills needed to acquire and curate real-world data, to explore the data to discover patterns and distributions, and to manage large datasets with databases.
You will learn how to use Python libraries to acquire data from various online sources, detect which aspects of the data are uncurated or unreliable and understand why, apply domain-independent and domain-dependent curation techniques, and transform data into formats that can be explored, managed, analyzed, and visualized. You will also learn how to prepare datasets for loading into a relational database so that you can perform basic analyses using Structured Query Language (SQL).
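As a rough illustration of the workflow described above (acquire, curate, load into a relational database, query with SQL), here is a minimal sketch using only the Python standard library. The sample data, the table name `readings`, and the helper `load_clean_rows` are hypothetical and are not taken from the course materials; the course itself may use other libraries, such as Pandas, for these steps.

```python
import csv
import io
import sqlite3

# Hypothetical raw data with typical problems:
# stray whitespace around a value and a missing temperature.
RAW_CSV = """city,temp_f
 Newark ,71
Trenton,
Camden,68
"""


def load_clean_rows(text):
    """Parse CSV text, strip whitespace, and skip rows with no temperature."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        city = record["city"].strip()
        temp = record["temp_f"].strip()
        if not temp:
            continue  # unreliable row: no measurement recorded
        rows.append((city, int(temp)))
    return rows


rows = load_clean_rows(RAW_CSV)

# Load the curated rows into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, temp_f INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

# A basic SQL analysis: the average of the remaining temperatures.
avg_temp = conn.execute("SELECT AVG(temp_f) FROM readings").fetchone()[0]
print(rows)
print(avg_temp)
```

Dropping the incomplete row is only one possible curation choice; depending on the dataset, filling in or flagging missing values may be more appropriate.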
Textbook
There is no required textbook for this course, but this may be a useful reference:
• Python for Data Analysis (Wes McKinney, 3rd ed.), https://wesmckinney.com/book/
Grading
Grades will be weighted as follows:
Assignments (5) 40%*
Quizzes (5) 20%*
Midterms (2) 20%*
Final (1) 20%
* The lowest grade in this category is dropped.
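The weighting above can be sketched in a few lines of Python. This is an illustration only, not the official grade calculation: it assumes every item is scored out of 100 and that each category's contribution is the plain average of its remaining scores after the lowest is dropped. The names `WEIGHTS`, `DROP_LOWEST`, `category_average`, and `course_grade`, and the example scores, are hypothetical.

```python
# Weights from the grading table; starred categories drop their lowest score.
WEIGHTS = {"assignments": 0.40, "quizzes": 0.20, "midterms": 0.20, "final": 0.20}
DROP_LOWEST = {"assignments", "quizzes", "midterms"}


def category_average(scores, drop_lowest):
    """Average the scores, optionally dropping the single lowest one."""
    if drop_lowest and len(scores) > 1:
        kept = sorted(scores)[1:]
    else:
        kept = list(scores)
    return sum(kept) / len(kept)


def course_grade(scores_by_category):
    """Weighted sum of category averages (assumes every score is out of 100)."""
    return sum(
        WEIGHTS[name] * category_average(scores, name in DROP_LOWEST)
        for name, scores in scores_by_category.items()
    )


# Hypothetical scores for one student.
example = {
    "assignments": [90, 80, 100, 70, 90],
    "quizzes": [80, 60, 100, 90, 90],
    "midterms": [75, 85],
    "final": [88],
}
print(round(course_grade(example), 1))  # 88.6
```

Under these assumptions, with two midterms and the lowest dropped, only the higher midterm score contributes to the grade.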
Regrade requests must be raised within one week of grades being returned; after that, grades are considered final.
There is no extra credit in the course.
Exam dates
• Midterm 1: 10/13
• Midterm 2: 11/17
• Final: 12/15, 8 – 11 am
All exams are comprehensive, in that they may include any material from the course up to that point.
Assignments
All assignments are to be done individually. You can resubmit your homework any number of times before the deadline; grading will be based on the last submission. You may submit homework up to 2 days late, with a penalty of 15 points for up to 1 day late and 25 points for up to 2 days late. Homework more than 2 days late will not be accepted.
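The late policy can be expressed as a small function. This is a sketch under stated assumptions, not the official rule: it assumes a "day" means 24 hours past the deadline, that penalty points are subtracted from a score out of 100, and that the result never goes below zero. The function name `late_adjusted_score` is hypothetical.

```python
def late_adjusted_score(raw_score, hours_late):
    """Apply the late penalty; return None if the submission is not accepted.

    Assumptions (not stated in the syllabus): a day is 24 hours, the
    penalty is subtracted from a 100-point score, and scores floor at 0.
    """
    if hours_late <= 0:
        return raw_score  # on time: no penalty
    if hours_late <= 24:
        penalty = 15  # up to 1 day late
    elif hours_late <= 48:
        penalty = 25  # up to 2 days late
    else:
        return None  # more than 2 days late: not accepted
    return max(raw_score - penalty, 0)


print(late_adjusted_score(90, 0))   # 90
print(late_adjusted_score(90, 5))   # 75
print(late_adjusted_score(90, 30))  # 65
print(late_adjusted_score(90, 49))  # None
```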
Homework must be submitted on Canvas or CodePost, as applicable; emailed submissions are not accepted. You are responsible for ensuring the submitted files are correct.
Tentative topics
• Python programming
• NumPy
• Pandas
• CSV and JSON formats
• Data cleaning
• Regular expressions
• Visualization with matplotlib
• Relational databases and SQL
Academic integrity
Rutgers University takes academic dishonesty very seriously. By enrolling in this course, you assume responsibility for familiarizing yourself with the Academic Integrity Policy and the possible penalties (including suspension and expulsion) for violating the policy. As per the policy, all suspected violations will be reported to the Office of Student Conduct. Please review the Academic Integrity Policy at: http://nbacademicintegrity.rutgers.edu/ .
Accommodations
Rutgers University welcomes students with disabilities into all of the University’s educational programs. To receive consideration for reasonable accommodations, a student with a disability must contact the appropriate disability services office at the campus where they are officially enrolled, participate in an intake interview, and provide documentation: https://ods.rutgers.edu/students/documentation-guidelines .
If the documentation supports your request for reasonable accommodations, your campus’s disability services office will provide you with a Letter of Accommodations. Please share this letter with your instructors and discuss the accommodations with them as early in your courses as possible.
To begin this process, please complete the registration form (https://webapps.rutgers.edu/student-ods/forms/registration).
2023-09-28