DS-UA 112 Introduction to Data Science Summer Semester 2021
§0.0 Purpose and design: This is a survey course. It has been designed to achieve several specific goals. First, it is supposed to introduce you to foundational concepts in the field of data science. Second, we aim to impart the 21st century version of a liberal arts education. The 3 classical Rs of Reading, wRiting and aRithmetic are now joined by 2 new ones: data liteRacy and pRogramming. In this class, we assume that you are already somewhat familiar with the first 3 Rs and focus on the latter 2. Third, we aim to plant a variety of seeds about topics that you will encounter again in more advanced classes. Fourth, we hope to kindle a passion for data and data analysis that will last a lifetime. Finally, we also intend to impart several general-purpose skills (e.g. coding in Python, the QDAFI method, etc.). Overall, the class is dedicated to the philosophy of computational empowerment. We live in transformational times. We believe that this mindset, as well as these concepts, is essential to a flourishing existence in the 21st century and beyond.
§1.0 Instructor: Pascal Wallisch, PhD
Office: 6 Washington Place (Meyer Hall), Room 402
Phone: (212) 998-8430
Email:
Office hours: Friday 11.00 pm - 1.00 am (REMOTE: https://nyu.zoom.us/j/303123378)
§1.1 TAs: All TA office hours are by appointment (via Calendly link). Numbers are Zoom room IDs.
Prerna Mishra (Calendly link) Zoom: 97015607439
Hörmet Yiltiz (Calendly link) Zoom: 98762396456
Stephen Spivack (Calendly link) Zoom: 9882770151
Sarah Espinosa (Calendly link) Zoom: 4209794454
TA email: intro2dsnyu@gmail.com
§1.2 Session times: Mo, Tu, & We 3:00 - 5:10 pm
§1.3 Session space: Remote in https://nyu.zoom.us/j/98108611854
§1.4 Session content: There are 3 sessions introducing new content (both concepts and code) each week. Please attend these remotely via Zoom. If these times don’t work for you (e.g. because you live in another time zone), you can also just watch the recordings, but live is more lively, so join us if you can. Sometimes, we will have guest lecturers who will advise on professional development.
§1.5 Quora (Open forums):
Code questions: Thursday at 9 pm in 97015607439
Non-code questions: Friday at 9 pm in 4209794454
Anything goes: Saturday at 9 pm in 98762396456
§1.6 Prerequisites: DS4E or equivalent
§1.7 Scope: 0.01 to 1. The language of instruction is Python; we index from 0.
§1.8 Materials:
Concepts: “Data Science from Scratch: First Principles with Python”, by Joel Grus
Linear Algebra: “Linear Algebra: Theory, Intuition, Code”, by Mike X Cohen
Coding: “Neural Data Science”, by Nylen and Wallisch
§1.9 Assignments: These are designed to foster and encourage conceptual proficiency. One problem set, one QDAFI response paper and one function are due per theme block.
§2.0 Course grading: The total grade is calculated as follows:
Component | Per item | Total
A) After action assessments (participation) | 1% / lecture | 15%
B) 1 Big data analysis project | 16% | 16%
C) 1 Course logistics quiz | 1% | 1%
E) 1 Exam (cumulative) | 20% | 20%
F) 6 Functions (Python) | 2% / function | 12%
G) Grace (Elysium, Fuggerei) | 10% | 10%
P) 6 Problem sets | 2% / set | 12%
Q) 6 QDAFI response papers | 2% / paper | 12%
S) 1 Intake survey | 1% | 1%
X) 1 Exit survey | 1% | 1%
Total | | 100%
§2.1 Grade cutoffs:
Grade | Range (%)
A | 95-100
A- | 90-94.9
B+ | 87-89.9
B | 83-86.9
B- | 80-82.9
C+ | 77-79.9
C | 73-76.9
C- | 70-72.9
D+ | 65-69.9
D | 60-64.9
F | 30-59.9
I | 0-29.9
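The arithmetic behind the grading scheme and the cutoffs can be sketched in a few lines of Python. This is only an illustration, not the official grade calculation: the dictionary keys and function names are hypothetical, every component is assumed to be expressed as a fraction earned between 0 and 1 (including Grace, whose award rule is not described here), and a total is assigned to a band by its lower bound, since the syllabus does not say how values between bands (e.g. 89.95) are rounded.

```python
# Weights in percentage points, copied from the grading table in §2.0.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "participation": 15,
    "project": 16,
    "logistics_quiz": 1,
    "exam": 20,
    "functions": 12,
    "grace": 10,
    "problem_sets": 12,
    "response_papers": 12,
    "intake_survey": 1,
    "exit_survey": 1,
}

# Lower bound of each letter grade, copied from the cutoffs in §2.1.
CUTOFFS = [
    (95, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (65, "D+"), (60, "D"),
    (30, "F"), (0, "I"),
]


def participation_fraction(lectures_attended):
    """1% per lecture, capped at the 15% maximum for participation."""
    return min(lectures_attended, 15) / 15


def total_grade(fractions):
    """Sum weight * fraction earned over all components (missing ones count as 0)."""
    return sum(WEIGHTS[name] * fractions.get(name, 0.0) for name in WEIGHTS)


def letter_grade(total):
    """Return the first letter whose lower bound the total reaches (assumed rule)."""
    for lower, letter in CUTOFFS:
        if total >= lower:
            return letter
    return "I"


# Example: full marks everywhere except half credit on the exam.
fractions = {name: 1.0 for name in WEIGHTS}
fractions["participation"] = participation_fraction(17)  # attended all 17 lectures
fractions["exam"] = 0.5
print(total_grade(fractions), letter_grade(total_grade(fractions)))
```

Because the exam is worth 20 points, losing half of it costs 10 points, which moves a perfect score from the A band into the A- band.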
§2.2 Extra credit opportunities:
There are several extra credit opportunities in the class.
1. Problem sets: Students are expected to do 6 problem sets for full credit. Students can do an additional problem set for extra credit (it will replace the lowest score received).
2. Response papers: Students are required to do 6 papers for full credit. They can do a 7th as extra credit, which will replace the lowest paper grade.
3. Functions: Students need to write 6 functions for full credit. They can do a 7th one for extra credit, which will replace the lowest function grade.
4. MIG (Meme or Infographic): Make a meme or infographic of a course concept (e.g. PCA) for an extra 1% added to your grade.
5. WOW (What one wonders): Write about an interesting issue or problem that you wonder about, which might lend itself to being addressed or resolved by a data-based approach.
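The "replace the lowest score" rule in items 1 to 3 amounts to keeping the best 6 of up to 7 scores. A minimal sketch, assuming each item is scored out of 2 points (2% each, per the grading table) and assuming an extra submission that scores lower than all the others simply does not count; the function name `best_six_total` is hypothetical:

```python
def best_six_total(scores, required=6):
    """Sum the `required` highest scores; a 7th submission displaces the lowest."""
    kept = sorted(scores, reverse=True)[:required]
    return sum(kept)


# Six problem sets, one of them missed (score 0):
print(best_six_total([2, 1.5, 2, 0, 2, 1]))      # 8.5
# The same student after doing a 7th set for extra credit:
print(best_six_total([2, 1.5, 2, 0, 2, 1, 2]))   # 10.5
```

The 7th score of 2 displaces the 0, so the block total rises from 8.5 to 10.5 out of a possible 12.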
§2.3 Attendance and Participation: You are responsible for the material covered in this course. Thus, consistent attendance is critical: the exam will focus on the material discussed during lecture, and labs will be crucial to clarify the subject material. Also, we assign a participation grade, which counts as 15% of the total class grade, at a rate of 1% per lecture. So you need to attend a minimum of 15 lectures (out of 17) to get a full participation score.
§2.4 Workload: You should expect to spend about 15 hours total per week on this class: 6.5 in lecture and lab, ~1 in office hours, ~6 doing the weekly assignments and ~1.5 doing the readings.
That’s a lot, but not unreasonable. Remember that you are going to learn many new statistical, computational and coding concepts in this class. There are no shortcuts. Immersion is key. This course is designed akin to developing an atomic bomb – a necessarily large investment of time and resources, but with a potentially high yield and the transformational prospect of changing everything forever. This goes in particular for the summer version of this class.
§2.5 Theme blocks: The class material is grouped into 6 major theme blocks: I: Theoretical foundations, II: Characterizing data, III: Predictions from data, IV: Inferences from data, V: Enhanced hypothesis testing - beyond p, VI: Machine learning. As you can see, this is an introductory survey class that serves as a foundation for more advanced classes. Should you already know about a particular topic, please understand that this is unlikely to be the case for everyone. This means we still have to cover all of these topics, as we need to onboard everyone.
§3.0 COURSE SCHEDULE
Week/Block | Monday | Tuesday | Wednesday
I: 07/05-07/09 Foundations | Independence Day NO CLASS | a: Welcome b: Probability I | a: Probability II b: Lab I
II: 07/12-07/16 Characterization | a: Linear Algebra I b: Linear Algebra II | a: Central Tendency b: Dispersion | a: Lab II b: Correlation
III: 07/19-07/23 Prediction | a: Linear Regression b: Lab III | a: Control b: Multiple regression | a: Model design b: Lab IV
IV: 07/26-07/30 Inference | a: Sample & Population b: NHST | a: Parametric tests I b: Parametric tests II | a: Nonparametric tests b: Lab V
V: 08/02-08/06 Beyond p | a: Resampling methods b: Effect size & Power | a: Lab VI b: Bayes I | a: Bayes II b: Lab VII
VI: 08/09-08/13 Machine learning | a: Logistic Regression b: PCA | a: Lab VIII b: Clustering & Classification | a: Lab IX b: Grand finale
B: Big data analysis project due date: August 19th
E: Examination (remote take home): Released August 18th, due August 20th
2022-07-12