Course information

In this course, we will systematically cover the fundamentals of statistical inference and modeling, with special attention to models and methods that address practical data issues. We will start by reviewing key ideas of parametric estimation and inference, and then introduce statistical approaches to practical issues such as censored data, missing values, and time-dependent data. Next, we will consider several regression settings, including the linear regression model, generalized linear models, and nonparametric regression. If time allows, we will also briefly discuss Bayesian modeling. Throughout the course, real-data examples will be used in lecture discussions and homework problems. This course lays the statistical foundation for inference and modeling with data, preparing students in the MS in Data Science program for other courses in machine learning, data mining, and visualization.


Prerequisites

Working knowledge of calculus and linear algebra (vectors and matrices), STAT GR5701 or equivalent, and familiarity with a programming language (e.g., R, Python) for statistical data analysis.


Instructor

Marco Avella Medina

Office: 936 SSW building

Email: [email protected]

Office hours: Friday 3:00-4:00pm (by appointment only)


Teaching assistants

Ian Kinsella

Email: [email protected]

Office hours: TBA


Shun Xu

Email: [email protected]

Office hours: TBA


Course requirements and grading

Graded assignments will be collected periodically. Students are also strongly encouraged to work on the weekly ungraded practice problem sets. There will be four quizzes, a midterm exam covering the first half of the course, and a final exam at the end of the semester covering topics from the entire course. The final grade is based on assignments (20%), quizzes (20%), the midterm exam (30%), and the final exam (30%). If the final exam score is higher than the midterm score, the final exam instead counts for 60% of the final grade, replacing the midterm's 30%.
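The grading rule can be sketched as a small calculation. This is a minimal Python sketch under stated assumptions, not an official grade calculator: `course_grade` is a hypothetical helper, all scores are percentages on a 0-100 scale, assignment and quiz scores are each averaged with equal weight, and when the final counts for 60% the midterm is assumed to count for 0% (so the weights still sum to 100%).

```python
def course_grade(assignments, quizzes, midterm, final):
    """Weighted course grade on a 0-100 scale (hypothetical helper).

    Assumes every score is a percentage in [0, 100] and that assignment
    and quiz scores are each averaged with equal weight.
    """
    avg_assign = sum(assignments) / len(assignments)
    avg_quiz = sum(quizzes) / len(quizzes)

    # Default scheme: 20% assignments, 20% quizzes, 30% midterm, 30% final.
    grade = 0.2 * avg_assign + 0.2 * avg_quiz + 0.3 * midterm + 0.3 * final

    if final > midterm:
        # Assumed reading of the policy: the final absorbs the midterm's 30%,
        # so the final counts for 60% and the midterm for 0%.
        grade = 0.2 * avg_assign + 0.2 * avg_quiz + 0.6 * final

    return grade


if __name__ == "__main__":
    # Final (80) beats midterm (60), so the 60% rule applies.
    print(course_grade([90, 80], [70, 80, 90, 80], midterm=60, final=80))
```

For example, with assignment scores 90 and 80, quiz scores 70, 80, 90 and 80, a midterm of 60 and a final of 80, the grade is 0.2 × 85 + 0.2 × 80 + 0.6 × 80 = 81 (up to floating-point rounding). Since the 60% option applies only when the final beats the midterm, it adds 0.3 × (final − midterm) > 0 and can never lower a grade.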


Academic Integrity

(Adapted from the Faculty Statement on Academic Integrity from https://www.college.columbia.edu/academics/integrity-statement).

As students in this class, you are responsible for fully citing others' ideas in all of your assignments and projects; you must be scrupulously honest when taking your examinations; and you must always submit your own work, not that of another student, scholar, or internet agent. Any breach of this intellectual responsibility is a breach of faith with the rest of our academic community. It undermines our shared intellectual culture, and it cannot be tolerated. Students failing to meet these responsibilities should anticipate being asked to leave Columbia. You will be asked to sign an honor pledge on all homework assignments, quizzes, and examinations in this class.

Read more at https://www.college.columbia.edu/academics/academicintegrity.


Disability services

To receive disability accommodations, students should first register with Disability Services (DS). More information on the DS registration process is available online at http://health.columbia.edu/disability-services. Registered students can contact DS to arrange accommodations for this course, including exam accommodations. Students should then bring their accommodation letter to the professor for signature, so that the professor knows which accommodations they will need during the course.


Textbook

Davison, Anthony C. Statistical Models. Vol. 11. Cambridge University Press, 2003. ISBN 0-521-77339-3

https://www.cambridge.org/core/books/statistical-models/8EC19F80551F52D4C58FAA2022048FC7

(Free access from Columbia IP addresses)

Lecture notes (to be posted on CourseWorks)


Weekly breakdown of topics and readings

Week  Topics                                                    Reading
1     Introduction, Estimation (1)                              Ch 1-4
2     Estimation (2), confidence intervals, hypothesis testing  Ch 7
3     Exponential family models; Straight-line regression       Ch 5.1-5.2
4     Survival data; Multivariate normal                        Ch 5.4; Ch 6.3
5     Missing data                                              Ch 5.5
6     Markov chains                                             Ch 6.1
7     Time series                                               Ch 6.4
8     Spring break
9     Linear regression models (1)                              Ch 8
10    Linear regression models (2)                              Ch 8
11    Non-linear regression models                              Ch 10.1-10.5
12    Nonparametric regression                                  Ch 10.7
13    Generalized additive models
14    Bayesian models                                           Ch 11.1-11.3