Data Science Courses

Applied Computation 209a - Data Science 1: Introduction to Data Science
Pavlos Protopapas, Lecturer; Kevin Rader, Preceptor; Margo Levine, Lecturer. Fall Term, Mon/Wed 1:00 - 2:30 PM.

Data Science 1 is the first half of a one-year introduction to data science. The course focuses on analyzing messy, real-life data and making predictions with statistical and machine learning methods. The material integrates the five key facets of an investigation using data: (1) data collection - data wrangling, cleaning, and sampling to get a suitable data set; (2) data management - accessing data quickly and reliably; (3) exploratory data analysis - generating hypotheses and building intuition; (4) prediction or statistical learning; and (5) communication - summarizing results through visualization, stories, and interpretable summaries. Note: Only one of CS 109a, AC 209a, or Stat 121a can be taken for credit. Students who have previously taken CS 109, AC 209, or Stat 121 cannot take CS 109a, AC 209a, or Stat 121a for credit. Prerequisites: Programming knowledge at the level of CS 50 or above, and statistics knowledge at the level of Stat 100 or above (Stat 110 recommended).

Applied Computation 209b - Data Science 2: Advanced Topics in Data Science
Mark Glickman, Sr. Lecturer, and Rachel Schutt, Lecturer. Spring Term, Mon/Wed 1:00 - 2:30 PM.

Data Science 2 is the second half of a one-year introduction to data science. Building upon the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, interactive visualizations, nonlinear statistical models, and deep learning. Note: Can only be taken after successful completion of CS 109a, AC 209a, Stat 121a, or equivalent. Students who have previously taken CS 109, AC 209, or Stat 121 cannot take CS 109b, AC 209b, or Stat 121b for credit. Prerequisite: CS 109a, AC 209a, or Stat 121a.

Applied Math 207 - Advanced Scientific Computing: Stochastic Methods for Data Analysis, Inference and Optimization
Rahul Dave, Lecturer. Spring Term, Tues/Thurs 11:30 AM - 1:00 PM.

Develops skills for computational research with a focus on stochastic approaches, emphasizing implementation and examples. Stochastic methods make it feasible to tackle very diverse problems when the solution space is too large to explore systematically, or when the microscopic rules are known but the macroscopic behavior of a complex system is not. Methods will be illustrated with examples from a wide variety of fields, such as biology, finance, and physics.

Computer Science 207 - Systems Development for Computational Science 
David Sondak, Lecturer. Fall Term, Mon/Wed 11:30 AM - 1:00 PM.

This is a project-based course emphasizing designing, building, testing, maintaining, and modifying software for scientific computing. Students will work in groups on a number of projects, ranging from small data-transformation utilities to large-scale systems. Students will learn to use a variety of tools and languages, as well as various techniques for organizing teams. Most importantly, students will learn to fit tools and approaches to the problem being solved.

Critical Thinking in Data Science
Rachel Schutt, Lecturer. Spring Term, Mon/Wed 2:30 - 4:00 PM.

This course examines the wide-ranging impact data science has on the world and how to think critically about issues of fairness, privacy, ethics, and bias while building algorithms and predictive models that get deployed in the form of products, policy, and scientific research. Topics will include algorithmic accountability and discriminatory algorithms, black box algorithms, data privacy and security, ethical frameworks, and experimental and product design. We will work through case studies in a variety of contexts, including media, tech, and sharing economy platforms; medicine and public health; data science for social good; and politics. We will look at the underlying machine learning algorithms, statistical models, code, and data. Threads of history, philosophy, business models and strategy, and regulatory and policy issues will be woven throughout the course.

Applied Computation 297r - Computational Science and Engineering Capstone Project 
Pavlos Protopapas, Lecturer. Spring Term, Tues 4:00 - 6:00 PM.

The CSE capstone project is intended to integrate and apply the skills and ideas CSE students acquire in their core courses and electives. By requiring students to complete a substantial and challenging collaborative project, the capstone course will prepare students for the professional world and ensure that they are trained to conduct research. There will be no homework or lectures. Students will deal with real-world problems and messy data sets, and will have the chance to work on an end-to-end solution to a problem using computational methods.

Statistics & Computer Science Electives List

The Data Science Program Committee has approved the following courses for inclusion as STAT and CS electives in SM plans of study. This list is not meant to limit students' elective choices. In particular, students who have taken a course listed here (or its equivalent) and wish to explore the subject in greater depth are encouraged to propose more advanced courses. Note: Many, but not all, courses are offered every year.


Approved Statistics electives
STAT 123 Quantitative Finance Spring
STAT 131 Time Series & Prediction Fall
STAT 139 Linear Models Fall
STAT 140 Design of Experiments Fall
STAT 149 Generalized Linear Models Spring
STAT 210 Probability I Fall
STAT 211 Statistical Inference I Fall
STAT 212 Probability II Spring
STAT 213 Statistical Inference II Spring
STAT 220 Bayesian Data Analysis Spring
STAT 234 Sequential Decision Making Spring


Approved Computer Science electives  
CS 124 Data Structures and Algorithms Spring
CS 165 Data Systems Fall
CS 171 Visualization Fall
CS 181 Machine Learning Spring
CS 182 Artificial Intelligence Fall
CS 205 Computing Foundations for Computational Science Spring
CS 262 Introduction to Distributed Computing Spring
CS 265 Big Data Systems Spring
CS 281 Advanced Machine Learning Fall
CS 282r Topics in Machine Learning Fall
CS 287 Machine Learning for Natural Language Spring