This course provides an introduction to data analytics for individuals with no prior knowledge of data science or machine learning. The course starts with an extensive review of probability theory as the language of uncertainty, discusses Monte Carlo sampling for uncertainty propagation, and covers the basics of supervised learning (Bayesian generalized linear regression, logistic regression, Gaussian processes, deep neural networks, convolutional neural networks), unsupervised learning (k-means clustering, principal component analysis, Gaussian mixtures), and state-space models (Kalman filters). The course also reviews the state of the art in physics-informed deep learning and ends with a discussion of automated Bayesian inference using probabilistic programming (Markov chain Monte Carlo, sequential Monte Carlo, and variational inference). Throughout the course, the instructor follows a probabilistic perspective that highlights the first principles behind the presented methods, with the ultimate goal of teaching students how to create and fit their own models.
Syllabus
Please note: The summer 2022 session of this course will be condensed to 8 weeks. The fall 2023 session will run the full 16 weeks.
Section 1: Introduction
Introduction to Predictive Modeling
Section 2: Review of Probability Theory
Basics of Probability Theory
Discrete Random Variables
Continuous Random Variables
Collections of Random Variables
Random Vectors
Section 3: Uncertainty Propagation
Basic Sampling
The Monte Carlo Method for Estimating Expectations
Monte Carlo Estimates of Various Statistics
Quantifying Uncertainty in Monte Carlo Estimates
Section 4: Principles of Bayesian Inference
Selecting Prior Information
Analytical Examples of Bayesian Inference
Section 5: Supervised Learning: Linear Regression and Logistic Regression
Linear Regression Via Least Squares
Bayesian Linear Regression
Advanced Topics in Bayesian Linear Regression
Classification
Section 6: Unsupervised Learning
Clustering and Density Estimation
Dimensionality Reduction
Section 7: State-Space Models
State-Space Models – Filtering Basics
State-Space Models – Kalman Filters
Section 8: Gaussian Process Regression
Gaussian Process Regression – Priors on Function Spaces
Gaussian Process Regression – Conditioning on Data
Bayesian Global Optimization
Section 9: Neural Networks
Deep Neural Networks
Deep Neural Networks Continued
Physics-Informed Deep Neural Networks
Section 10: Advanced Methods for Characterizing Posteriors
Sampling Methods
Variational Inference