Download links for lecture notes, problem sets, solutions, and video summaries for each session will be added as we
progress with the course.
Overview and Symbol Codes
What is the connection between data compression, probabilistic models, and error correction?
We answer this question with some concrete examples of so-called symbol codes.
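For a concrete preview (with a made-up code and message, not an example from the lecture): a symbol code maps each source symbol to a fixed bit string, and as long as no codeword is a prefix of another, the concatenated bits can be decoded unambiguously. A minimal sketch in Python:

```python
# A toy prefix-free symbol code: more probable symbols get shorter codewords.
# (The code and the example message are made up for illustration.)
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

def encode(message, code):
    """Concatenate the codeword of each symbol."""
    return ''.join(code[symbol] for symbol in message)

def decode(bits, code):
    """Greedy decoding works because no codeword is a prefix of another."""
    inverse = {word: symbol for symbol, word in code.items()}
    message, buffer = [], ''
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            message.append(inverse[buffer])
            buffer = ''
    return ''.join(message)

compressed = encode('abacabad', code)
print(len(compressed))                      # 14 bits instead of 16 with a fixed 2-bit code
assert decode(compressed, code) == 'abacabad'
```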
Theoretical Bounds of Lossless Compression (feat. Entropy)
We prove the Source Coding Theorem.
This cornerstone of information theory quantifies information content, and it establishes a fundamental lower
bound on the bit rate of any lossless compression method.
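As a small numerical preview (using a made-up source distribution and the toy code from the sketch above): the bound is the entropy of the source, and no uniquely decodable code can have a smaller expected codeword length.

```python
import math

# Made-up source distribution and the toy prefix code from the previous sketch.
probs = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

# Entropy H(p) = sum_x p(x) * log2(1 / p(x)), in bits per symbol.
entropy = sum(p * math.log2(1 / p) for p in probs.values())

# Expected codeword length of the symbol code, in bits per symbol.
expected_length = sum(p * len(code[s]) for s, p in probs.items())

print(entropy, expected_length)  # both 1.75 here because all probabilities are powers of 1/2
```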
We prove that the famous Huffman Coding algorithm constructs an optimal symbol code.
Then we introduce an information-theoretic measure of model mismatch, the “KL-divergence”.
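For a rough preview of both topics (plain Python with made-up probabilities; the lecture gives proofs, not code), here is a standard heap-based Huffman construction together with the KL-divergence, read as the excess bit rate we pay when the code is designed for a model q while the data actually follows p:

```python
import heapq
import math

def huffman_code(probs):
    """Build an optimal prefix-free symbol code from symbol probabilities."""
    # Each heap entry: (total probability, tie-breaker, {symbol: codeword so far}).
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # merge the two least probable subtrees,
        p1, _, c1 = heapq.heappop(heap)   # prepending one more bit to each codeword
        merged = {s: '0' + w for s, w in c0.items()}
        merged.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, tie, merged))
        tie += 1
    return heap[0][2]

def kl_divergence(p, q):
    """D(p || q) in bits: expected extra code length when modeling p-data with q."""
    return sum(p[s] * math.log2(p[s] / q[s]) for s in p)

# Made-up true source p and a mismatched model q.
p = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
q = {'a': 0.25, 'b': 0.25, 'c': 0.25, 'd': 0.25}
print(huffman_code(p))       # an optimal code for p, e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(kl_divergence(p, q))   # 0.25 bits per symbol of overhead for using the wrong model
```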
Mutual Information and Taxonomy of Probabilistic Models
How can we build powerful probabilistic models without sacrificing
efficiency?
We'll discuss various designs after introducing important concepts from probability and information theory.
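One of those concepts is mutual information, which measures how many bits two random variables share. As a minimal sketch (with a made-up joint distribution), it can be computed directly from a joint probability table:

```python
import numpy as np

# Made-up joint distribution p(x, y) as a table; rows index x, columns index y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)

# I(X; Y) = sum_{x,y} p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
mutual_information = np.sum(p_xy * np.log2(p_xy / (p_x * p_y)))
print(mutual_information)               # about 0.278 bits
```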
Stream Codes I: Arithmetic Coding And Range Coding
We've proved in Lecture 3 that Huffman Codes are optimal symbol codes.
But it turns out that we can do better than symbol codes—by thinking in fractional bits.
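To see what thinking in fractional bits can buy, here is a deliberately simplified sketch of arithmetic coding (exact rational arithmetic and a made-up i.i.d. model; a real arithmetic or range coder uses fixed-precision integers and can update its model adaptively). The encoder maps the whole message to a single subinterval of [0, 1), so each symbol costs roughly its information content, which need not be a whole number of bits:

```python
from fractions import Fraction
from math import ceil

# Made-up i.i.d. model with non-dyadic probabilities: symbol 'a' carries only
# -log2(2/3) ≈ 0.585 bits of information, which no symbol code can exploit.
probs = {'a': Fraction(2, 3), 'b': Fraction(1, 3)}

def intervals(probs):
    """Partition [0, 1) into one subinterval per symbol, sized by its probability."""
    table, low = {}, Fraction(0)
    for symbol, p in probs.items():
        table[symbol] = (low, low + p)
        low += p
    return table

def encode(message, probs):
    """Narrow [low, high) once per symbol, then emit bits identifying a point inside."""
    table = intervals(probs)
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        l, h = table[symbol]
        low, high = low + (high - low) * l, low + (high - low) * h
    # The dyadic interval of width 2**-n starting at ceil(low * 2**n) / 2**n
    # fits inside [low, high) once 2**-n <= (high - low) / 2.
    n = 1
    while Fraction(1, 2 ** n) > (high - low) / 2:
        n += 1
    return format(ceil(low * 2 ** n), f'0{n}b')

def decode(bits, probs, num_symbols):
    """Invert the interval narrowing; needs the message length to know when to stop."""
    table = intervals(probs)
    x = Fraction(int(bits, 2), 2 ** len(bits))
    low, high = Fraction(0), Fraction(1)
    message = []
    for _ in range(num_symbols):
        for symbol, (l, h) in table.items():
            if low + (high - low) * l <= x < low + (high - low) * h:
                message.append(symbol)
                low, high = low + (high - low) * l, low + (high - low) * h
                break
    return ''.join(message)

message = 'aab' * 20
bits = encode(message, probs)
print(len(bits))                             # 57 bits for 60 symbols ...
assert decode(bits, probs, len(message)) == message
# ... whereas any symbol code must spend at least 1 bit per symbol, i.e., at least 60 bits.
```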
Stream Codes II: Asymmetric Numeral Systems (ANS)
This recently invented stream code is as performant as range coding while being much easier to implement.
But it has a caveat—or is it a feature?
(Yes, that's a clickbait teaser; sue me.)
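As a rough preview (a non-streaming variant whose coder state is a single big Python integer, with made-up integer symbol frequencies; real implementations keep the state bounded and stream out fixed-size words), the sketch below shows how little code the encoder and decoder need, and it hints at the property behind the teaser: the coder behaves like a stack, so we encode the message back to front in order to decode it front to back.

```python
# A deliberately simplified, non-streaming ANS coder: the whole compressed
# message lives in one big Python integer. Frequencies are made up and sum to M.
freq = {'a': 2, 'b': 1, 'c': 1}
M = sum(freq.values())

# Cumulative frequencies: symbol s owns the slot range [cum[s], cum[s] + freq[s]).
cum, total = {}, 0
for s, f in freq.items():
    cum[s], total = total, total + f

def encode(message, state=1):
    # ANS behaves like a stack (last in, first out), so we push the symbols
    # in reverse order to pop them in the original order when decoding.
    for s in reversed(message):
        f, c = freq[s], cum[s]
        state = (state // f) * M + c + (state % f)
    return state

def decode(state, num_symbols):
    message = []
    for _ in range(num_symbols):
        slot = state % M
        s = next(sym for sym in freq if cum[sym] <= slot < cum[sym] + freq[sym])
        state = freq[s] * (state // M) + slot - cum[s]
        message.append(s)
    return ''.join(message)

state = encode('abacabaa')
print(state.bit_length())          # number of bits in the compressed state
assert decode(state, 8) == 'abacabaa'
```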
We generalize the so-called “bits-back trick” from the ANS algorithm to arbitrary latent variable
models.
This allows us to use latent variables for data compression without paying for them.
Think of it as “short selling” bits.
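In generic notation (a latent variable model p(x, z) = p(z) p(x | z) with an approximate posterior q(z | x); the lecture makes this precise), the accounting behind the trick looks as follows, assuming ideal entropy coders for each step:

```latex
% Bits-back coding:
%   1. decode z from already-compressed bits using q(z | x)   <-- "getting bits back"
%   2. encode x using p(x | z)
%   3. encode z using p(z)
% Expected net number of bits spent on x:
R(x) = \mathbb{E}_{q(z \mid x)}\!\left[
         \log_2 \frac{1}{p(x \mid z)} + \log_2 \frac{1}{p(z)} - \log_2 \frac{1}{q(z \mid x)}
       \right]
     = -\mathrm{ELBO}(x) \;\geq\; \log_2 \frac{1}{p(x)} .
```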
This method for approximate Bayesian inference is a mainstay of modern probabilistic machine learning.
And—curiously—its most natural derivation actually builds on the bits-back coding algorithm.
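In standard notation, the identity that connects the two reads: for any distribution q(z | x) over the latent variable,

```latex
\log_2 p(x)
  \;=\; \underbrace{\mathbb{E}_{q(z \mid x)}\!\left[\log_2 \frac{p(x, z)}{q(z \mid x)}\right]}_{\mathrm{ELBO}(x)}
  \;+\; \underbrace{D_{\mathrm{KL}}\!\bigl(q(z \mid x) \,\big\|\, p(z \mid x)\bigr)}_{\geq\, 0} .
```

Maximizing the ELBO over q therefore simultaneously tightens the bound on log2 p(x), minimizes the bits-back rate from above, and drives q(z | x) towards the true posterior p(z | x).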
We extend variational inference so that we learn both the generative model from training data and how to do
fast inference in the learned model.
This results in a popular class of models for neural image and video compression.
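For readers who want to peek ahead, here is a minimal sketch of such a model, a variational autoencoder, in PyTorch. The architecture, dimensions, and Bernoulli likelihood are arbitrary illustrative choices rather than the models used in the lecture; the training loss is just the negative ELBO from above (in nats rather than bits):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Minimal VAE: amortized Gaussian q(z | x), Bernoulli likelihood p(x | z)."""

    def __init__(self, data_dim=784, latent_dim=16, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim, hidden_dim), nn.ReLU())
        self.enc_mean = nn.Linear(hidden_dim, latent_dim)
        self.enc_log_var = nn.Linear(hidden_dim, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, data_dim))

    def negative_elbo(self, x):
        # Amortized inference: one network pass yields q(z | x) for any x.
        h = self.encoder(x)
        mean, log_var = self.enc_mean(h), self.enc_log_var(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mean + torch.exp(0.5 * log_var) * torch.randn_like(mean)
        logits = self.decoder(z)
        # Single-sample estimate of E_q[-log p(x | z)].
        reconstruction = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
        # KL(q(z | x) || N(0, I)) in closed form.
        kl = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())
        return (reconstruction + kl) / x.shape[0]

# Usage sketch: one gradient step on a made-up batch of binary data.
model = TinyVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784).round()      # stand-in for a real training batch
optimizer.zero_grad()
loss = model.negative_elbo(x)
loss.backward()
optimizer.step()
```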
Lossy Compression: From VAEs to Rate/Distortion Theory
We implement our first lossy compression method and observe a lower bit rate than with lossless
compression.
We then derive a theoretical bound for the bit rate of lossy compression.
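In standard notation (with a distortion measure d that has to be chosen per application), this bound is the rate/distortion function:

```latex
R(D) \;=\; \min_{\,p(\hat{x} \mid x) \,:\, \mathbb{E}[d(X, \hat{X})] \,\le\, D\,} I(X; \hat{X}) .
```

The converse part of rate/distortion theory states that no lossy code can achieve expected distortion at most D at a rate below R(D) bits per symbol.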
Channel Coding Theorem and Source-Channel Separation
We take a step back from compression and consider the wider problem of efficient
communication.
Our discussion also reveals how meaningful the theoretical bound for lossy compression is.
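The central quantity here is the channel capacity; in standard notation, for a discrete memoryless channel with input X and output Y,

```latex
C \;=\; \max_{p(x)} I(X; Y) .
```

Roughly, the Channel Coding Theorem states that reliable communication over a noisy channel is possible at any rate below C and impossible at any rate above it, and source-channel separation lets us split the overall problem into compression followed by channel coding.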
I am excited that Lucas Theis agreed to give a guest lecture.
He is a pioneer of ML-based compression, and at the forefront of novel methods
like diffusion models & universal quantization.