Lectures: Thursdays 12:15-13:45 in room TTR2 (Maria-von-Linden-Straße 6, ground floor).
Tutorials: Fridays 12:15-13:45 in room A302 (Sand 1).
6 ECTS
I also stream lectures and tutorials on Zoom (see the link on Moodle), but I strongly encourage students to attend in person instead.
To encourage interactivity, Zoom sessions are not recorded.
Details of the following schedule are still subject to change.
Overview and Symbol Codes
What is the connection between data compression, probabilistic models, and error correction?
We answer this question with some concrete examples of so-called symbol codes.
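As a small teaser, here is a toy symbol code in Python (my own illustration, not necessarily one of the examples from the lecture): each symbol gets its own codeword, more probable symbols get shorter ones, and since no codeword is a prefix of another, the concatenated bits decode unambiguously.

```python
# A toy prefix-free symbol code: more probable symbols get shorter codewords.
codebook = {"a": "0", "b": "10", "c": "110", "d": "111"}

def encode(message):
    # Concatenate the codewords of the individual symbols.
    return "".join(codebook[s] for s in message)

def decode(bits):
    inverse = {v: k for k, v in codebook.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:   # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

assert decode(encode("abacada")) == "abacada"
```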
Theoretical Bounds of Lossless Compression (feat. Entropy)
We prove the Source Coding Theorem.
This cornerstone of information theory quantifies information content, and it states a fundamental lower bound for the bit rate of lossless compression.
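For uniquely decodable symbol codes over a source with distribution p, one common way to state this bound is

\[
\mathbb{E}\big[\ell_C(X)\big] \;\ge\; H(X) \;=\; -\sum_x p(x)\,\log_2 p(x),
\]

i.e., the expected codeword length of any such code C is at least the entropy H(X) of the source.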
We prove that the famous Huffman Coding algorithm constructs an optimal symbol code.
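As a rough sketch of the algorithm (my own illustration, not a reference implementation from the course): repeatedly merge the two least probable subtrees, then read the codewords off the resulting binary tree.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> bit string."""
    tiebreak = count()  # prevents comparing dicts when probabilities tie
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)  # least probable subtree
        p1, _, code1 = heapq.heappop(heap)  # second least probable subtree
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```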
Then we introduce an information-theoretic measure of model mismatch, the “KL divergence”.
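For a data distribution P and a model Q, it is defined (here in bits) as

\[
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_x p(x)\,\log_2 \frac{p(x)}{q(x)} \;\ge\; 0,
\]

which can be read as the expected number of extra bits per symbol when data from P is compressed with an idealized code that was optimized for Q.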
Mutual Information and Taxonomy of Probabilistic Models
How can we build powerful probabilistic models without sacrificing efficiency?
We'll discuss various designs after introducing important concepts from probability and information theory.
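One of these concepts is the mutual information from the title, which measures how much observing one random variable reduces our uncertainty about another:

\[
I(X; Z) \;=\; H(X) - H(X \mid Z) \;=\; D_{\mathrm{KL}}\big(p(x, z)\,\|\,p(x)\,p(z)\big).
\]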
Stream Codes II: Asymmetric Numeral Systems (ANS)
This recently invented stream code is as performant as range coding while being much easier to implement.
But it has a caveat—or is it a feature?
(Yes, that's a clickbait teaser; sue me.)
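To give a flavor of how simple the core state update is, here is a minimal sketch of the range variant of ANS in Python (my own toy version that keeps the whole compressed message in one arbitrary-precision integer; practical implementations use a fixed-width state and emit bits during encoding). The frequency table is a made-up example.

```python
def make_tables(freqs):
    """freqs: dict symbol -> integer frequency (a quantized probability)."""
    cum, total = {}, 0
    for s, f in freqs.items():
        cum[s] = total
        total += f
    return cum, total  # cumulative frequencies and their sum M

def encode(message, freqs):
    cum, M = make_tables(freqs)
    x = 0  # the entire compressed message lives in this single integer
    for s in reversed(message):  # encode in reverse so decoding runs forward
        f = freqs[s]
        x = (x // f) * M + cum[s] + (x % f)
    return x

def decode(x, num_symbols, freqs):
    cum, M = make_tables(freqs)
    out = []
    for _ in range(num_symbols):
        r = x % M
        s = next(t for t in freqs if cum[t] <= r < cum[t] + freqs[t])
        x = freqs[s] * (x // M) + r - cum[s]
        out.append(s)
    return out

freqs = {"a": 5, "b": 2, "c": 1}          # quantized model: p(a) = 5/8, ...
msg = list("abacabaa")
x = encode(msg, freqs)
print(x.bit_length(), "bits")             # close to the information content
assert decode(x, len(msg), freqs) == msg
```

Note that the state behaves like a stack: the symbol encoded last is decoded first.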
We generalize the so-called “bits-back trick” from the ANS algorithm to arbitrary latent variable models.
This allows us to use latent variables for data compression without paying for them.
Think of it as “short selling” bits.
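Roughly, and in my own notation: for each data point x, the sender first gets bits back by decoding a latent z from previously compressed content using q(z|x), then pays for encoding x and z with the model, so the net number of bits added to the bitstream is

\[
R(x) \;=\; \underbrace{\big(-\log_2 p(x \mid z) - \log_2 p(z)\big)}_{\text{bits paid}} \;-\; \underbrace{\big(-\log_2 q(z \mid x)\big)}_{\text{bits gotten back}}.
\]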
Variational inference, a method for approximate Bayesian inference, is a mainstay of modern probabilistic machine learning.
And—curiously—its most natural derivation actually builds on the bits-back coding algorithm.
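Concretely, variational inference fits an approximate posterior q(z|x) by maximizing the evidence lower bound (ELBO), stated here in standard notation:

\[
\log p(x) \;\ge\; \mathrm{ELBO}(x) \;=\; \mathbb{E}_{q(z \mid x)}\big[\log p(x, z) - \log q(z \mid x)\big] \;=\; \log p(x) - D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z \mid x)\big).
\]

Up to a sign and a change of base to bits, this is exactly the expected net bit rate of bits-back coding from above.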
Variational Autoencoders and Lossy Neural Compression
We extend variational inference so that both the generative model and the inference procedure are learned from training data.
This results in a popular class of models for lossy data compression.
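One way to see the connection to lossy compression (a standard decomposition, not specific to this course): the negative ELBO splits into a distortion term and a rate term,

\[
-\mathrm{ELBO}(x) \;=\; \underbrace{\mathbb{E}_{q(z \mid x)}\big[-\log p(x \mid z)\big]}_{\text{distortion}} \;+\; \underbrace{D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z)\big)}_{\text{rate}},
\]

so training a variational autoencoder trades off reconstruction quality against the number of bits needed to encode the latent representation.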