Lectures: Thursdays 12:15-13:45 in room TTR2 (Maria-von-Linden-Straße 6, ground floor).
Tutorials: Fridays 12:15-13:45 in room A302 (Sand 1).
6 ECTS
I also stream lectures and tutorials on Zoom (see the link on Moodle), but I strongly encourage students to attend in person instead.
To encourage interactivity, Zoom sessions are not recorded.
Details of the following schedule are still subject to change.
Overview and Symbol Codes
What is the connection between data compression, probabilistic models, and error correction?
We answer this question with some concrete examples of so-called symbol codes.
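As a small teaser, here is a toy symbol code in Python (my own illustration, not necessarily one of the examples from the lecture): each symbol gets its own codeword, more probable symbols get shorter ones, and since no codeword is a prefix of another, the concatenated bits decode unambiguously.

```python
# A toy prefix-free symbol code: more probable symbols get shorter codewords.
codebook = {"a": "0", "b": "10", "c": "110", "d": "111"}

def encode(message):
    # Concatenate the codewords of the individual symbols.
    return "".join(codebook[s] for s in message)

def decode(bits):
    inverse = {v: k for k, v in codebook.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:   # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

assert decode(encode("abacada")) == "abacada"
```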
Theoretical Bounds of Lossless Compression (feat. Entropy)
We prove the Source Coding Theorem.
This cornerstone of information theory quantifies information content, and it states a fundamental lower bound for the bit rate of lossless compression.
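For uniquely decodable symbol codes over a source with distribution p, one common way to state this bound is

\[
\mathbb{E}\big[\ell_C(X)\big] \;\ge\; H(X) \;=\; -\sum_x p(x)\,\log_2 p(x),
\]

i.e., the expected codeword length of any such code C is at least the entropy H(X) of the source.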
We prove that the famous Huffman Coding algorithm constructs an optimal symbol code.
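As a rough sketch of the algorithm (my own illustration, not a reference implementation from the course): repeatedly merge the two least probable subtrees, then read the codewords off the resulting binary tree.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> bit string."""
    tiebreak = count()  # prevents comparing dicts when probabilities tie
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)  # least probable subtree
        p1, _, code1 = heapq.heappop(heap)  # second least probable subtree
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```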
Then we introduce an information-theoretic measure of model mismatch, the “KL divergence”.
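For a data distribution P and a model Q, it is defined (here in bits) as

\[
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_x p(x)\,\log_2 \frac{p(x)}{q(x)} \;\ge\; 0,
\]

which can be read as the expected number of extra bits per symbol when data from P is compressed with an idealized code that was optimized for Q.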
Mutual Information and Taxonomy of Probabilistic Models
How can we build powerful probabilistic models without sacrificing efficiency?
We'll discuss various designs after introducing important concepts from probability and information theory.
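One of these concepts is the mutual information from the title, which measures how much observing one random variable reduces our uncertainty about another:

\[
I(X; Z) \;=\; H(X) - H(X \mid Z) \;=\; D_{\mathrm{KL}}\big(p(x, z)\,\|\,p(x)\,p(z)\big).
\]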
Stream Codes II: Asymmetric Numeral Systems (ANS)
This recently invented stream code is as performant as range coding while being much easier to implement.
But it has a caveat—or is it a feature?
(Yes, that's a clickbait teaser; sue me.)
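To give a flavor of how simple the core state update is, here is a minimal sketch of the range variant of ANS in Python (my own toy version that keeps the whole compressed message in one arbitrary-precision integer; practical implementations use a fixed-width state and emit bits during encoding). The frequency table is a made-up example.

```python
def make_tables(freqs):
    """freqs: dict symbol -> integer frequency (a quantized probability)."""
    cum, total = {}, 0
    for s, f in freqs.items():
        cum[s] = total
        total += f
    return cum, total  # cumulative frequencies and their sum M

def encode(message, freqs):
    cum, M = make_tables(freqs)
    x = 0  # the entire compressed message lives in this single integer
    for s in reversed(message):  # encode in reverse so decoding runs forward
        f = freqs[s]
        x = (x // f) * M + cum[s] + (x % f)
    return x

def decode(x, num_symbols, freqs):
    cum, M = make_tables(freqs)
    out = []
    for _ in range(num_symbols):
        r = x % M
        s = next(t for t in freqs if cum[t] <= r < cum[t] + freqs[t])
        x = freqs[s] * (x // M) + r - cum[s]
        out.append(s)
    return out

freqs = {"a": 5, "b": 2, "c": 1}          # quantized model: p(a) = 5/8, ...
msg = list("abacabaa")
x = encode(msg, freqs)
print(x.bit_length(), "bits")             # close to the information content
assert decode(x, len(msg), freqs) == msg
```

Note that the state behaves like a stack: the symbol encoded last is decoded first.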
We generalize the so-called “bits-back trick” from the ANS algorithm to arbitrary latent variable models.
This allows us to use latent variables for data compression without paying for them.
Think of it as “short selling” bits.
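Roughly, and in my own notation: for each data point x, the sender first gets bits back by decoding a latent z from previously compressed content using q(z|x), then pays for encoding x and z with the model, so the net number of bits added to the bitstream is

\[
R(x) \;=\; \underbrace{\big(-\log_2 p(x \mid z) - \log_2 p(z)\big)}_{\text{bits paid}} \;-\; \underbrace{\big(-\log_2 q(z \mid x)\big)}_{\text{bits gotten back}}.
\]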
Variational inference, a method for approximate Bayesian inference, is a mainstay of modern probabilistic machine learning.
And—curiously—its most natural derivation actually builds on the bits-back coding algorithm.
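Concretely, variational inference fits an approximate posterior q(z|x) by maximizing the evidence lower bound (ELBO), stated here in standard notation:

\[
\log p(x) \;\ge\; \mathrm{ELBO}(x) \;=\; \mathbb{E}_{q(z \mid x)}\big[\log p(x, z) - \log q(z \mid x)\big] \;=\; \log p(x) - D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z \mid x)\big).
\]

Up to a sign and a change of base to bits, this is exactly the expected net bit rate of bits-back coding from above.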
Variational Autoencoders and Lossy Neural Compression
We extend variational inference so that both the generative model and the inference procedure are learned from training data.
This results in a popular class of models for lossy data compression.
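One way to see the connection to lossy compression (a standard decomposition, not specific to this course): the negative ELBO splits into a distortion term and a rate term,

\[
-\mathrm{ELBO}(x) \;=\; \underbrace{\mathbb{E}_{q(z \mid x)}\big[-\log p(x \mid z)\big]}_{\text{distortion}} \;+\; \underbrace{D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z)\big)}_{\text{rate}},
\]

so training a variational autoencoder trades off reconstruction quality against the number of bits needed to encode the latent representation.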