What is This?

This app lets you explore how the meaning of individual words in the English language has changed over the past two centuries.

Try it out:

enter the word “file” into the text box at the top of the page. You'll see a diagram with several lines that compare the word “file” to other words. Some lines go up and others go down. Look at the legend on the right. It says that the orange line, for example, measures similarity between the words “file” and “database”. This similarity stays close to zero for the entire 19th century and only goes up recently. So far, that's not too surprising.

But now look at the yellow line. It measures similarity between the words “file” and “wounded”, and it goes down. Where does this come from, and what do the words “file” and “wounded” have in common? To answer this question, click on the yellow line in the diagram and move your mouse towards the year 1800. You'll see a popup that tells you which other words were related to “file” and “wounded”, respectively, in this year. In the column for “file” you'll find words like “gun”, “marines”, and “soldier”. Since English is not my first language I was surprised when I found this, so I searched for an explanation. This way I found out that the word “file” can also refer to a military formation (and, by the way, also to the way schoolchildren are sometimes told to walk). I learned something new today!

Your turn:

click on a different line and move your mouse across the diagram to dive in deeper. Click on a word in the popup or enter a different word in the input box to continue your journey back in time.

A word of warning:

remember that this is just a tool for hypothesis generation. Our language model is far from perfect, so take all diagrams with a grain of salt. Most importantly, please remember that all diagrams are generated automatically by a computer program and do not reflect political opinions of the authors of this app. Nevertheless, if you play with this app for a while you'll likely find something that's interesting. Formulate your findings as a hypothesis and use complementary tools to research it.

What Do You Mean With “Word Similarity”? (and what is a flux capacitor?)

The lines in the above diagram measure a very specific notion of “word similarity” that may be counterintuitive in some cases. Like most of the so-called word embedding models, our model defines word similarity according to the Distributional Hypothesis: two words are considered similar if they can appear in a similar context.

For example, the model considers the words “driving” and “walking” to be similar (and indeed, both are means of transportation) because either word can fill the blank in the sentence “I am ___ to the grocery store”. Similarly, the model also considers opposites such as “small” and “large” to be somewhat similar (indeed, both are adjectives that describe size) because both words can appear in sentences such as “This shirt is too ___ for you”.

By contrast, the model does not care whether two words can appear in context with each other, and it does not use this as a signal for their similarity. This makes sense: for example, the sentence you are currently reading contains the two words “you” and “are” directly next to each other—“you are” is in fact a quite common two-gram in the English language. But the two words are very different: “you” is a pronoun and “are” is a verb, so we can't even compare them.

And, since you asked, a flux capacitor was a critical component of time machines built in the 1980s. The term is used here as a metaphor since this app lets you travel back in time. According to the publication, a flux capacitor only works once it has been accelerated to 88 mph, which is why we've spent a lot of effort on making this app lightning fast.

Who Built This?

Several researchers were involved in the development of this app:

Robert Bamler: natural language model, compression algorithm, and main development of the app;
Yibo Yang: compression algorithm;
Zhouhang Xie: frontend development;
Stephan Mandt: natural language model and compression algorithm.

You can read more about the natural language model and the compression algorithm in the scientific papers listed in the next section below.

How Does This Work? (and how can I cite it?)

This app is a technical demonstration of both a natural language model and a novel compression algorithm for machine learning models. If this app is useful for your research then please consider citing one of the following papers (if you're not sure which one, choose the first one).

The natural language model is a probabilistic time series model that represents each word by a trajectory in a semantic vector space. The model is trained on digitized historic books from the Google Books corpus (about 10 billion words). Details of the model and training procedure are described in the following two papers:

R. Bamler and S. Mandt, Dynamic Word Embeddings. ICML 2017
(see PDF or video presentation)
R. Bamler and S. Mandt, Improving Optimization for Models With Continuous Symmetry Breaking. ICML 2018
(see PDF or related video presentation)

An unusual approach in this app is that all model predictions are calculated on the client side, i.e., the model runs directly inside your browser. This is why predictions appear almost instantly: we don't have to send a request to some server every time you press a key or move your mouse across the diagram. However, the trained model has about 600 million parameters. So how can we run such a large model in your browser without forcing a giant download on you?

The answer lies in a novel compression algorithm for Bayesian machine learning models, which packs the 600 million model parameters into a file that's just 34 MB in size (about 0.54 bits per model parameter). The main idea is to exploit the model's varying degree of certainty about its own parameters. Some model parameters are known only up to some degree of uncertainty, and we therefore encode these parameters with lower accuracy than parameters about which the model is very confident. Such posterior uncertainty estimates can be derived in a principled way using approximate Bayesian machine learning methods. Details of this novel compression algorithm are described in the following paper:

Y. Yang, R. Bamler, and S. Mandt, Variational Bayesian Quantization. ICML 2020
(see PDF or video presentation)

In this web app, a WebAssembly module decompresses parts of the model lazily just when they're needed for some calculation. We discard the decompressed part of the model immediately after the calculation so that we never have to hold the entire uncompressed model in memory. You can find the code on Github.

Could You Add Language X or Feature Y?

Yes, we can ... with your help!

Do you have an interesting data set of historic documents or an idea for a cool new feature? Please reach out!

Don't have a specific idea but looking for an interesting semester project? Please reach out, too. I have several exciting ideas for promising followup projects, for example:

exploring gender bias, racial bias, or sentiment bias over time (see our workshop paper);
adding specialized corpora to analyze how language evolves within a specific community (e.g., scientific communities, courts, or parliaments);
adding more languages besides English (maybe one you speak?).

The Linguistic Flux Capacitor

Explore words related to:

Suggested comparisons:

Add more comparisons: