This app lets you explore how the meaning of individual words in the English language has changed over the past two centuries.
But now look at the yellow line. It measures the similarity between the words “file” and “wounded”, and it goes down. Where does this come from, and what do the words “file” and “wounded” have in common? To answer this question, click on the yellow line in the diagram and move your mouse towards the year 1800. You'll see a popup that tells you which other words were related to “file” and “wounded”, respectively, in that year. In the column for “file” you'll find words like “gun”, “marines”, and “soldier”. Since English is not my first language, I was surprised when I found this, so I searched for an explanation. That's how I found out that the word “file” can also refer to a military formation (and, by the way, also to the single-file line in which schoolchildren are sometimes told to walk). I learned something new today!
The lines in the above diagram measure a very specific notion of “word similarity” that may be counterintuitive in some cases. Like most so-called word embedding models, our model defines word similarity according to the Distributional Hypothesis: two words are considered similar if they can appear in similar contexts.
For example, the model considers the words “driving” and “walking” to be similar (and indeed, both are ways of getting to a place) because either word can fill the blank in the sentence “I am ___ to the grocery store”. Similarly, the model considers opposites such as “small” and “large” to be somewhat similar (indeed, both are adjectives that describe size) because both words can appear in sentences such as “This shirt is too ___ for you”.
By contrast, the model does not care whether two words can appear in context with each other, and it does not use such co-occurrence as a signal for their similarity. This makes sense: for example, the sentence you are currently reading contains the two words “you” and “are” directly next to each other; “you are” is in fact quite a common bigram in the English language. But the two words are very different: “you” is a pronoun and “are” is a verb, so it hardly makes sense to compare their meanings at all.
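To make this notion of similarity concrete, here is a minimal sketch of how such a score is typically computed from embedding vectors. The vectors below are invented for illustration and are not taken from our model; real embeddings have hundreds of dimensions and are learned from data.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up 4-dimensional vectors for illustration only.
small  = np.array([0.9, 0.1, -0.3, 0.2])   # "small"
large  = np.array([0.8, 0.2, -0.1, 0.3])   # "large": appears in similar contexts
banana = np.array([-0.2, 0.9, 0.4, -0.5])  # "banana": appears in different contexts

print(cosine_similarity(small, large))     # high, despite opposite meanings
print(cosine_similarity(small, banana))    # low
```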
And, since you asked, a flux capacitor was a critical component of time machines built in the 1980s. The term is used here as a metaphor, since this app lets you travel back in time. According to the original source, a flux capacitor only works once it has been accelerated to 88 mph, which is why we've spent a lot of effort on making this app lightning fast.
Several researchers were involved in the development of this app:
You can read more about the natural language model and the compression algorithm in the scientific papers listed in the next section.
This app is a technical demonstration of both a natural language model and a novel compression algorithm for machine learning models. If this app is useful for your research, please consider citing one of the following papers (if you're not sure which one, choose the first one).
The natural language model is a probabilistic time series model that represents each word by a trajectory in a semantic vector space, i.e., by a sequence of embedding vectors indexed by year. The model is trained on digitized historical books from the Google Books corpus (about 10 billion words). The sketch below shows the kind of query this representation supports.
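For illustration, a word's trajectory can be thought of as one embedding vector per year, and each line in the diagram above plots the cosine similarity of two such trajectories year by year. The array shapes, the vocabulary lookup, and the random vectors in this sketch are made-up assumptions, not the app's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1800, 2009)              # one embedding vector per year
vocab = {"file": 0, "wounded": 1}          # word -> row index (toy vocabulary)
# Hypothetical trajectories, shape (num_words, num_years, embedding_dim):
trajectories = rng.normal(size=(len(vocab), len(years), 100))

def similarity_over_time(word_a: str, word_b: str) -> np.ndarray:
    """Cosine similarity between two words' embeddings, for every year."""
    a = trajectories[vocab[word_a]]        # shape: (num_years, dim)
    b = trajectories[vocab[word_b]]
    dots = np.einsum("td,td->t", a, b)     # one dot product per year
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dots / norms

# One value per year; this is what a line in the diagram plots.
print(similarity_over_time("file", "wounded")[:5])
```

Details of the model and training procedure are described in the following two papers: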
An unusual aspect of this app is that all model predictions are calculated on the client side, i.e., the model runs directly inside your browser. This is why predictions appear almost instantly: we don't have to send a request to a server every time you press a key or move your mouse across the diagram. However, the trained model has about 600 million parameters, which would amount to roughly 2.4 GB at standard 32-bit floating-point precision. So how can we run such a large model in your browser without forcing a giant download on you?
The answer lies in a novel compression algorithm for Bayesian machine learning models, which packs the 600 million model parameters into a file that's just 34 MB in size (about 0.54 bits per model parameter). The main idea is to exploit the model's varying degree of certainty about its own parameters: the model knows some of its parameters only up to considerable uncertainty, so we can encode those parameters with lower accuracy than parameters about which the model is very confident. Such posterior uncertainty estimates can be derived in a principled way using approximate Bayesian machine learning methods.
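As a toy illustration of the principle (and only of the principle: the step-size rule and the entropy estimate below are simplifying assumptions, not the actual algorithm from the paper), one could round each posterior mean to a grid whose spacing grows with the posterior standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.normal(size=1000)                 # posterior means of the parameters
sigma = rng.uniform(0.01, 1.0, size=1000)  # posterior standard deviations

# Coarser grid where the posterior is wide, finer grid where it is narrow.
step = 0.5 * sigma
indices = np.round(mu / step).astype(int)  # grid indices to be entropy-coded
quantized = indices * step                 # reconstructed parameter values

# The quantization error stays small relative to the posterior uncertainty:
print(np.max(np.abs(quantized - mu) / sigma))   # at most 0.25 by construction

# Crude proxy for the storage cost: empirical entropy of the grid indices.
_, counts = np.unique(indices, return_counts=True)
probs = counts / counts.sum()
print("bits per parameter:", -(probs * np.log2(probs)).sum())
```

Details of this novel compression algorithm are described in the following paper: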
In this web app, a WebAssembly module decompresses parts of the model lazily, i.e., only when they're needed for some calculation. We discard the decompressed part of the model immediately after the calculation, so we never have to hold the entire uncompressed model in memory. You can find the code on GitHub.
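Schematically, the access pattern looks like the following Python sketch. The chunking scheme and the use of zlib here are stand-ins for illustration; the real implementation is a WebAssembly module using the compression scheme described above.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the compressed model: one compressed blob per parameter chunk.
chunks = [rng.normal(size=1000).astype(np.float32) for _ in range(100)]
compressed = [zlib.compress(chunk.tobytes()) for chunk in chunks]

def with_chunk(index, calculation):
    """Decompress one chunk, run a calculation on it, then discard it."""
    params = np.frombuffer(zlib.decompress(compressed[index]), dtype=np.float32)
    result = calculation(params)
    # `params` goes out of scope here, so the uncompressed chunk never
    # outlives the single calculation that needed it.
    return result

print(with_chunk(42, lambda params: float(params.mean())))
```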
Yes, we can ... with your help!
Do you have an interesting data set of historical documents or an idea for a cool new feature? Please reach out!
Don't have a specific idea but looking for an interesting semester project? Please reach out, too. I have several exciting ideas for promising follow-up projects, for example: