Explore how a word changes over time
Type in a word, and Word Lapse will show you how its associated words and frequency of use have changed over the years, along with some other interesting details. This information was generated by training multiple machine-learning models on 30+ million documents from PubTator Central and on 160+ thousand preprints from bioRxiv and medRxiv.
Specifically, we used the Word2Vec natural-language-processing (NLP) technique, which represents words as dense, 300-dimensional vectors. The model constructs these vectors by training a shallow neural network on the NLP task of predicting a word given its neighboring words. Once training is complete, the vectors contain information that allows the network to distinguish one word from another and allows us to perform downstream tasks such as changepoint detection.
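To make the training task concrete, here is a minimal sketch of how "predict a word from its neighbors" examples can be built from a sentence. This formulation is commonly called continuous bag-of-words (CBOW). The function name `cbow_pairs`, the window size, and the example sentence are illustrative assumptions and are not taken from the Word Lapse code.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context_words, target_word) pairs for every position in `tokens`.

    `window` (an assumed value here) is how many neighbors to take on each side.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before the target
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after the target
        context = left + right
        if context:                              # skip one-word inputs with no neighbors
            pairs.append((context, target))
    return pairs


# Hypothetical example sentence, tokenized by whitespace.
sentence = "the gene regulates protein expression".split()
pairs = cbow_pairs(sentence, window=1)
for context, target in pairs:
    print(context, "->", target)
```

During training, the network sees each context and is asked to predict the matching target word; the weights it learns along the way become the word vectors.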
For more technical information about our approach and how we generated this data, see this paper.
The API for this application can be used directly at https://api-wl.greenelab.com/.
Everything in this repo -- including the code, data, submodules, and app -- is licensed under BSD-3. See the license file.
To separate concerns and to make cloning and developing this repo easier, the model data (~26+ GB) for this project is stored in a separate submodule repo.
See SUBMODULES.md for more information.
The backend for this app (under /server) consists of three components:
- a RESTful API implemented in FastAPI
- a Redis in-memory cache with write-through to disk
- a set of RQ workers that process word statistic lookups
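One way these three pieces could fit together is a cache-then-enqueue flow: the API answers from the cache when it can, and otherwise hands the lookup to a worker that fills the cache. The sketch below simulates that pattern with the standard library only. The dict and deque are stand-ins for Redis and the RQ queue, and `lookup`, `run_worker`, and `fake_stats` are hypothetical names; the actual request flow in /server may differ.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional

cache: Dict[str, dict] = {}        # stand-in for the Redis cache (assumption)
job_queue: Deque[str] = deque()    # stand-in for the RQ job queue (assumption)


def lookup(word: str) -> Optional[dict]:
    """API side: return the cached result, or enqueue the word and return None."""
    if word in cache:
        return cache[word]
    if word not in job_queue:      # avoid queuing the same word twice
        job_queue.append(word)
    return None


def run_worker(compute: Callable[[str], dict]) -> None:
    """Worker side: drain the queue and store each computed result in the cache."""
    while job_queue:
        word = job_queue.popleft()
        cache[word] = compute(word)


def fake_stats(word: str) -> dict:
    """Placeholder for the real word-statistics computation."""
    return {"word": word, "length": len(word)}


first = lookup("pandemic")         # cache miss: the word is queued
run_worker(fake_stats)             # a worker processes the queued lookup
second = lookup("pandemic")        # cache hit: the stored result is returned
print(first, second)
```

Keeping the slow computation in workers means the API process only ever reads from the cache or enqueues work, so repeated requests for the same word are cheap.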
The front-facing app (under /app) is made with React, bootstrapped with create-react-app.