Hands-on tutorials to kick-start the development and evaluation of information retrieval systems.
In this directory, we provide plenty of Jupyter notebooks that guide students through specific aspects of information retrieval experimentation, from data exploration via simple retrieval approaches to statistical analysis.
The fastest and easiest way to run our tutorials is to use GitHub Codespaces. Just click on the button below, which starts a remote session with everything already installed:
As the tutorials vary in difficulty, we recommend that you start with the basic tutorials, in this order:
- The PyTerrier tutorial, in particular Part 1 and Part 2.
- Our "Topics, documents, and relevance judgments" tutorial, which will help you with Stages 1 and 2 of our course.
- After getting a basic understanding, you can try out the basic system-focused tutorials on stopword lists, stemming, lemmatization, and query expansion, corresponding to Stage 3.
- Then, practice statistical analysis (Stage 4).
- Finally, you can visit the more research-oriented tutorials (see below) or continue with the PyTerrier tutorial (e.g., Parts 3 and 4).
Our basic tutorials cover the most important concepts of information retrieval and explain them with very simple, easy-to-understand examples. These entry-level tutorials are targeted at Bachelor's (or early Master's) students:
Topic | Jupyter Notebook | Open in Codespaces |
---|---|---|
Topics, documents, and relevance judgments | 🔗 | 💻 |
Stopword lists | 🔗 | 💻 |
Stemming | 🔗 | 💻 |
Lemmatization | 🔗 | 💻 |
Query expansion | 🔗 | 💻 |
Hyperparameter tuning | 🔗 | 💻 |
Statistical analysis | 🔗 | 💻 |
Learning to rank (work in progress) | ⏳ | ⏳ |
Anything missing? Propose a new tutorial. |
Topics that might not suit every IR course are covered in our research-oriented tutorials. These tutorials are often more complex and require more prior knowledge, so they are best suited for Master's students:
Topic | Jupyter Notebook | Open in Codespaces |
---|---|---|
Query expansion with LLMs | 🔗 | 💻 |
Query segmentation | 🔗 | 💻 |
Query performance prediction | 🔗 | 💻 |
Classification of medical/health queries and documents | 🔗 | 💻 |
Entity linking (work in progress) | 🔗 | ⏳ |
Query Intent Prediction (work in progress) | 🔗 | ⏳ |
Query Spelling Correction (work in progress) | 🔗 | ⏳ |
Splade for Query Processing (work in progress) | 🔗 | ⏳ |
Splade for Document Processing (work in progress) | 🔗 | ⏳ |
DocT5Query (work in progress) | 🔗 | ⏳ |
Genre Classification (work in progress) | 🔗 | ⏳ |
Corpus Graph (work in progress) | 🔗 | ⏳ |
Re-ranking with cross-encoders or bi-encoders (work in progress) | ⏳ | ⏳ |
Anything missing? Propose a new tutorial. |
Of course, this list can never be exhaustive, as paradigms shift and technologies change. We therefore welcome any contribution from the open science community! For example, you could request a tutorial on a certain topic or submit a pull request that adds it yourself.
This repository and the tutorials within are designed to be run and developed inside Dev containers. Though the easiest way to run Dev containers is to just spin up a GitHub Codespace, you can also run everything on your local machine with Visual Studio Code and Docker (installation instructions). (Some other IDEs might also work.) Even locally, our Dev container allows you to directly start coding without having to install dependencies on your own. To run the tutorials on your machine, follow these steps:
- Install Visual Studio Code and Docker.
- Clone this repository (`git clone`).
- Open the cloned directory in Visual Studio Code.
- Once asked (VS Code popup), re-open the directory in a Dev container
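The steps above can be sketched as a small shell helper. This is only an illustration under a few assumptions: the function names `missing_tools` and `clone_and_open` and the target directory are made up for this example, the repository URL is a placeholder that you have to supply yourself, and the `code` command-line launcher of Visual Studio Code is assumed to be on your `PATH`.

```shell
#!/bin/sh
# Sketch of the local setup steps. The repository URL is NOT filled in here;
# pass it as the first argument of clone_and_open.

# Print every command from the argument list that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Clone the repository into a target directory and open it in VS Code.
# VS Code should then offer to re-open the directory in the Dev container.
clone_and_open() {
    repo_url=$1
    target_dir=$2
    missing=$(missing_tools git docker code)
    if [ -n "$missing" ]; then
        # Unquoted on purpose: joins the tool names with spaces on one line.
        echo "Please install first:" $missing >&2
        return 1
    fi
    git clone "$repo_url" "$target_dir" && code "$target_dir"
}

# Example call (not executed here, REPO_URL is a placeholder):
# clone_and_open "$REPO_URL" ir-tutorials
```

Checking for the tools first gives a readable error message instead of a failed clone or a Dev container that cannot start because Docker is missing.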
Alternatively, you can start a Jupyter server with Docker to edit the notebooks (run the command within the cloned directory):
docker run --rm -it -p 8888:8888 --entrypoint jupyter -w /workspace -v ${PWD}:/workspace webis/ir-lab-wise-2023 notebook --allow-root --ip 0.0.0.0
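If port 8888 is already taken on your machine, or you would like to pin a specific image version, a variant of the command above may help. The following sketch only prints the command so that you can inspect it before running it. The function name `jupyter_command` and its two optional arguments are assumptions made for this example, and pinning the tag `0.0.3` (the base image version mentioned in this readme) is a choice of the sketch, not a requirement.

```shell
#!/bin/sh
# Sketch: print the Jupyter command for a chosen host port and image tag.
# Both defaults (8888 and 0.0.3) are illustrative assumptions.

jupyter_command() {
    host_port=${1:-8888}
    image_tag=${2:-0.0.3}
    # --rm          remove the container once Jupyter exits
    # -it           keep an interactive terminal attached
    # -p HOST:8888  publish the container's port 8888 on the chosen host port
    # --entrypoint  start jupyter instead of the image's default entrypoint
    # -v / -w       mount the current directory at /workspace and work there
    printf 'docker run --rm -it -p %s:8888 --entrypoint jupyter -w /workspace -v "%s":/workspace webis/ir-lab-wise-2023:%s notebook --allow-root --ip 0.0.0.0\n' \
        "$host_port" "$PWD" "$image_tag"
}

# Example call: print the command for host port 9999.
# jupyter_command 9999
```

With a different host port, the Jupyter URL printed in the terminal still shows port 8888 (the port inside the container), so replace it with your chosen host port in the browser.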
With the plethora of new retrieval approaches emerging every year, it is hard for us alone to keep the existing tutorials up to date and add new ones. We are grateful to any IR teacher who invests some time to contribute back to our free teaching resources!
To do so, just open this repository in GitHub Codespaces (or clone it and open the repo in a Dev container with your favorite IDE).
Feel free to locally adapt the base image (`webis/ir-lab-wise-2023:0.0.3`) to your liking. If you think your changes might be helpful to others as well, please let us know so that we can adjust the public image.
Staff can build and publish the image like this (replace `X.Y.Z` with the actual version):
docker build -t webis/ir-lab-wise-2023:X.Y.Z .
docker push webis/ir-lab-wise-2023:X.Y.Z
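To avoid accidentally pushing an image with the literal placeholder tag, the two commands can be wrapped in a small helper that checks the version first. This is a sketch, not the maintainers' release process: the function names `is_valid_version` and `release_image` are invented for this example, the version `0.0.4` is hypothetical, and the assumption that versions always consist of three numeric parts is ours.

```shell
#!/bin/sh
# Sketch: validate a version string before building and pushing the image.

IMAGE=webis/ir-lab-wise-2023

# Succeed only for versions with three numeric parts, e.g. 0.0.4.
# (Assumption: the image tags follow this numeric X.Y.Z pattern.)
is_valid_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Build the image from the current directory and push it under the given tag.
release_image() {
    version=$1
    if ! is_valid_version "$version"; then
        echo "Expected a version like 0.0.4, got: '$version'" >&2
        return 1
    fi
    docker build -t "$IMAGE:$version" . && docker push "$IMAGE:$version"
}

# Example call (not executed here, 0.0.4 is a hypothetical version):
# release_image 0.0.4
```

Because the push only runs if the build succeeds, a failed build never publishes a stale or partial image under the new tag.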
We would be glad to support you in running our tutorials! Do not hesitate to write us an email or file an issue:
- Maik Fröbe
- Harrisen Scells
- Theresa Elstner
- Christopher Akiki
- Lukas Gienapp
- Jan Heinrich Merker
- Sean MacAvaney
- Benno Stein
- Matthias Hagen
- Martin Potthast
We're happy to help!
We took inspiration from some great tutorials and resources out there. Of course, our resources should not replace but complement them:
Please refer to the root readme.