Active Learning for Text Classification in Python.
Small-Text provides state-of-the-art Active Learning for Text Classification. Several pre-implemented Query Strategies, Initialization Strategies, and Stopping Criteria are provided, which can be easily mixed and matched to build active learning experiments or applications.
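Stopping criteria decide when further labeling is unlikely to pay off. As a hedged, standalone illustration (not small-text's implementation), the sketch below follows the idea of "stabilizing predictions": after each iteration, the model's predictions on a fixed stop set are compared with those of the previous iteration using Cohen's kappa, and labeling stops once the mean agreement over the last few comparisons reaches a threshold. The class `KappaStoppingSketch` and its `threshold` and `window` parameters are hypothetical names chosen for this example.

```python
from collections import Counter
from typing import List, Optional, Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa between two equally long label sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each sequence's label distribution.
    expected = sum((count_a[c] / n) * (count_b[c] / n) for c in set(a) | set(b))
    if expected == 1.0:
        # Both sequences use one and the same label everywhere: perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


class KappaStoppingSketch:
    """Hypothetical stopping criterion in the spirit of "stabilizing predictions".

    Call stop() once per iteration with the current model's predictions on a
    fixed stop set; it returns True once the mean kappa between consecutive
    predictions over the last `window` comparisons reaches `threshold`.
    """

    def __init__(self, threshold: float = 0.99, window: int = 3) -> None:
        self.threshold = threshold
        self.window = window
        self._previous: Optional[List[int]] = None
        self._kappas: List[float] = []

    def stop(self, predictions: Sequence[int]) -> bool:
        if self._previous is not None:
            self._kappas.append(cohens_kappa(self._previous, predictions))
        self._previous = list(predictions)
        if len(self._kappas) < self.window:
            return False  # not enough history yet
        recent = self._kappas[-self.window:]
        return sum(recent) / self.window >= self.threshold


criterion = KappaStoppingSketch(threshold=0.99, window=2)
history = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
decisions = [criterion.stop(p) for p in history]
print(decisions)  # [False, False, False, True]
```

Because the criterion only sees predictions, it can be combined with any classifier and query strategy, which is what makes such components easy to mix and match.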
What is Active Learning?
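Active Learning allows you to efficiently label training data in a small-data scenario: rather than annotating instances at random, a query strategy repeatedly selects the unlabeled instances that are expected to be most informative for the classifier, and only those are labeled.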
- Provides unified interfaces for Active Learning so that you can easily mix and match query strategies with classifiers provided by sklearn, PyTorch, or transformers.
- Supports GPU-based PyTorch models and integrates transformers so that you can use state-of-the-art Text Classification models for Active Learning.
- GPU is supported but not required. For CPU-only use cases, a lightweight installation requires only a minimal set of dependencies.
- Multiple scientifically evaluated components are pre-implemented and ready to use (Query Strategies, Initialization Strategies, and Stopping Criteria).
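To make the active learning cycle described above concrete, the following plain-Python sketch runs a pool-based loop: it starts from a small labeled seed set (standing in for an initialization strategy), trains a toy nearest-centroid classifier, and uses least-confidence sampling as the query strategy to choose which instances a simulated oracle labels next. Everything here (the toy classifier, the function names, the 1-D features, and the oracle) is a hypothetical illustration and not small-text's API.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def fit_centroids(xs: Sequence[float], ys: Sequence[int]) -> Dict[int, float]:
    """Toy classifier: store the mean feature value of each class."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_proba(centroids: Dict[int, float], x: float) -> Dict[int, float]:
    """Class probabilities via a softmax over negative distances to the centroids."""
    scores = {y: math.exp(-abs(x - c)) for y, c in centroids.items()}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


def least_confidence(centroids: Dict[int, float], pool: Sequence[float],
                     candidates: List[int], n: int) -> List[int]:
    """Query strategy: the n candidates whose most likely class has the lowest probability."""
    confidence = {i: max(predict_proba(centroids, pool[i]).values()) for i in candidates}
    return sorted(candidates, key=lambda i: confidence[i])[:n]


def active_learning_loop(pool: Sequence[float], oracle: Callable[[float], int],
                         initial: List[int], batch_size: int = 2,
                         iterations: int = 3) -> Tuple[List[int], Dict[int, float]]:
    labeled = list(initial)  # seed set, standing in for an initialization strategy
    labels = {i: oracle(pool[i]) for i in labeled}
    for _ in range(iterations):
        model = fit_centroids([pool[i] for i in labeled], [labels[i] for i in labeled])
        candidates = [i for i in range(len(pool)) if i not in labels]
        if not candidates:
            break
        for i in least_confidence(model, pool, candidates, batch_size):
            labels[i] = oracle(pool[i])  # the oracle plays the human annotator
            labeled.append(i)
    # Retrain on everything labeled so far.
    model = fit_centroids([pool[i] for i in labeled], [labels[i] for i in labeled])
    return labeled, model


pool = [0.0, 0.5, 1.0, 4.0, 4.5, 5.0, 9.0, 9.5, 10.0]
labeled, model = active_learning_loop(pool, oracle=lambda x: int(x >= 5.0),
                                      initial=[0, 8], batch_size=2, iterations=2)
print(labeled[:5])  # [0, 8, 5, 4, 3]: points nearest the class boundary come first
```

Swapping `least_confidence` for another scoring function changes the query strategy without touching the rest of the loop, which mirrors the mix-and-match idea behind the unified interfaces listed above.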
The small-text paper was awarded Best System Demonstration at EACL 2023 🎉
- Thank you for your support! The published paper is available in the ACL Anthology: https://aclanthology.org/2023.eacl-demo.11
Version 1.3.0 (v1.3.0): Highlights - February 20, 2023
- Added dropout sampling to SetFitClassification.
Version 1.2.0 (v1.2.0): Highlights - February 4, 2023
- Made huggingface/setfit (SetFit) usable as a small-text classifier.
- New query strategy: BALD.
- Added two new SetFit notebooks and updated the existing notebooks.
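BALD (Bayesian Active Learning by Disagreement) scores an unlabeled instance by the mutual information between its predicted label and the model parameters. A common estimate uses several stochastic forward passes, for example with dropout kept active at inference time (Monte Carlo dropout): the entropy of the averaged prediction minus the average entropy of the individual predictions. The sketch below is a standalone plain-Python illustration of that estimate; the functions `bald_scores` and `bald_query` and the nested-list input layout are hypothetical and not small-text's implementation.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def bald_scores(mc_probas: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """BALD estimate from Monte Carlo samples.

    mc_probas[t][i][c] is the probability of class c for instance i in the t-th
    stochastic forward pass (e.g., with dropout active at inference time).
    Score = entropy of the mean prediction - mean entropy of the individual predictions.
    """
    num_passes = len(mc_probas)
    scores = []
    for i in range(len(mc_probas[0])):
        per_pass = [mc_probas[t][i] for t in range(num_passes)]
        num_classes = len(per_pass[0])
        mean_p = [sum(p[c] for p in per_pass) / num_passes for c in range(num_classes)]
        expected_entropy = sum(entropy(p) for p in per_pass) / num_passes
        scores.append(entropy(mean_p) - expected_entropy)
    return scores


def bald_query(mc_probas: Sequence[Sequence[Sequence[float]]], n: int) -> List[int]:
    """Return the indices of the n instances with the highest BALD score."""
    scores = bald_scores(mc_probas)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


# Two dropout passes over three instances with two classes.
samples = [
    [[0.9, 0.1], [0.5, 0.5], [0.9, 0.1]],  # pass 1
    [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]],  # pass 2
]
print(bald_query(samples, 1))  # [2]
```

Instances 1 and 2 have the same mean prediction, [0.5, 0.5], so plain prediction entropy cannot tell them apart. BALD ranks instance 2 highest because its passes disagree, whereas instance 1 is uncertain but consistent across passes.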
Version 1.1.1 (v1.1.1) - October 14, 2022
- Fixed model selection, which could raise an error under certain circumstances (#21).
For a complete list of changes, see the change log.
Small-Text can be easily installed via pip:
pip install small-text
For a full installation, include the transformers extra requirement:
pip install small-text[transformers]
Small-Text requires Python 3.7 or newer. Using the GPU additionally requires CUDA 10.1 or newer. More information regarding the installation can be found in the documentation.
For a quick start, see the provided examples for binary classification, PyTorch multi-class classification, and transformer-based multi-class classification, or check out the notebooks.
- Tutorial: 👂 Active learning for text classification with small-text (use small-text conveniently from the Argilla UI).
A full list of showcases can be found in the docs.
🎀 Would you like to share your use case? Regardless of whether it is a paper, an experiment, a practical application, a thesis, a dataset, or something else, let us know and we will add you to the showcase section or even here.
Read the latest documentation here. Noteworthy pages include:
Contributions are welcome. Details can be found in CONTRIBUTING.md.
This software was created by Christopher Schröder (@chschroeder) at Leipzig University's NLP group, which is part of the Webis research network. The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.
Small-Text has been introduced in detail in the EACL 2023 System Demonstration paper "Small-Text: Active Learning for Text Classification in Python", which can be cited as follows:
@inproceedings{schroeder2023small-text,
title = "Small-Text: Active Learning for Text Classification in Python",
author = {Schr{\"o}der, Christopher and M{\"u}ller, Lydia and Niekler, Andreas and Potthast, Martin},
booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.eacl-demo.11",
pages = "84--95"
}