Awesome Open Source



Active Learning for Text Classification in Python.

Installation | Quick Start | Docs

Small-Text provides state-of-the-art Active Learning for Text Classification. This allows you to easily mix and match many classifiers and query strategies to build active learning experiments or applications.

What is Active Learning?

Active Learning allows you to efficiently label training data in a small-data scenario.
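The idea can be sketched as a pool-based loop: train on the few labels you have, ask for a label on the example the model is least sure about, and repeat. The following is a minimal, self-contained illustration in plain Python. It does not use the small-text API; the tiny Naive Bayes model, the function names, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def train_nb(texts, labels):
    """Fit a tiny multinomial Naive Bayes model on whitespace tokens."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.split())
    vocab = {w for c in classes for w in word_counts[c]}
    return classes, word_counts, doc_counts, vocab


def predict_proba(model, text):
    """Return class probabilities (in sorted class order) for one text."""
    classes, word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = []
    for c in classes:
        total_words = sum(word_counts[c].values())
        score = math.log(doc_counts[c] / total_docs)
        for w in text.split():
            if w in vocab:  # words never seen in training are ignored
                # add-one smoothing over the training vocabulary
                score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        log_scores.append(score)
    # normalize in a numerically stable way
    top = max(log_scores)
    exps = [math.exp(s - top) for s in log_scores]
    return [e / sum(exps) for e in exps]


def active_learning_loop(pool, oracle, initial_indices, num_queries):
    """Repeatedly label the pool example with the least confident prediction."""
    labeled = {i: oracle(i) for i in initial_indices}
    for _ in range(num_queries):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        model = train_nb([pool[i] for i in labeled], [labeled[i] for i in labeled])
        # least confidence: lowest probability for the most likely class
        query = min(unlabeled, key=lambda i: max(predict_proba(model, pool[i])))
        labeled[query] = oracle(query)  # in practice, a human annotator answers here
    return labeled


# Toy data: the oracle stands in for a human annotator.
pool = ["great movie", "terrible movie", "great acting",
        "terrible plot", "great great fun", "boring terrible mess"]
true_labels = [1, 0, 1, 0, 1, 0]
labeled = active_learning_loop(pool, lambda i: true_labels[i],
                               initial_indices=[0, 1], num_queries=2)
print(sorted(labeled))
```

Only four of the six examples end up labeled here; the point of the loop is that the model, rather than chance, decides which examples are worth the annotation effort.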


Features

  • Provides unified interfaces for Active Learning, so you can use any classifier provided by sklearn.
  • Optionally, you can also use pytorch classifiers, including transformer models.
  • Multiple scientifically evaluated strategies are re-implemented: Query Strategies, Initialization Strategies
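To illustrate why a unified interface makes query strategies interchangeable, here is a small sketch in plain Python. It is not the small-text API: the function names and the `query` helper are hypothetical. Each strategy is a scoring function over one row of predicted class probabilities, so any classifier that outputs probabilities can be paired with any strategy.

```python
import math


def least_confidence(proba):
    """Higher score when the most likely class has low probability."""
    return 1.0 - max(proba)


def prediction_entropy(proba):
    """Higher score when the probability mass is spread across classes."""
    return -sum(p * math.log(p) for p in proba if p > 0)


def breaking_ties(proba):
    """Higher score when the two most likely classes are close together."""
    first, second = sorted(proba, reverse=True)[:2]
    return -(first - second)


def query(proba_matrix, score_fn, n):
    """Return the indices of the n highest-scoring examples (hypothetical helper)."""
    ranked = sorted(range(len(proba_matrix)),
                    key=lambda i: score_fn(proba_matrix[i]),
                    reverse=True)
    return ranked[:n]


# Predicted class probabilities for three unlabeled examples (made-up values).
probabilities = [
    [0.50, 0.50, 0.00],
    [0.40, 0.30, 0.30],
    [0.90, 0.05, 0.05],
]

for strategy in (least_confidence, prediction_entropy, breaking_ties):
    print(strategy.__name__, query(probabilities, strategy, n=2))
```

Swapping the strategy changes which examples are selected without touching the classifier, which is the kind of mix-and-match the unified interfaces are meant to allow.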


Installation

Small-text can be installed via pip:

pip install small-text

For a full installation include the transformers extra requirement:

pip install small-text[transformers]

Small-text requires Python 3.7 or newer. GPU support requires CUDA 10.1 or newer. More information regarding the installation can be found in the documentation.
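If you want to check these requirements before installing, a short script along the following lines may help. This is only a sketch: the helper names are made up for this example, and the CUDA check assumes PyTorch is already installed (it reports `None` otherwise).

```python
import sys


def meets_python_requirement(version_info=sys.version_info, minimum=(3, 7)):
    """True if the (major, minor) Python version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def cuda_meets_requirement(version_string, minimum=(10, 1)):
    """True if a CUDA version string such as '11.8' is at least `minimum`."""
    major_minor = tuple(int(part) for part in version_string.split(".")[:2])
    return major_minor >= minimum


def detected_cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed, so no GPU check is possible
    if not torch.cuda.is_available():
        return None
    return torch.version.cuda


print("Python OK:", meets_python_requirement())
cuda = detected_cuda_version()
print("CUDA OK:", cuda_meets_requirement(cuda) if cuda else "no usable GPU detected")
```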

Quick Start

For a quick start, see the provided examples for binary classification, pytorch multi-class classification, and transformer-based multi-class classification, or check out the notebooks.


Notebooks:
  1. Intro: Active Learning for Text Classification with Small-Text


Read the latest documentation (currently work in progress) here.



Contributions are welcome. Details can be found in


This software was created by @chschroeder at Leipzig University's NLP group, which is part of the Webis research network. The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.


A preprint which introduces small-text is available here:
Christopher Schröder, Lydia Müller, Andreas Niekler, and Martin Potthast. Small-text: Active Learning for Text Classification in Python.


MIT License
