Models, data loaders and abstractions for language processing, powered by PyTorch


This repository consists of:


We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. The following are the corresponding torchtext versions and supported Python versions.

Version Compatibility
PyTorch version    torchtext version    Supported Python version
nightly build      main                 >=3.8, <=3.11
2.0.0              0.15.0               >=3.8, <=3.11
1.13.0             0.14.0               >=3.7, <=3.10
1.12.0             0.13.0               >=3.7, <=3.10
1.11.0             0.12.0               >=3.6, <=3.9
1.10.0             0.11.0               >=3.6, <=3.9
1.9.1              0.10.1               >=3.6, <=3.9
1.9                0.10                 >=3.6, <=3.9
1.8.1              0.9.1                >=3.6, <=3.9
1.8                0.9                  >=3.6, <=3.9
1.7.1              0.8.1                >=3.6, <=3.9
1.7                0.8                  >=3.6, <=3.8
1.6                0.7                  >=3.6, <=3.8
1.5                0.6                  >=3.5, <=3.8
1.4                0.5                  2.7, >=3.5, <=3.8
0.4 and below      0.2.3                2.7, >=3.5, <=3.8
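
As a quick sanity check before installing, the Python ranges in the table can be encoded and tested against the running interpreter. The following is only a minimal sketch: the `SUPPORTED_PYTHON` mapping and the `python_supported` helper are hypothetical names introduced here, not part of torchtext, and only a few rows of the table are transcribed.

```python
import sys

# torchtext release -> (lowest, highest) supported Python minor version.
# Hypothetical lookup table, transcribed by hand from a few rows above.
SUPPORTED_PYTHON = {
    "0.15.0": ((3, 8), (3, 11)),
    "0.14.0": ((3, 7), (3, 10)),
    "0.13.0": ((3, 7), (3, 10)),
    "0.12.0": ((3, 6), (3, 9)),
    "0.11.0": ((3, 6), (3, 9)),
}


def python_supported(torchtext_version, python_version=None):
    """Return True if python_version lies in the range listed for torchtext_version.

    python_version is a (major, minor) tuple; it defaults to the running interpreter.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    low, high = SUPPORTED_PYTHON[torchtext_version]
    return low <= tuple(python_version) <= high


if __name__ == "__main__":
    print("torchtext 0.15.0 supports this Python:", python_supported("0.15.0"))
```

A release that is missing from the mapping raises `KeyError`, which is deliberate in this sketch: an unknown version should not silently pass the check.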

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext

Optional requirements

If you want to use the English tokenizer from spaCy, you need to install spaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses:

pip install sacremoses
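
Once one of these optional packages is installed, a tokenizer can be requested by name through `torchtext.data.utils.get_tokenizer`. The sketch below is illustrative rather than prescriptive: `available_tokenizer_backend` and `build_tokenizer` are hypothetical helpers, and the preference order (spaCy, then SacreMoses, then the built-in `basic_english`) is an assumption, not a torchtext recommendation.

```python
import importlib.util


def available_tokenizer_backend():
    """Return the name of the first optional tokenizer backend that is importable."""
    # (name understood by get_tokenizer, Python module that provides it)
    for name, module in (("spacy", "spacy"), ("moses", "sacremoses")):
        if importlib.util.find_spec(module) is not None:
            return name
    # "basic_english" ships with torchtext and needs no extra package.
    return "basic_english"


def build_tokenizer():
    """Build a tokenizer callable; requires torchtext to be installed."""
    from torchtext.data.utils import get_tokenizer

    backend = available_tokenizer_backend()
    if backend == "spacy":
        # Assumes the en_core_web_sm model was downloaded as shown above.
        return get_tokenizer("spacy", language="en_core_web_sm")
    return get_tokenizer(backend)


# Usage (not executed here, because it needs torchtext):
#   tokenizer = build_tokenizer()
#   tokens = tokenizer("You can now install TorchText using pip!")
```

Importing torchtext inside `build_tokenizer` keeps the backend detection usable in environments where torchtext itself is not yet installed.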

For torchtext 0.5 and below, install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# macOS
CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.


When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with for conda (here) and pip (here).

Additionally, datasets in torchtext are implemented using the torchdata library. Please take a look at the installation instructions to download the latest nightlies or install from source.


Find the documentation here.


The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9
  • Machine translation: IWSLT2016, IWSLT2017, Multi30k
  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking
  • Question answering: SQuAD1, SQuAD2
  • Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB
  • Model pre-training: CC-100
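
Each dataset is exposed as a function in `torchtext.datasets` that returns an iterable over raw samples. The sketch below shows one way to look at the first few samples. `peek` and `peek_ag_news` are hypothetical helpers written for this example, and the call `AG_NEWS(split="train")` assumes a torchtext release in which the datasets are built on torchdata, as described above. Running `peek_ag_news` downloads the data.

```python
from itertools import islice


def peek(dataset_fn, n=3, split="train"):
    """Return the first n samples from dataset_fn(split=split) as a list."""
    return list(islice(dataset_fn(split=split), n))


def peek_ag_news(n=3):
    """Fetch the first n AG_NEWS training samples (needs torchtext and torchdata)."""
    from torchtext.datasets import AG_NEWS

    return peek(AG_NEWS, n)


# Usage (not executed here, because it downloads the dataset):
#   for label, text in peek_ag_news(2):
#       print(label, text[:60])
```

Because `peek` only relies on the `split` keyword, the same helper works for the other datasets listed above that offer a `train` split.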


The library currently consists of the following pre-trained models:


The transforms module currently supports the following scriptable tokenizers:


To get started with torchtext, users may refer to the following tutorial available on the PyTorch website.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!
