Awesome Open Source
Awesome Open Source

spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

This package provides spaCy components and architectures to use transformer models via Hugging Face's transformers in spaCy. The result is convenient access to state-of-the-art transformer architectures, such as BERT, GPT-2, XLNet, etc.

This release requires spaCy v3. For the previous version of this library, see the v0.6.x branch.

Azure Pipelines PyPi GitHub Code style: black

Features

  • Use pretrained transformer models like BERT, RoBERTa and XLNet to power your spaCy pipeline.
  • Easy multi-task learning: backprop to one transformer model from several pipeline components.
  • Train using spaCy v3's powerful and extensible config system.
  • Automatic alignment of transformer output to spaCy's tokenization.
  • Easily customize what transformer data is saved in the Doc object.
  • Easily customize how long documents are processed.
  • Out-of-the-box serialization and model packaging.

🚀 Installation

Installing the package from pip will automatically install all dependencies, including PyTorch and spaCy. Make sure you install this package before you install the models. Also note that this package requires Python 3.6+, PyTorch v1.5+ and spaCy v3.0+.

pip install spacy[transformers]

For GPU installation, find your CUDA version using nvcc --version and add the version in brackets, e.g. spacy[transformers,cuda92] for CUDA9.2 or spacy[transformers,cuda100] for CUDA10.0.

If you are having trouble installing PyTorch, follow the instructions on the official website for your specific operating system and requirements, or try the following:

pip install spacy-transformers -f https://download.pytorch.org/whl/torch_stable.html

📖 Documentation

⚠️ Important note: This package has been extensively refactored to take advantage of spaCy v3.0. Previous versions that were built for spaCy v2.x worked considerably differently. Please see previous tagged versions of this README for documentation on prior versions.


Get A Weekly Email With Trending Projects For These Topics
No Spam. Unsubscribe easily at any time.
Python (1,115,497) 
Machine Learning (30,751) 
Pytorch (11,142) 
Nlp (8,142) 
Natural Language Processing (4,593) 
Google (3,247) 
Bert (1,127) 
Transfer Learning (1,032) 
Language Model (516) 
Spacy (500) 
Natural Language Understanding (357) 
Gpt 2 (266) 
Openai (202) 
Huggingface (165) 
Related Projects
Advertising 📦 9
All Projects
Application Programming Interfaces 📦 120
Applications 📦 181
Artificial Intelligence 📦 72
Blockchain 📦 70
Build Tools 📦 111
Cloud Computing 📦 79
Code Quality 📦 28
Collaboration 📦 30
Command Line Interface 📦 48
Community 📦 81
Companies 📦 60
Compilers 📦 60
Computer Science 📦 74
Configuration Management 📦 39
Content Management 📦 167
Control Flow 📦 197
Data Formats 📦 77
Data Processing 📦 266
Data Storage 📦 132
Economics 📦 60
Frameworks 📦 198
Games 📦 122
Graphics 📦 103
Hardware 📦 148
Integrated Development Environments 📦 47
Learning Resources 📦 147
Legal 📦 28
Libraries 📦 119
Lists Of Projects 📦 21
Machine Learning 📦 336
Mapping 📦 61
Marketing 📦 15
Mathematics 📦 55
Media 📦 228
Messaging 📦 97
Networking 📦 304
Operating Systems 📦 84
Operations 📦 120
Package Managers 📦 52
Programming Languages 📦 229
Runtime Environments 📦 96
Science 📦 42
Security 📦 375
Social Media 📦 26
Software Architecture 📦 70
Software Development 📦 68
Software Performance 📦 57
Software Quality 📦 127
Text Editors 📦 45
Text Processing 📦 131
User Interface 📦 310
User Interface Components 📦 465
Version Control 📦 29
Virtualization 📦 68
Web Browsers 📦 38
Web Servers 📦 25
Web User Interface 📦 194