Sentence Transformers

Multilingual Sentence & Image Embeddings with BERT
Alternatives To Sentence Transformers
Project NameStarsDownloadsRepos Using ThisPackages Using ThisMost Recent CommitTotal ReleasesLatest ReleaseOpen IssuesLicenseLanguage
Flair12,582245221 hours ago27May 20, 202277otherPython
A very simple framework for state-of-the-art Natural Language Processing (NLP)
Sentence Transformers9,7111274 days ago42June 26, 2022924apache-2.0Python
Multilingual Sentence & Image Embeddings with BERT
Pytorch Metric Learning5,06812a month ago193June 29, 202241mitPython
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
Pytorch Sentiment Analysis2,905
2 years ago16mitJupyter Notebook
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Awesome Network Embedding2,218
2 years ago3
A curated list of network embedding techniques.
Lightly2,196
a day ago43mitPython
A python library for self-supervised learning on images.
Pytorch Nlp1,929982 years ago19November 04, 201916bsd-3-clausePython
Basic Utilities for PyTorch Natural Language Processing (NLP)
Siamese Triplet1,767
4 years ago4bsd-3-clausePython
Siamese and triplet networks with online pair/triplet mining in PyTorch
Poincare Embeddings1,456
a year ago23otherPython
PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"
Capsgnn1,180
4 days ago3gpl-3.0Python
A PyTorch implementation of "Capsule Graph Neural Network" (ICLR 2019).
Alternatives To Sentence Transformers
Select To Compare


Alternative Project Comparisons
Readme

GitHub - License PyPI - Python Version PyPI - Package Version Conda - Platform Conda (channel only) Docs - GitHub.io

Sentence Transformers: Multilingual Sentence, Paragraph, and Image Embeddings using BERT & Co.

This framework provides an easy method to compute dense vector representations for sentences, paragraphs, and images. The models are based on transformer networks like BERT / RoBERTa / XLM-RoBERTa etc. and achieve state-of-the-art performance in various task. Text is embedding in vector space such that similar text is close and can efficiently be found using cosine similarity.

We provide an increasing number of state-of-the-art pretrained models for more than 100 languages, fine-tuned for various use-cases.

Further, this framework allows an easy fine-tuning of custom embeddings models, to achieve maximal performance on your specific task.

For the full documentation, see www.SBERT.net.

The following publications are integrated in this framework:

Installation

We recommend Python 3.6 or higher, PyTorch 1.6.0 or higher and transformers v4.6.0 or higher. The code does not work with Python 2.7.

Install with pip

Install the sentence-transformers with pip:

pip install -U sentence-transformers

Install with conda

You can install the sentence-transformers with conda:

conda install -c conda-forge sentence-transformers

Install from sources

Alternatively, you can also clone the latest version from the repository and install it directly from the source code:

pip install -e .

PyTorch with CUDA

If you want to use a GPU / CUDA, you must install PyTorch with the matching CUDA Version. Follow PyTorch - Get Started for further details how to install PyTorch.

Getting Started

See Quickstart in our documenation.

This example shows you how to use an already trained Sentence Transformer model to embed sentences for another task.

First download a pretrained model.

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')

Then provide some sentences to the model.

sentences = ['This framework generates embeddings for each input sentence',
    'Sentences are passed as a list of string.', 
    'The quick brown fox jumps over the lazy dog.']
sentence_embeddings = model.encode(sentences)

And that's it already. We now have a list of numpy arrays with the embeddings.

for sentence, embedding in zip(sentences, sentence_embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")

Pre-Trained Models

We provide a large list of Pretrained Models for more than 100 languages. Some models are general purpose models, while others produce embeddings for specific use cases. Pre-trained models can be loaded by just passing the model name: SentenceTransformer('model_name').

» Full list of pretrained models

Training

This framework allows you to fine-tune your own sentence embedding methods, so that you get task-specific sentence embeddings. You have various options to choose from in order to get perfect sentence embeddings for your specific task.

See Training Overview for an introduction how to train your own embedding models. We provide various examples how to train models on various datasets.

Some highlights are:

  • Support of various transformer networks including BERT, RoBERTa, XLM-R, DistilBERT, Electra, BART, ...
  • Multi-Lingual and multi-task learning
  • Evaluation during training to find optimal model
  • 10+ loss-functions allowing to tune models specifically for semantic search, paraphrase mining, semantic similarity comparison, clustering, triplet loss, contrastive loss.

Performance

Our models are evaluated extensively on 15+ datasets including challening domains like Tweets, Reddit, emails. They achieve by far the best performance from all available sentence embedding methods. Further, we provide several smaller models that are optimized for speed.

» Full list of pretrained models

Application Examples

You can use this framework for:

and many more use-cases.

For all examples, see examples/applications.

Citing & Authors

If you find this repository helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

If you use one of the multilingual models, feel free to cite our publication Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation:

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}

Please have a look at Publications for our different publications that are integrated into SentenceTransformers.

Contact person: Nils Reimers, [email protected]

https://www.ukp.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue, if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Popular Embeddings Projects
Popular Pytorch Projects
Popular Machine Learning Categories
Related Searches

Get A Weekly Email With Trending Projects For These Categories
No Spam. Unsubscribe easily at any time.
Python
Pytorch
Embeddings