DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch.
DeepPavlov is designed for the development of production-ready chatbots and complex conversational systems, as well as for research in NLP and, in particular, dialog systems.
Please leave us your feedback on how we can improve the DeepPavlov framework.
Models
Named Entity Recognition | Slot filling
Intent/Sentence Classification | Question Answering over Text (SQuAD)
Knowledge Base Question Answering
Sentence Similarity/Ranking | TF-IDF Ranking
Morphological tagging | Syntactic parsing
Automatic Spelling Correction | ELMo training and fine-tuning
Speech recognition and synthesis (ASR and TTS) based on NVIDIA NeMo
Entity Linking | Multitask BERT
Skills
Goal(Task)-oriented Bot | Seq2seq Goal-Oriented bot
Open Domain Question Answering | eCommerce Bot
Frequently Asked Questions Answering | Pattern Matching
Embeddings
BERT embeddings for the Russian, Polish, Bulgarian, and Czech languages and for informal English
ELMo embeddings for the Russian language
FastText embeddings for the Russian language
Auto ML
Tuning Models with Evolutionary Algorithm
Integrations
REST API | Socket API | Yandex Alice
Telegram | Microsoft Bot Framework
We support the Linux and Windows platforms with Python 3.6 and Python 3.7.

Python 3.5 is not supported!

Installation on Windows requires Git (for example, git) and Visual Studio 2015/2017 with C++ build tools installed!

Create and activate a virtual environment:
Linux:

python -m venv env
source ./env/bin/activate

Windows:

python -m venv env
.\env\Scripts\activate.bat
Install the package inside the environment:
pip install deeppavlov
DeepPavlov comes with a number of great pre-trained NLP models. Each model is defined by its config file. The list of models is available on the doc page and in the deeppavlov.configs attribute (Python):
from deeppavlov import configs
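As an illustrative sketch only (this is not DeepPavlov's real implementation), the configs attribute essentially indexes the JSON files under the deeppavlov/configs directory by group and name:

```python
import tempfile
from pathlib import Path

# Illustrative sketch: index config files by "group.name", similar to how
# deeppavlov.configs exposes them as nested attributes.
def list_configs(configs_dir: Path) -> dict:
    return {f"{p.parent.name}.{p.stem}": p for p in configs_dir.rglob("*.json")}

# Demo with a throwaway directory that mimics deeppavlov/configs/ner/:
with tempfile.TemporaryDirectory() as tmp:
    ner_dir = Path(tmp) / "ner"
    ner_dir.mkdir()
    (ner_dir / "slotfill_dstc2.json").write_text("{}")
    print(sorted(list_configs(Path(tmp))))  # ['ner.slotfill_dstc2']
```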
When you have decided on a model (and its config file), there are two ways to train, evaluate, and run inference with it:
To run supported DeepPavlov models on a GPU, you should have CUDA 10.0 installed on your host machine and TensorFlow with GPU support (tensorflow-gpu) installed in your Python environment. The currently supported TensorFlow version is 1.15.2. Run

pip install tensorflow-gpu==1.15.2

before installing a model's package requirements to get the supported tensorflow-gpu version.
Before choosing an interface, install the model's package requirements (CLI):

python -m deeppavlov install <config_path>

where <config_path> is the path to the chosen model's config file (e.g. deeppavlov/configs/ner/slotfill_dstc2.json) or just its name without the .json extension (e.g. slotfill_dstc2).

To get predictions from a model interactively through the CLI, run
python -m deeppavlov interact <config_path> [-d]
-d downloads the required data, i.e. pretrained model files and embeddings (optional).

You can train it in the same simple way:
python -m deeppavlov train <config_path> [-d]
The dataset will be downloaded regardless of whether the -d flag was passed.

To train on your own data, you need to modify the dataset reader path in the train config, as described in the docs. The data format is specified in the corresponding model's doc page.
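For illustration, the dataset reader section of a train config might look like the fragment below; the class name and fields here are a hedged sketch, so check the chosen model's actual config and its doc page for the real values:

```json
"dataset_reader": {
  "class_name": "basic_classification_reader",
  "x": "text",
  "y": "labels",
  "data_path": "path/to/your/data"
}
```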
There are even more actions you can perform with configs:
python -m deeppavlov <action> <config_path> [-d]
<action> can be:

download to download the model's data (same as -d),
train to train the model on the data specified in the config file,
evaluate to calculate metrics on the same dataset,
interact to interact via CLI,
riseapi to run a REST API server (see doc),
telegram to run as a Telegram bot (see doc),
msbot to run a Microsoft Bot Framework server (see doc),
predict to get predictions for samples from stdin, or from <file_path> if -f <file_path> is specified.

<config_path> specifies the path (or name) of the model's config file,
-d downloads the required data.

To get predictions from a model interactively through Python, run
from deeppavlov import build_model
model = build_model(<config_path>, download=True)
# get predictions for 'input_text1', 'input_text2'
model(['input_text1', 'input_text2'])
download=True downloads the required data from the web, i.e. pretrained model files and embeddings (optional); <config_path> is the path to the chosen model's config file (e.g. "deeppavlov/configs/ner/ner_ontonotes_bert_mult.json") or a deeppavlov.configs attribute (e.g. deeppavlov.configs.ner.ner_ontonotes_bert_mult without quotation marks).

You can train it in the same simple way:
from deeppavlov import train_model
model = train_model(<config_path>, download=True)
download=True downloads the pretrained model, so the pretrained model will first be loaded and then trained (optional). The dataset will be downloaded regardless of whether the -d flag was passed.

To train on your own data, you need to modify the dataset reader path in the train config, as described in the docs. The data format is specified in the corresponding model's doc page.
You can also calculate metrics on the dataset specified in your config file:
from deeppavlov import evaluate_model
model = evaluate_model(<config_path>, download=True)
There are also integrations with various messengers; see the Telegram Bot doc page and others in the Integrations section for more info.
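A model served with riseapi accepts batches of inputs via HTTP POST. The sketch below shows the client-side call shape against a local stub server standing in for the real one; the /model endpoint matches the 0.6.0 naming, but the argument name "x" and the dummy response are assumptions, since real argument names come from the config's chainer.in parameter:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request

# Stub standing in for `python -m deeppavlov riseapi <config_path>`:
# it returns one dummy label per input sentence.
class StubModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        preds = [["O"] for _ in body["x"]]  # dummy predictions
        payload = json.dumps(preds).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), StubModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: POST a batch of sentences to the /model endpoint.
url = f"http://127.0.0.1:{server.server_port}/model"
data = json.dumps({"x": ["Hello DeepPavlov"]}).encode()
req = request.Request(url, data=data, headers={"Content-Type": "application/json"})
with request.urlopen(req) as resp:
    predictions = json.loads(resp.read())
print(predictions)  # [['O']]
server.shutdown()
```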
Breaking changes in version 0.7.0

The agent_name parameter was renamed to logger_name and its default value was changed.
The /start and /help Telegram messages were moved from models_info.json to server_config.json.
Breaking changes in version 0.6.0

REST API: the default model endpoint was renamed to /model, and model argument names are now taken from the chainer.in configuration parameter instead of pre-set names from a settings file; the API docs endpoint was moved from /apidocs to /docs.
When "max_proba": true is set in a proba2labels component for classification, it will return a single label for every batch element instead of a list. One can set "top_n": 1 to get batches of single-item lists as before.

Breaking changes in version 0.5.0

Models that depend on tensorflow now require CUDA 10.0 to run on GPU instead of CUDA 9.0.
Breaking changes in version 0.4.0!
MODELS_PATH was renamed to MODEL_PATH.

Breaking changes in version 0.3.0!
The fit_on_batch parameter in configuration files was removed and replaced with adaptive usage of the fit_on parameter.

Breaking changes in version 0.2.0!

The utils module was moved from the repository root into the deeppavlov module.
The ms_bot_framework_utils, server_utils, and telegram_utils modules were renamed to ms_bot_framework, server, and telegram correspondingly.
The exact_match metric was renamed to squad_v2_em and squad_f1 to squad_v2_f1.
Breaking changes in version 0.1.0!
As of version 0.1.0, all models, embeddings, and other downloaded data for the provided configurations are by default downloaded to the .deeppavlov directory in the current user's home directory. This can be changed on a per-model basis by modifying the ROOT_PATH variable or the related fields one by one in the model's configuration file.
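A minimal sketch of how such path variables can be expanded (this is not DeepPavlov's actual resolver; the structure of the sample config below is illustrative):

```python
import json
from pathlib import Path

# Illustrative sketch: expand {VARIABLE} references in a config's
# path strings, recursing through nested dicts and lists.
def expand_paths(config, variables):
    def expand(value):
        if isinstance(value, str):
            for name, repl in variables.items():
                value = value.replace("{" + name + "}", repl)
            return value
        if isinstance(value, dict):
            return {k: expand(v) for k, v in value.items()}
        if isinstance(value, list):
            return [expand(v) for v in value]
        return value
    return expand(config)

variables = {"ROOT_PATH": str(Path.home() / ".deeppavlov")}
config = {"metadata": {"download": [{"subdir": "{ROOT_PATH}/models"}]}}
print(json.dumps(expand_paths(config, variables)))
```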
In configuration files, for all features/models, dataset readers, and iterators, the "name" and "class" fields were combined into a single "class_name" field.
deeppavlov.core.commands.infer.build_model_from_config()
was renamed to build_model
and can be imported from the
deeppavlov
module directly.
The way arguments are passed to metrics functions during training and evaluation was changed and documented.
DeepPavlov is Apache 2.0-licensed.
DeepPavlov is built and maintained by Neural Networks and Deep Learning Lab at MIPT.