Texar

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow
Alternatives To Texar
Project NameStarsDownloadsRepos Using ThisPackages Using ThisMost Recent CommitTotal ReleasesLatest ReleaseOpen IssuesLicenseLanguage
Textgenrnn4,4261422 years ago14February 02, 2020122otherPython
Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
Gpt 2 Simple3,06734a year ago18October 18, 2021170otherPython
Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts
Texar2,008
23 years ago5November 19, 201932apache-2.0Python
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow
Gpt2 Ml1,674
6 months ago22apache-2.0Python
GPT2 for Multiple Languages, including pretrained models. GPT2 多语言支持, 15亿参数中文预训练模型
Delta1,556
2 months ago3March 27, 20202apache-2.0Python
DELTA is a deep learning based natural language and speech processing platform.
Gpt2client341
2 years ago32November 13, 20197mitPython
✍🏻 gpt2-client: Easy-to-use TensorFlow Wrapper for GPT-2 117M, 345M, 774M, and 1.5B Transformer Models 🤖 📝
Attention Mechanisms294
2 years ago2mitPython
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Tensorflow_novelist227
7 years ago3Python
模仿莎士比亚创作戏剧!屌炸天的是还能创作金庸武侠小说!快star,保持更新!!
Gpt 2 Tensorflow2.0218
a year ago19mitPython
OpenAI GPT2 pre-training and sequence prediction implementation in Tensorflow 2.0
Char Rnn Tf147
7 years ago9Python
Implement character-level language models for text generation based-on LSTM, in Python/TensorFlow
Alternatives To Texar
Select To Compare


Alternative Project Comparisons
Readme



pypi Build Status codecov Documentation Status License

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides a library of easy-to-use ML modules and functionalities for composing whatever models and algorithms. The tool is designed for both researchers and practitioners for fast prototyping and experimentation.

Texar was originally developed and is actively contributed by Petuum and CMU in collaboration with other institutes. A mirror of this repository is maintained by Petuum Open Source.

Key Features

  • Two Versions, (Mostly) Same Interfaces. Texar-TensorFlow (this repo) and Texar-PyTorch have mostly the same interfaces. Both further combine the best design of TF and PyTorch:
    • Interfaces and variable sharing in PyTorch convention
    • Excellent factorization and rich functionalities in TF convention.
  • Rich Pre-trained Models, Rich Usage with Uniform Interfaces. BERT, GPT2, XLNet, etc, for encoding, classification, generation, and composing complex models with other Texar components!
  • Fully Customizable at multiple abstraction level -- both novice-friendly and expert-friendly.
    • Free to plug in whatever external modules, since Texar is fully compatible with the native TF/PyTorch APIs.
  • Versatile to support broad tasks, models, algorithms, data processing, evaluation, etc.
    • encoder(s) to decoder(s), sequential- and self-attentions, memory, hierarchical models, classifiers...
    • maximum likelihood learning, reinforcement learning, adversarial learning, probabilistic modeling, ...
  • Modularized for maximal re-use and clean APIs, based on principled decomposition of Learning-Inference-Model Architecture.
  • Distributed model training with multiple GPUs.
  • Clean, detailed documentation and rich examples.


Library API Example

Builds an encoder-decoder model, with maximum likelihood learning:

import texar.tf as tx

# Data 
data = tx.data.PairedTextData(hparams=hparams_data) # a dict of hyperparameters 
iterator = tx.data.DataIterator(data)
batch = iterator.get_next()                         # get a data mini-batch

# Model architecture
embedder = tx.modules.WordEmbedder(data.target_vocab.size, hparams=hparams_emb)
encoder = tx.modules.TransformerEncoder(hparams=hparams_enc)
outputs_enc = encoder(inputs=embedder(batch['source_text_ids']),  # call as a function
                      sequence_length=batch['source_length'])
                      
decoder = tx.modules.TransformerDecoder(
    output_layer=tf.transpose(embedder.embedding) # tie input embedding w/ output layer
    hparams=hparams_decoder)
outputs, _, _ = decoder(memory=output_enc, 
                        memory_sequence_length=batch['source_length'],
                        inputs=embedder(batch['target_text_ids']),
                        sequence_length=batch['target_length']-1,
                        decoding_strategy='greedy_train')    # teacher-forcing decoding
                        
# Loss for maximum likelihood learning
loss = tx.losses.sequence_sparse_softmax_cross_entropy(
    labels=batch['target_text_ids'][:, 1:],
    logits=outputs.logits,
    sequence_length=batch['target_length']-1)  # automatic sequence masks

# Beam search decoding
outputs_bs, _, _ = tx.modules.beam_search_decode(
    decoder,
    embedding=embedder,
    start_tokens=[data.target_vocab.bos_token_id]*num_samples,
    end_token=data.target_vocab.eos_token_id)

The same model, but with adversarial learning:

helper = tx.modules.GumbelSoftmaxTraingHelper( # Gumbel-softmax decoding
    start_tokens=[BOS]*batch_size, end_token=EOS, embedding=embedder)
outputs, _ = decoder(helper=helper)            # automatic re-use of the decoder variables

discriminator = tx.modules.BertClassifier(hparams=hparams_bert)        # pre-trained model

G_loss, D_loss = tx.losses.binary_adversarial_losses(
    real_data=data['target_text_ids'][:, 1:],
    fake_data=outputs.sample_id,
    discriminator_fn=discriminator)

The same model, but with RL policy gradient learning:

agent = tx.agents.SeqPGAgent(samples=outputs.sample_id,
                             logits=outputs.logits,
                             sequence_length=batch['target_length']-1,
                             hparams=config_model.agent)

Many more examples are available here

Installation

(Note: Texar>0.2.3 requires Python 3.6 or 3.7. To use with older Python versions, please use Texar<=0.2.3)

Texar requires:

After tensorflow and tensorflow_probability are installed, install Texar from PyPI:

pip install texar

To use cutting-edge features or develop locally, install from source:

git clone https://github.com/asyml/texar.git
cd texar
pip install .

Getting Started

Reference

If you use Texar, please cite the tech report with the following BibTex entry:

Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Wanrong Zhu, Devendra Sachan and Eric Xing
ACL 2019

@inproceedings{hu2019texar,
  title={Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation},
  author={Hu, Zhiting and Shi, Haoran and Tan, Bowen and Wang, Wentao and Yang, Zichao and Zhao, Tiancheng and He, Junxian and Qin, Lianhui and Wang, Di and others},
  booktitle={ACL 2019, System Demonstrations},
  year={2019}
}

License

Apache License 2.0

Companies and Universities Supporting Texar

                  

Popular Tensorflow Projects
Popular Text Generation Projects
Popular Machine Learning Categories
Related Searches

Get A Weekly Email With Trending Projects For These Categories
No Spam. Unsubscribe easily at any time.
Python
Machine Learning
Deep Learning
Pytorch
Tensorflow
Natural Language Processing
Machine Translation
Text Generation
Data Processing