Awesome Open Source
The Top 104 Language Model Open Source Projects
Transformers
⭐
44,616
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
Nlp_chinese_corpus
⭐
6,060
Large Scale Chinese Corpus for NLP
Tokenizers
⭐
4,448
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Gpt Neo
⭐
4,355
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
Bert Pytorch
⭐
4,172
PyTorch implementation of Google AI's 2018 BERT
Lingvo
⭐
2,216
Lingvo
Keras Bert
⭐
2,132
Implementation of BERT that can load official pre-trained models for feature extraction and prediction
Lazynlp
⭐
1,920
Library to scrape and clean web pages to create massive datasets.
Awesome Sentence Embedding
⭐
1,827
A curated list of pretrained sentence and word embedding models
Awesome Speech Recognition Speech Synthesis Papers
⭐
1,816
Speech synthesis, voice conversion, self-supervised learning, music generation, Automatic Speech Recognition, Speaker Verification, Speech Synthesis, Language Modeling
Awd Lstm Lm
⭐
1,810
LSTM and QRNN Language Model Toolkit for PyTorch
Clue
⭐
1,807
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Haystack
⭐
1,654
🔍 End-to-end Python framework for building natural language search interfaces to data. Leverages Transformers and the State-of-the-Art of NLP. Supports DPR, Elasticsearch, Hugging Face’s Hub, and much more!
Openseq2seq
⭐
1,388
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Pytorch Openai Transformer Lm
⭐
1,275
🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI
Pytorch Cpp
⭐
1,038
C++ Implementation of PyTorch Tutorials for Everyone
Nlp Library
⭐
1,029
Curated collection of papers for the NLP practitioner 📖👩‍🔬
Bert_language_understanding
⭐
941
Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN
Spacy Transformers
⭐
933
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
Chinese Electra
⭐
862
Pre-trained Chinese ELECTRA
Spago
⭐
859
Self-contained Machine Learning and Natural Language Processing library in Go
Lm Lstm Crf
⭐
783
Empower Sequence Labeling with Task-Aware Language Model
Pykaldi
⭐
770
A Python wrapper for Kaldi
Lightnlp
⭐
742
A deep learning framework for natural language processing based on PyTorch and torchtext.
Keras Language Modeling
⭐
666
📖 Some language modeling tools for Keras
Dl Nlp Readings
⭐
659
My Reading Lists of Deep Learning and Natural Language Processing
Gpt Neox
⭐
638
An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hundreds of billions of parameters or larger.
Kobert
⭐
612
Korean BERT pre-trained cased (KoBERT)
Deberta
⭐
600
The implementation of DeBERTa
Awesome Bert Nlp
⭐
572
A curated list of NLP resources focused on BERT, attention mechanism, Transformer networks, and transfer learning.
Sentiment_analysis_fine_grain
⭐
551
Multi-label Classification with BERT; Fine-Grained Sentiment Analysis from AI Challenger
Albert_pytorch
⭐
551
A Lite BERT for Self-supervised Learning of Language Representations
Ctcdecoder
⭐
541
Connectionist Temporal Classification (CTC) decoding algorithms: best path, prefix search, beam search and token passing. Implemented in Python.
Nlp Paper
⭐
492
NLP Paper
Trankit
⭐
426
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Neural_sp
⭐
418
End-to-end ASR/LM implementation with PyTorch
Ctcwordbeamsearch
⭐
410
Connectionist Temporal Classification (CTC) decoder with dictionary and language model for TensorFlow.
Zamia Speech
⭐
388
Open tools and data for cloudless automatic speech recognition
Tf_chatbot_seq2seq_antilm
⭐
368
Seq2seq chatbot with attention and an anti-language model to suppress generic responses, with an option for further improvement via deep reinforcement learning.
Kogpt2
⭐
368
Korean GPT-2 pretrained cased (KoGPT2)
Azureml Bert
⭐
343
End-to-End recipes for pre-training and fine-tuning BERT using Azure Machine Learning Service
Xlnet Pytorch
⭐
304
An implementation of Google Brain's 2019 XLNet in PyTorch
Bertweet
⭐
294
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
Transfer Nlp
⭐
287
NLP library designed for reproducible experimentation management
Bluebert
⭐
287
BlueBERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III).
A Pytorch Tutorial To Sequence Labeling
⭐
266
Empower Sequence Labeling with Task-Aware Neural Language Model | a PyTorch Tutorial to Sequence Labeling
Zeroth
⭐
255
Kaldi-based Korean ASR (Korean speech recognition) open-source project
Mead Baseline
⭐
238
Deep-Learning Model Exploration and Development for NLP
Relational Rnn Pytorch
⭐
237
An implementation of DeepMind's Relational Recurrent Neural Networks in PyTorch.
Attention Mechanisms
⭐
212
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Pytorch Nce
⭐
208
Noise Contrastive Estimation for softmax output, written in PyTorch
Xlnet_zh
⭐
207
Pre-Trained Chinese XLNet_Large
Protein Sequence Embedding Iclr2019
⭐
199
Source code for "Learning protein sequence embeddings using information from structure" - ICLR 2019
Gpt Scrolls
⭐
198
A collaborative collection of open-source safe GPT-3 prompts that work well
Automatic Speech Recognition
⭐
196
🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)
Nlp_learning
⭐
194
Learn natural language processing (NLP) with Python: language models, HMM, PCFG, Word2vec, cloze-style reading comprehension tasks, naive Bayes classifiers, TF-IDF, PCA, SVD
Char Rnn Chinese
⭐
192
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch. Based on the code of https://github.com/karpathy/char-rnn. Supports Chinese and other things.
Bert Sklearn
⭐
190
A sklearn wrapper for Google's BERT model
Bert As Language Model
⭐
189
BERT as a language model, forked from https://github.com/google-research/bert
Optimus
⭐
188
Optimus: the first large-scale pre-trained VAE language model
Macbert
⭐
182
Revisiting Pre-trained Models for Chinese Natural Language Processing (Findings of EMNLP)
Indic Bert
⭐
176
BERT-based Multilingual Model for Indian Languages
Lotclass
⭐
172
[EMNLP 2020] Text Classification Using Label Names Only: A Language Model Self-Training Approach
Xlnet Gen
⭐
164
XLNet for generating language.
Keras Xlnet
⭐
160
Implementation of XLNet that can load pretrained checkpoints
Electra_pytorch
⭐
157
Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)
F Lm
⭐
156
Language Modeling
Transformer Lm
⭐
155
Transformer language model (GPT-2) with sentencepiece tokenizer
Speecht
⭐
152
Open-source speech-to-text software written in TensorFlow
Backprop
⭐
151
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Tupe
⭐
150
Transformer with Untied Positional Encoding (TUPE). Code of paper "Rethinking Positional Encoding in Language Pre-training". Improve existing models like BERT.
Ld Net
⭐
148
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
Chars2vec
⭐
132
Character-based word embeddings model based on RNN for handling real world texts
Electra
⭐
132
Chinese pre-trained ELECTRA model: pretrain a Chinese model based on adversarial learning
Kogpt2 Finetuning
⭐
127
🔥 Korean GPT-2, KoGPT2 FineTuning cased. Trained on Korean lyrics data 🔥
Dynamic Memory Networks Plus Pytorch
⭐
123
Implementation of Dynamic Memory Networks Plus in PyTorch
Robbert
⭐
122
A Dutch RoBERTa-based language model
Lm Scorer
⭐
117
📃 Language-model-based sentence scoring library
Keras Gpt 2
⭐
114
Load GPT-2 checkpoint and generate texts
Lingo
⭐
113
package lingo provides the data structures and algorithms required for natural language processing
Getlang
⭐
110
Natural language detection package in pure Go
Easy Bert
⭐
109
A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)
Pytorch_gbw_lm
⭐
101
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset
Pyclue
⭐
93
Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark
Bio_embeddings
⭐
92
Get protein embeddings from protein sequences
Nezha_chinese_pytorch
⭐
92
NEZHA: Neural Contextualized Representation for Chinese Language Understanding
Bit Rnn
⭐
88
Quantize weights and activations in Recurrent Neural Networks.
Text Gan Tensorflow
⭐
87
TensorFlow GAN implementation using Gumbel Softmax
Tongrams
⭐
87
A C++ library providing fast language model queries in compressed space.
Greek Bert
⭐
86
A Greek edition of the BERT pre-trained language model
Ckip Transformers
⭐
85
CKIP Transformers
Full_stack_transformer
⭐
71
Pytorch library for end-to-end transformer models training, inference and serving
Cross Domain_ner
⭐
68
Cross-domain NER using cross-domain language modeling, code for ACL 2019 paper
Indonesian Language Models
⭐
66
Indonesian Language Models and their Usage
Gpt2
⭐
64
PyTorch Implementation of OpenAI GPT-2
Tner
⭐
62
Language model finetuning on NER with an easy interface, and cross-domain evaluation. We released NER models finetuned on various domains via the Hugging Face model hub.
Phonlp
⭐
61
PhoNLP: A BERT-based multi-task learning model for part-of-speech tagging, named entity recognition and dependency parsing (NAACL 2021)
Char_rnn_lm_zh
⭐
57
Language model in Chinese, implemented based on the official PyTorch documentation
Vietnamese Electra
⭐
56
ELECTRA pre-trained model using a Vietnamese corpus
Suggest
⭐
53
Top-k Approximate String Matching.
1-100 of 104 projects