Awesome Open Source
Search results for python corpus
1,068 search results found
Nltk
⭐
12,699
NLTK Source
Asrt_speechrecognition
⭐
7,253
A Deep-Learning-Based Chinese Speech Recognition System
Bert Pytorch
⭐
5,605
Google AI 2018 BERT pytorch implementation
Tensorflow Wavenet
⭐
5,362
A TensorFlow implementation of DeepMind's WaveNet paper
Pycorrector
⭐
4,928
pycorrector is a toolkit for text error correction. It applies Kenlm, T5, MacBERT, ChatGLM3, LLaMA, and other models to error-correction scenarios.
Speech To Text Wavenet
⭐
3,586
Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow
Chinese_chatbot_corpus
⭐
3,550
Public Chinese chatbot corpus
Clue
⭐
3,345
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Markovify
⭐
3,168
A simple, extensible Markov chain generator.
Deepqa
⭐
2,878
My tensorflow implementation of "A neural conversational model", a Deep learning based chatbot
Uer Py
⭐
2,802
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
Cluedatasetsearch
⭐
2,778
Search all Chinese NLP datasets, plus commonly used English NLP datasets
Trafilatura
⭐
2,447
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Weibo_terminater
⭐
2,265
The final Weibo crawler: scrape anything from Weibo (comments, Weibo posts, followers, anything). The Terminator
Awesome Chatbot
⭐
1,977
Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:
Gpt2 Ml
⭐
1,674
GPT2 for Multiple Languages, including pretrained models. Multilingual GPT2 support and a 1.5-billion-parameter Chinese pretrained model
Tensorflow 1.4 Billion Password Analysis
⭐
1,657
Deep Learning model to analyze a large corpus of clear text passwords.
Yake
⭐
1,522
Single-document unsupervised keyword extraction
Dialog_corpus
⭐
1,487
Datasets for training Chinese and English chatbot (dialogue) systems
Rasa_nlu_chi
⭐
1,466
Turn Chinese natural language into structured data (Chinese natural language understanding)
Entity Recognition Datasets
⭐
1,386
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Rc Data
⭐
1,221
Question answering dataset featured in "Teaching Machines to Read and Comprehend"
Chatterbot Corpus
⭐
1,219
A multilingual dialog corpus
Glove Python
⭐
1,171
Toy Python implementation of http://www-nlp.stanford.edu/projects/glove/
Cdial Gpt
⭐
944
A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models
Seq2seq Chatbot
⭐
826
Chatbot in 200 lines of code using TensorLayer
Memn2n Tensorflow
⭐
820
"End-To-End Memory Networks" in Tensorflow
Cltk
⭐
810
The Classical Language Toolkit
Bookcorpus
⭐
698
Crawl BookCorpus
Wordless
⭐
649
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Ngram2vec
⭐
638
Four word embedding models implemented in Python, supporting arbitrary context features
S2orc
⭐
634
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447
Ekphrasis
⭐
583
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, with 330M English tweets).
Cpu_rec
⭐
578
Recognize CPU instructions in an arbitrary binary file
Magpie
⭐
574
Deep neural network framework for multi-label text classification
Bytenet
⭐
570
A TensorFlow implementation of French-to-English machine translation using DeepMind's ByteNet.
Text_renderer
⭐
543
Bertweet
⭐
542
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
Exbert
⭐
541
A Visual Analysis Tool to Explore Learned Representations in Transformers Models
Multiturnresponseselection
⭐
534
This repo contains our ACL 2017 paper data and source code
Ner Lstm
⭐
528
Named Entity Recognition using multilayered bidirectional LSTM
Cblue
⭐
515
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Efaqa Corpus Zh
⭐
505
❤️ Emotional First Aid Dataset: psychological counseling Q&A and chatbot corpus
Korpora
⭐
500
Korean corpus repository
Awesome Korean Nlp
⭐
495
A curated list of resources for NLP (Natural Language Processing) for Korean
Gensim Data
⭐
492
Data repository for pretrained NLP models and NLP corpora.
Ba Dls Deepspeech
⭐
457
Markov
⭐
441
Markov chain text generator, as used for KingJamesProgramming
Tabert
⭐
436
This repository contains source code for the TaBERT model, a pre-trained language model for learning joint representations of natural language utterances and (semi-)structured tables for semantic parsing. TaBERT is pre-trained on a massive corpus of 26M Web tables and their associated natural language context, and can be used as a drop-in replacement for a semantic parser's original encoder to compute representations for utterances and table schemas (columns).
Chinesewordsegmentation
⭐
427
Chinese word segmentation algorithm that requires no corpus
Undreamt
⭐
421
Unsupervised Neural Machine Translation
Fact Extractor
⭐
413
Fact Extraction from Wikipedia Text
Paws
⭐
403
This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, and word order information for the problem of paraphrase identification.
Corpus
⭐
396
Corpora related to natural language processing and knowledge graphs, organized by task. PRs welcome.
Kdconv
⭐
378
KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation
Chinese Nlp Corpus
⭐
378
Collections of Chinese NLP corpora
Afl Utils
⭐
377
Utilities for automated crash sample processing/analysis, easy afl-fuzz job management and corpus optimization
Atap
⭐
367
Code for Applied Text Analysis with Python
Turkish Bert
⭐
364
Turkish BERT/DistilBERT, ELECTRA and ConvBERT models
Chinesenlpcorpus
⭐
362
A collection of Chinese NLP corpora, including basic Chinese syntactic word sets, semantic word sets, domain-specific synchronic and diachronic (historical) corpora, evaluation corpora, and more.
Pykospacing
⭐
348
Automatic Korean word spacing with Python
Comet
⭐
346
A Neural Framework for MT Evaluation
Deepcut
⭐
319
A Thai word tokenization library using Deep Neural Network
Embedding
⭐
309
Korean embeddings (Sentence Embeddings Using Korean Corpora)
Pytorch Chatbot
⭐
300
Pytorch seq2seq chatbot
Easy_seq2seq
⭐
294
[unmaintained] go to https://github.com/suriyadeepan/practical_seq2seq
Pycantonese
⭐
290
Cantonese Linguistics and NLP
Kws
⭐
289
An End-to-End Architecture for Keyword Spotting and Voice Activity Detection
Github Typo Corpus
⭐
289
GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors
Deepspeech German
⭐
284
Automatic Speech Recognition (ASR) - German
Rel
⭐
279
REL: Radboud Entity Linker
Nlg Yongzhuo
⭐
273
Toolkit for Chinese text generation (NLG), focused on text summarization, with corpus data. Extractive text summarization methods: Lead3, keyword, TextRank, TextTeaser, word significance, LDA, LSI, NMF (graph, feature, topic model, summarization tool or toolkit).
Voiceprintrecognition Tensorflow
⭐
273
Voiceprint (speaker) recognition implemented in TensorFlow
Python Wordsegment
⭐
268
English word segmentation, written in pure-Python, and based on a trillion-word corpus.
Multi Criteria Cws
⭐
260
Simple Solution for Multi-Criteria Chinese Word Segmentation
Nsmc
⭐
259
Naver sentiment movie corpus
Corus
⭐
254
Links to Russian corpora + Python functions for loading and parsing
Abcnn
⭐
252
Implementation of ABCNN(Attention-Based Convolutional Neural Network) on Tensorflow
Elastik Nearest Neighbors
⭐
242
Go to: https://github.com/alexklibisz/elastiknn
Movietaster Open
⭐
241
A practical movie recommendation project based on Item2vec.
Wikiplots
⭐
234
A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor.
Mishkal
⭐
232
Mishkal is Arabic text vocalization software
Lotclass
⭐
231
[EMNLP 2020] Text Classification Using Label Names Only: A Language Model Self-Training Approach
Tacotron
⭐
226
Implementation of Google's Tacotron in TensorFlow
Convbert
⭐
222
Naturalcc
⭐
220
NaturalCC: An Open-Source Toolkit for Code Intelligence
Dynamic Nmf
⭐
219
Dynamic Topic Modeling via Non-negative Matrix Factorization
Causalityeventextraction
⭐
207
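Causality event extraction demo project, including causal patterns and experiments on a large-scale corpus. An experimental causal event graph project based on a causality knowledge base; it lists several patterns of explicit causal expression and, based on these patterns and large-scale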
Tools_for_corpus_of_people_daily
⭐
200
Tools for processing the People's Daily corpus
Open Syllabus Project
⭐
199
What can be learned from 1M+ college course syllabi? (OLD)
Blue_benchmark
⭐
195
The BLUE benchmark consists of five different biomedical text-mining tasks with ten corpora.
Somiao Pinyin
⭐
194
Somiao Pinyin: Train your own Chinese input method with a Seq2seq model
Context2vec
⭐
192
Syntaxnet
⭐
190
Reference code for SyntaxNet
Ptt Chat Generator
⭐
190
PTT (批踢踢) comment generator
Unify Emotion Datasets
⭐
189
A Survey and Experiments on Annotated Corpora for Emotion Classification in Text
Corpkit
⭐
189
A toolkit for corpus linguistics
Snli Entailment
⭐
181
Attention model for entailment on the SNLI corpus, implemented in TensorFlow and Keras
Bi Lstm Crf
⭐
180
A PyTorch implementation of the BI-LSTM-CRF model.
Corpuscrawler
⭐
176
Crawler for linguistic corpora
1-100 of 1,068 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.