Awesome Open Source

Search results for python corpus

1,068 search results found

Nltk ⭐ 12,699

Asrt_speechrecognition ⭐ 7,253

A Deep-Learning-Based Chinese Speech Recognition System

Bert Pytorch ⭐ 5,605

Google AI 2018 BERT pytorch implementation

Tensorflow Wavenet ⭐ 5,362

A TensorFlow implementation of DeepMind's WaveNet paper

Pycorrector ⭐ 4,928

pycorrector is a toolkit for text error correction. It implements Kenlm, T5, MacBERT, ChatGLM3, LLaMA and other models applied to error-correction scenarios.

Speech To Text Wavenet ⭐ 3,586

Speech-to-Text-WaveNet: End-to-end sentence-level English speech recognition based on DeepMind's WaveNet and TensorFlow

Chinese_chatbot_corpus ⭐ 3,550

Public Chinese chat corpus

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

Markovify ⭐ 3,168

A simple, extensible Markov chain generator.

Deepqa ⭐ 2,878

My TensorFlow implementation of "A Neural Conversational Model", a deep-learning-based chatbot

Uer Py ⭐ 2,802

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo

Cluedatasetsearch ⭐ 2,778

Search all Chinese NLP datasets, with commonly used English NLP datasets included

Trafilatura ⭐ 2,447

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Weibo_terminater ⭐ 2,265

Final Weibo crawler: scrape anything from Weibo (comments, Weibo contents, followers, anything). The Terminator

Awesome Chatbot ⭐ 1,977

Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:

Gpt2 Ml ⭐ 1,674

GPT2 for Multiple Languages, including pretrained models. Multilingual GPT2 support, with a 1.5-billion-parameter Chinese pretrained model

Tensorflow 1.4 Billion Password Analysis ⭐ 1,657

Deep Learning model to analyze a large corpus of clear text passwords.

Single-document unsupervised keyword extraction

Dialog_corpus ⭐ 1,487

Corpora for training Chinese and English dialogue systems (Datasets for Training Chatbot System)

Rasa_nlu_chi ⭐ 1,466

Turn Chinese natural language into structured data (Chinese natural language understanding)

Entity Recognition Datasets ⭐ 1,386

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

Rc Data ⭐ 1,221

Question answering dataset featured in "Teaching Machines to Read and Comprehend"

Chatterbot Corpus ⭐ 1,219

A multilingual dialog corpus

Glove Python ⭐ 1,171

Toy Python implementation of http://www-nlp.stanford.edu/projects/glove/

Cdial Gpt ⭐ 944

A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models

Seq2seq Chatbot ⭐ 826

Chatbot in 200 lines of code using TensorLayer

Memn2n Tensorflow ⭐ 820

"End-To-End Memory Networks" in Tensorflow

The Classical Language Toolkit

Bookcorpus ⭐ 698

Crawl BookCorpus

Wordless ⭐ 649

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

Ngram2vec ⭐ 638

Four word embedding models implemented in Python. Supporting arbitrary context features

S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447

Ekphrasis ⭐ 583

Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia, Twitter - 330 million English tweets).

Cpu_rec ⭐ 578

Recognize CPU instructions in an arbitrary binary file

Deep neural network framework for multi-label text classification

Bytenet ⭐ 570

A TensorFlow implementation of French-to-English machine translation using DeepMind's ByteNet.

Text_renderer ⭐ 543

Bertweet ⭐ 542

BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)

A Visual Analysis Tool to Explore Learned Representations in Transformers Models

Multiturnresponseselection ⭐ 534

This repo contains our ACL 2017 paper data and source code

Ner Lstm ⭐ 528

Named Entity Recognition using multilayered bidirectional LSTM

CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark (Chinese medical information processing benchmark)

Efaqa Corpus Zh ⭐ 505

❤️ Emotional First Aid Dataset: a psychological counseling Q&A and chatbot corpus

Korpora ⭐ 500

Korean corpus repository

Awesome Korean Nlp ⭐ 495

A curated list of resources for NLP (Natural Language Processing) for Korean

Gensim Data ⭐ 492

Data repository for pretrained NLP models and NLP corpora.

Ba Dls Deepspeech ⭐ 457

Markov chain text generator, as used for KingJamesProgramming

This repository contains source code for the TaBERT model, a pre-trained language model for learning joint representations of natural language utterances and (semi-)structured tables for semantic parsing. TaBERT is pre-trained on a massive corpus of 26M Web tables and their associated natural language context, and could be used as a drop-in replacement of a semantic parser's original encoder to compute representations for utterances and table schemas (columns).

Chinesewordsegmentation ⭐ 427

Chinese word segmentation algorithm without corpus

Undreamt ⭐ 421

Unsupervised Neural Machine Translation

Fact Extractor ⭐ 413

Fact Extraction from Wikipedia Text

This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, and word order information for the problem of paraphrase identification.

Corpora related to natural language processing and knowledge graphs, organized by task. PRs welcome.

KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation

Chinese Nlp Corpus ⭐ 378

Collections of Chinese NLP corpus

Afl Utils ⭐ 377

Utilities for automated crash sample processing/analysis, easy afl-fuzz job management and corpus optimization

Code for Applied Text Analysis with Python

Turkish Bert ⭐ 364

Turkish BERT/DistilBERT, ELECTRA and ConvBERT models

Chinesenlpcorpus ⭐ 362

A collection of Chinese NLP corpora, including basic Chinese syntactic and semantic word sets, domain-specific synchronic corpora, diachronic (historical) corpora, and evaluation corpora.

Pykospacing ⭐ 348

Automatic Korean word spacing with Python

A Neural Framework for MT Evaluation

Deepcut ⭐ 319

A Thai word tokenization library using Deep Neural Network

Embedding ⭐ 309

Korean embeddings (Sentence Embeddings Using Korean Corpora)

Pytorch Chatbot ⭐ 300

Pytorch seq2seq chatbot

Easy_seq2seq ⭐ 294

[unmaintained] go to https://github.com/suriyadeepan/practical_seq2seq

Pycantonese ⭐ 290

Cantonese Linguistics and NLP

An End-to-End Architecture for Keyword Spotting and Voice Activity Detection

Github Typo Corpus ⭐ 289

GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors

Deepspeech German ⭐ 284

Automatic Speech Recognition (ASR) - German

REL: Radboud Entity Linker

Nlg Yongzhuo ⭐ 273

Text summarization toolkit for Chinese text generation (NLG), with corpus data. Extractive text summarization with Lead3, keyword, textrank, text teaser, word significance, LDA, LSI, and NMF (graph, feature, topic model; summarization tool or toolkit).

Voiceprintrecognition Tensorflow ⭐ 273

Voiceprint recognition implemented with TensorFlow

Python Wordsegment ⭐ 268

English word segmentation, written in pure-Python, and based on a trillion-word corpus.

Multi Criteria Cws ⭐ 260

Simple Solution for Multi-Criteria Chinese Word Segmentation

Naver sentiment movie corpus

Links to Russian corpora + Python functions for loading and parsing

Implementation of ABCNN (Attention-Based Convolutional Neural Network) in TensorFlow

Elastik Nearest Neighbors ⭐ 242

Go to: https://github.com/alexklibisz/elastiknn

Movietaster Open ⭐ 241

A practical movie recommendation project based on Item2vec.

Wikiplots ⭐ 234

A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor.

Mishkal ⭐ 232

Mishkal is Arabic text vocalization software

Lotclass ⭐ 231

[EMNLP 2020] Text Classification Using Label Names Only: A Language Model Self-Training Approach

Tacotron ⭐ 226

Implementation of Google's Tacotron in TensorFlow

Convbert ⭐ 222

Naturalcc ⭐ 220

NaturalCC: An Open-Source Toolkit for Code Intelligence

Dynamic Nmf ⭐ 219

Dynamic Topic Modeling via Non-negative Matrix Factorization

Causalityeventextraction ⭐ 207

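Causality event extraction demo project, including causal patterns and experiments on a large-scale corpus. An experimental causal event graph project based on a causality knowledge base; it lists several patterns for the explicit expression of causality and, based on these patterns and large-scale…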

Tools_for_corpus_of_people_daily ⭐ 200

Tools for Corpus of People's Daily

Open Syllabus Project ⭐ 199

What can be learned from 1M+ college course syllabi? (OLD)

Blue_benchmark ⭐ 195

BLUE benchmark consists of five different biomedicine text-mining tasks with ten corpora.

Somiao Pinyin ⭐ 194

Somiao Pinyin: Train your own Chinese Input Method with Seq2seq Model 搜喵拼音输入法

Context2vec ⭐ 192

Syntaxnet ⭐ 190

reference code for syntaxnet

Ptt Chat Generator ⭐ 190

PTT comment generator

Unify Emotion Datasets ⭐ 189

A Survey and Experiments on Annotated Corpora for Emotion Classification in Text

Corpkit ⭐ 189

A toolkit for corpus linguistics

Snli Entailment ⭐ 181

Attention model for entailment on the SNLI corpus, implemented in TensorFlow and Keras

Bi Lstm Crf ⭐ 180

A PyTorch implementation of the BI-LSTM-CRF model.

Corpuscrawler ⭐ 176

Crawler for linguistic corpora

Copyright 2018-2024 Awesome Open Source. All rights reserved.