Awesome Open Source
Search results for dataset natural language processing
205 search results found
Datasets
⭐
18,319
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Doccano
⭐
8,927
Open source annotation tool for machine learning practitioners.
Nlp_chinese_corpus
⭐
8,344
Large Scale Chinese Corpus for NLP
Awesome Pretrained Chinese Nlp Models
⭐
3,738
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
Text
⭐
3,411
Models, data loaders and abstractions for language processing, powered by PyTorch
Cluedatasetsearch
⭐
2,778
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Textattack
⭐
2,597
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
Pytorch Nlp
⭐
2,180
Basic Utilities for PyTorch Natural Language Processing (NLP)
Codesearchnet
⭐
2,054
Datasets, tools, and benchmarks for representation learning of code.
Medical_nlp
⭐
1,969
Medical NLP: competitions, datasets, large models, papers, and toolkits
Awesome_chinese_medical_nlp
⭐
1,847
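A compilation of public Chinese medical NLP resources: terminology sets, corpora, word vectors, pretrained models, knowledge graphs, named entity recognition, QA, information extraction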
Chineseglue
⭐
1,765
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard
Awesome Domain Llm
⭐
1,502
A collection and organization of open-source models, datasets, and evaluation benchmarks for vertical domains.
Transfer Learning Conv Ai
⭐
1,499
🦄 State-of-the-Art Conversational AI with Transfer Learning
Deepmoji
⭐
1,462
State-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc.
Entity Recognition Datasets
⭐
1,386
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Wikisql
⭐
1,370
A large annotated semantic parsing corpus for developing natural language interfaces.
Beir
⭐
1,332
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Chinesenlp
⭐
1,329
Datasets and SOTA results for every field of Chinese NLP
Dataprofiler
⭐
1,310
What's in your data? Extract schema, statistics and entities from datasets
Projects
⭐
1,207
🪐 End-to-end NLP workflows from prototype to production
Data Juicer
⭐
994
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Insuranceqa Corpus Zh
⭐
989
🚁 Insurance industry corpus, chatbot
Textbox
⭐
966
TextBox 2.0 is a text generation library with pre-trained language models
Chatgpt Comparison Detection
⭐
921
Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥
Torchmoji
⭐
882
😇 A PyTorch implementation of the DeepMoji model: state-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc.
K Bert
⭐
793
Source code of K-BERT (AAAI2020)
Chatito
⭐
755
🎯🗯 Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
Prompt4reasoningpapers
⭐
717
Repository for the ACL2023 paper "Reasoning with Language Model Prompting: A Survey".
Hate Speech And Offensive Language
⭐
698
Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017
Thoughtsource
⭐
680
A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed @ Samwald research group: https://samwald.info/
Long Range Arena
⭐
635
Long Range Arena for Benchmarking Efficient Transformers
Sequence Labeling Bilstm Crf
⭐
605
The classical BiLSTM-CRF model implemented in TensorFlow, for sequence labeling tasks. In Vex version, everything is configurable.
Datasets Server
⭐
578
Lightweight web API for visualizing and exploring all types of datasets - computer vision, speech, text, and tabular - stored on the Hugging Face Hub
Annotated Semantic Relationships Datasets
⭐
565
A collection of public and free annotated datasets of relationships between entities/nominals (Portuguese and English)
Neuspell
⭐
541
NeuSpell: A Neural Spelling Correction Toolkit
Cluecorpus2020
⭐
517
Large-scale Pre-training Corpus for Chinese (100G)
Efaqa Corpus Zh
⭐
505
❤️ Emotional First Aid Dataset: a psychological counseling Q&A and chatbot corpus
Complete Life Cycle Of A Data Science Project
⭐
499
Complete-Life-Cycle-of-a-Data-Science-Project
Indonlu
⭐
494
The first-ever vast natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)
Convokit
⭐
483
ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
Text2sql Data
⭐
478
A collection of datasets that pair questions with SQL queries.
Attention Networks For Classification
⭐
477
Hierarchical Attention Networks for Document Classification in PyTorch
Rnnlg
⭐
476
RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains. It is released by Tsung-Hsien (Shawn) Wen from Cambridge Dialogue Systems Group under Apache License 2.0.
Dstc8 Schema Guided Dialogue
⭐
464
The Schema-Guided Dialogue Dataset
Oie Resources
⭐
435
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Subreddit Analyzer
⭐
422
A comprehensive Data and Text Mining workflow for submissions and comments from any given public subreddit.
Matterport3dsimulator
⭐
414
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
Openai Clip
⭐
404
Simple implementation of OpenAI CLIP model in PyTorch.
Paperrobot
⭐
384
Code for PaperRobot: Incremental Draft Generation of Scientific Ideas
Chinese Nlp Corpus
⭐
378
Collections of Chinese NLP corpus
Awesomefakenews
⭐
317
This repository contains recent research on fake news.
Transformer Pointer Generator
⭐
314
An Abstractive Summarization Implementation with Transformer and Pointer-generator
Chakin
⭐
313
Simple downloader for pre-trained word vectors
Cmrc2018
⭐
313
A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)
Automated Fact Checking Resources
⭐
303
Links to conference/journal publications in automated fact-checking (resources for the TACL22/EMNLP23 paper).
Data Science Hacks
⭐
300
Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and so on.
Peerread
⭐
297
Data and code for Kang et al., NAACL 2018's paper titled "A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications"
Tner
⭐
296
Language model fine-tuning on NER with an easy interface and cross-domain evaluation. "T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition, EACL 2021"
Nlp_datasets
⭐
285
My NLP datasets for Russian language
Rc Cnn Dailymail
⭐
282
CNN/Daily Mail Reading Comprehension Task
Nsc
⭐
280
Neural Sentiment Classification
Medquad
⭐
275
Medical Question Answering Dataset of 47,457 QA pairs created from 12 NIH websites
Squirrel Core
⭐
271
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰
Primekg
⭐
269
Precision Medicine Knowledge Graph (PrimeKG)
Nlp_bahasa_resources
⭐
260
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Multi Criteria Cws
⭐
260
Simple Solution for Multi-Criteria Chinese Word Segmentation
Dialoglue
⭐
256
DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
Corus
⭐
254
Links to Russian corpora + Python functions for loading and parsing
Ua Gec
⭐
246
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Chazutsu
⭐
237
The tool to make NLP datasets ready to use
Persian Swear Words
⭐
235
Persian Swear Dataset: a dataset of inappropriate and offensive Persian words that you can use in production to filter unwanted content.
Tensorflow_qrnn
⭐
228
QRNN implementation for TensorFlow
Nlp_profiler
⭐
227
A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.
Triviaqa
⭐
227
Code for the TriviaQA reading comprehension dataset
Torchnlp
⭐
221
Easy to use NLP library built on PyTorch and TorchText
Sota Extractor
⭐
221
The SOTA extractor pipeline
Aidl_kb
⭐
218
A Knowledge Base for the FB Group Artificial Intelligence and Deep Learning (AIDL)
Awesome Tensorlayer
⭐
212
A curated list of dedicated resources and applications
Neuralqa
⭐
207
NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT
Wrench
⭐
203
[NeurIPS 2021] WRENCH: Weak supeRvision bENCHmark
Dataset
⭐
194
Darija <-> English dataset
Awesome Hungarian Nlp
⭐
192
A curated list of NLP resources for Hungarian
Unify Emotion Datasets
⭐
189
A Survey and Experiments on Annotated Corpora for Emotion Classification in Text
Goodreads
⭐
186
Code samples for the Goodreads datasets
Bert Attributeextraction
⭐
185
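Using BERT for attribute extraction in knowledge graphs, with fine-tuning and feature extraction. Applies BERT-based fine-tuning and feature extraction methods to extract attributes from Baidu Baike person entries for a knowledge graph.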
Text Summarization Repo
⭐
184
A repository that organizes the major research topics, must-read papers, and available models and data in the field of text summarization, along with recommended resources.
Fakenewscorpus
⭐
184
A dataset of millions of news articles scraped from a curated list of data sources.
Awesome Llm Eval
⭐
183
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs and models, mainly for evaluation of LLMs.
Financial News Dataset
⭐
182
Reuters and Bloomberg
Robbert
⭐
180
A Dutch RoBERTa-based language model
Lineflow
⭐
178
⚡A Lightweight NLP Data Loader for All Deep Learning Frameworks in Python
Siamese Lstm
⭐
172
Siamese LSTM for evaluating semantic similarity between sentences of the Quora Question Pairs Dataset.
Nlp Public Dataset
⭐
172
Chinese and English NER datasets, English-Chinese machine translation datasets, and Chinese word segmentation datasets.
Awesome Nlp Polish
⭐
169
A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models, tools, datasets.
Jury
⭐
167
Comprehensive NLP Evaluation System
Pubmed Rct
⭐
166
PubMed 200k RCT dataset: a large dataset for sequential sentence classification.
Trustllm
⭐
164
TrustLLM: Trustworthiness in Large Language Models
Scanrefer
⭐
163
[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Qb
⭐
160
QANTA Quiz Bowl AI
Related Searches
Python Dataset (15,297)
Python Natural Language Processing (7,915)
Jupyter Notebook Dataset (6,824)
Jupyter Notebook Natural Language Processing (4,405)
Machine Learning Natural Language Processing (3,939)
Deep Learning Natural Language Processing (2,414)
Machine Learning Dataset (2,395)
Deep Learning Dataset (2,364)
Dataset Pytorch (1,847)
Dataset Tensorflow (1,583)
1-100 of 205 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.