Awesome Open Source
Search results for dataset natural language processing
205 search results found
Dialogsum
⭐
153
DialogSum: A Real-life Scenario Dialogue Summarization Dataset - Findings of ACL 2021
Awesome Ukrainian Nlp
⭐
146
Curated list of Ukrainian natural language processing (NLP) resources (corpora, pretrained models, libraries, etc.)
Danlp
⭐
141
DaNLP is a repository for Natural Language Processing resources for the Danish Language.
Emotion_dataset
⭐
140
😄 Dataset for Emotion Classification
Navec
⭐
140
Compact high quality word embeddings for Russian language
Summarus
⭐
140
Models for automatic abstractive summarization
Machine Learning Resources
⭐
137
A curated list of awesome machine learning frameworks, libraries, courses, books and many more.
Commongen
⭐
136
A Constrained Text Generation Challenge Towards Generative Commonsense Reasoning
Existing Medical Qa Datasets
⭐
135
Multimodal Question Answering in the Medical Domain: A summary of Existing Datasets and Systems
Twitter Sentiment Cnn
⭐
133
An implementation in TensorFlow of a convolutional neural network (CNN) to perform sentiment classification on tweets.
Swagaf
⭐
133
Repository for paper "SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference"
Pre Modern_chinese_corpus_dataset
⭐
132
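Pre-modern Chinese corpus dataset: natural language processing, corpus, ancient Chinese, Old Chinese, Classical Chinese, digital humanities, computational linguistics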
Thermostat
⭐
131
Collection of NLP model explanations and accompanying analysis tools
Chatgpt Retrievalqa
⭐
130
A dataset for training and evaluating question-answering retrieval models on ChatGPT responses, with the option of training and evaluating on real human responses.
Mongolian Nlp
⭐
126
Useful resources for Mongolian NLP
Chariot
⭐
121
Deliver the ready-to-train data to your NLP model.
Lecturebank
⭐
121
LectureBank Dataset
Neusum
⭐
118
Code for the ACL 2018 paper "Neural Document Summarization by Jointly Learning to Score and Select Sentences"
Open Korean Corpora
⭐
117
Open Korean NLP Dataset Curation for the Users All Around the Globe
Hierarchical Attention Network
⭐
117
Implementation of Hierarchical Attention Networks in PyTorch
Awesome Llm Human Preference Datasets
⭐
116
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
Mol Instructions
⭐
116
Mol-Instructions is a Large-Scale Biomolecules Instruction Dataset for Large Language Models.
Active Nlp
⭐
116
Bayesian Deep Active Learning for Natural Language Processing Tasks
Mtdata
⭐
115
A tool that locates, downloads, and extracts machine translation corpora
Bond
⭐
114
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
Fnc 1 Baseline
⭐
113
A baseline implementation for FNC-1
Detecting Scientific Claim
⭐
111
Extracting scientific claims from biomedical abstracts (powered by AllenNLP), demo:
Clicr
⭐
108
Machine reading comprehension on clinical case reports
Ask2transformers
⭐
107
A Framework for Textual Entailment based Zero Shot text classification
Bertqa Attention On Steroids
⭐
105
BertQA - Attention on Steroids
Falcon2.0
⭐
104
Falcon 2.0 is a joint entity and relation linking tool over Wikidata.
Prosody
⭐
104
Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
Recon
⭐
102
Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality of your data.
Cnn Text Classification
⭐
101
Text classification with Convolutional Neural Networks on Yelp, IMDB & sentence polarity dataset v1.0
Financialdatasets
⭐
100
SmoothNLP public financial text datasets, for NLP research only
Indonesian Nlp Resources
⭐
98
Data resources for Indonesian-language NLP
Ml Mkqa
⭐
94
We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. Please refer to our paper for details, MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering
Korean Hate Speech
⭐
93
Korean HateSpeech Dataset
Pytorch_gbw_lm
⭐
90
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset
Enso
⭐
89
Enso: An Open Source Library for Benchmarking Embeddings + Transfer Learning Methods
Awesome_multimodel_llm
⭐
89
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.
Ambigqa
⭐
86
An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous Open-domain Questions"
Sentiment
⭐
85
An example project using a feed-forward neural network for text sentiment classification trained with 25,000 movie reviews from the IMDB website.
Phrase At Scale
⭐
84
Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used with languages other than English.
Persianqa
⭐
84
Persian (Farsi) Question Answering Dataset (+ Models)
Pytreebank
⭐
83
😡😇 Stanford Sentiment Treebank loader in Python
Fastlorachat
⭐
83
Instruct-tune LLaMA on consumer hardware with shareGPT data
Dialogue Understanding
⭐
82
This repository contains PyTorch implementation for the baseline models from the paper Utterance-level Dialogue Understanding: An Empirical Study
Chabsa Dataset
⭐
82
chakki's Aspect-Based Sentiment Analysis dataset
Marathinlp
⭐
80
Marathi NLP - is a repository dedicated to development of tools and resources for Marathi language.
Kobert Ner
⭐
79
NER Task with KoBERT (with Naver NLP Challenge dataset)
Grailqa
⭐
78
Nlp Models
⭐
77
NLP research experiments, built on PyTorch within the AllenNLP framework.
Doccano Client
⭐
76
A simple client for doccano API.
Canrevan
⭐
75
A library for collecting large volumes of Naver news articles.
Text Segmentation
⭐
73
Implementation of the paper: Text Segmentation as a Supervised Learning Task
Writing Editing Network
⭐
72
Code for Paper Abstract Writing through Editing Mechanism
Mams For Absa
⭐
72
A Multi-Aspect Multi-Sentiment Dataset for aspect-based sentiment analysis.
Wiki Split
⭐
72
One million English sentences, each split into two sentences that together preserve the original meaning, extracted from Wikipedia edits.
News Headlines Dataset For Sarcasm Detection
⭐
68
High quality dataset for the task of Sarcasm Detection
Turkish Nlp Resources
⭐
67
🔡 List of Tools, Libraries, Models, Datasets and other resources for Turkish NLP.
Name2nat
⭐
66
name2nat: a Python package for nationality prediction from a name
Xfund
⭐
66
XFUND: A Multilingual Form Understanding Benchmark
Farstail
⭐
66
FarsTail: a Persian natural language inference dataset
Danes
⭐
65
DANeS is an open-source E-newspaper dataset by collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)
Glot500
⭐
65
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages (ACL'23)
Crd3
⭐
65
The repo containing the Critical Role Dungeons and Dragons Dataset.
Anuvaad
⭐
65
State of the art open-source translation for Indic languages.
Indonlg
⭐
64
The first-ever vast natural language generation benchmark for Indonesian, Sundanese, and Javanese. We provide multiple downstream tasks, pre-trained IndoGPT and IndoBART models, and a starter code! (EMNLP 2021)
Gener
⭐
64
Simple Questions Generate Named Entity Recognition Datasets (EMNLP 2022)
N3 Collection
⭐
63
N3 - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format
Query Wellformedness
⭐
63
25,100 queries from the Paralex corpus (Fader et al., 2013) annotated with human ratings of whether they are well-formed natural language questions.
Vtuber Livechat Dataset
⭐
63
📊 VTuber 1B: Billion-scale Live Chat and Moderation Event Dataset
Reclor
⭐
61
Code for "ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning" (ICLR 2020)
Rc Experiments
⭐
61
Reading Comprehension Experiments repository.
Papers
⭐
60
Curated repository of notes from papers I'm reading, mostly NLP related. Updated regularly.
Shakkelha
⭐
59
Neural Arabic text diacritization
Awesome Nlp Chinese Corpus
⭐
59
A curated list of resources of chinese corpora for NLP(Natural Language Processing)
Expmrc
⭐
59
ExpMRC: Explainability Evaluation for Machine Reading Comprehension
Discovery
⭐
59
Mining Discourse Markers for Unsupervised Sentence Representation Learning
Char Rnn Tensorflow
⭐
58
Multi-layer Recurrent Neural Networks for character-level language models, implemented in TensorFlow
Video_music_book_datasets
⭐
57
NLP NER datasets video/music/book bio
Deep Semantic Code Search
⭐
57
Deep Semantic Code Search aims to explore a joint embedding space for code and description vectors and then use it for a code search application
Ake Datasets
⭐
57
Large, curated set of benchmark datasets for evaluating automatic keyphrase extraction algorithms.
Mkg_analogy
⭐
56
Code and datasets for the ICLR2023 paper "Multimodal Analogical Reasoning over Knowledge Graphs."
Cord19q
⭐
56
COVID-19 Open Research Dataset (CORD-19) Analysis
Doccano Transformer
⭐
55
The official tool for transforming doccano format into common dataset formats.
Factchd
⭐
54
Code for the paper "FactCHD: Benchmarking Fact-Conflicting Hallucination Detection".
Nlp Datasets
⭐
54
Curation note of NLP datasets
Distractor Generation Race
⭐
54
[AAAI 2019] Generating Distractors for Reading Comprehension Questions from Real Examinations
Chinese_book_dataset
⭐
54
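Chinese book dataset / data mining / natural language processing / Chinese Library Classification / library and information science / text classification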
Coarij
⭐
54
Corpus of Annual Reports in Japan
Prosocial Dialog
⭐
53
🐥 Code and Dataset for our EMNLP 2022 paper - "ProsocialDialog: A Prosocial Backbone for Conversational Agents"
Arekit
⭐
52
Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and prompting mass-media news into datasets for ML-model training
Text Style Transfer Benchmark
⭐
52
Text style transfer benchmark
Text Mined Synthesis_public
⭐
52
Code for the text-mined solid-state reactions dataset
Indic.page
⭐
52
A directory of Indic (Indian) language computing resources.
Causalnewscorpus
⭐
51
Participate in our Shared Task: Event Causality Identification with Causal News Corpus, featured under CASE @ RANLP 2023!
Tamil Nlp Catalog
⭐
51
Awesome List of Tamil NLP & AI Resources
Snorkeling
⭐
51
Extracting biomedical relationships from literature with Snorkel 🏊
101-200 of 205 search results