Awesome Open Source
Search results for dataset natural language processing
205 search results found
Urdu
⭐
50
Collection of Urdu datasets for POS, NER, and NLP tasks
Podium
⭐
50
Podium: a framework agnostic Python NLP library for data loading and preprocessing
Bugrepo
⭐
48
A collection of publicly available bug reports
Irc Disentanglement
⭐
48
Dataset and model for disentangling chat on IRC
Pytorch_basic_nmt
⭐
48
A simple yet strong implementation of neural machine translation in pytorch
Mtnt
⭐
48
Code for the collection and analysis of the MTNT dataset
Wongnai Corpus
⭐
47
Collection of Wongnai's datasets
Topic Rnn
⭐
47
Implementation (in progress) of Dieng et al.'s TopicRNN: a neural topic model & RNN hybrid.
Glami 1m
⭐
47
The largest multilingual image-text classification dataset. It contains fashion products.
Wiki Atomic Edits
⭐
47
A dataset of atomic Wikipedia edits, each an insertion or deletion of a contiguous chunk of text in a sentence. The dataset contains ~43 million edits across 8 languages.
Trscraper
⭐
47
TRScraper is an application developed for use in natural language processing work; it enables text mining on large platforms that host Turkish-language content.
Indonesian_datasets
⭐
46
NLP Datasets for Indonesian
Africanlp Public Datasets
⭐
46
A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.
Nodejs Stanford Classifier
⭐
46
Nodejs wrapper for Stanford Classifier.
Blonde
⭐
45
Official implementations for (1) BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation and (2) Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus
Transformer Srl
⭐
45
Reimplementation of a BERT-based model (Shi et al., 2019), currently the state of the art for English SRL. This model also implements predicate disambiguation.
Awesome Chinese Llm
⭐
45
Awesome Chinese LLM: A curated list of Chinese large language model datasets and model resources
Cccapsnet
⭐
44
A PyTorch implementation of Compositional Coding Capsule Network based on PRL 2022 paper "Compositional Coding Capsule Network with K-Means Routing for Text Classification"
Book Genre Classification
⭐
44
Classification of books based on titles without prior knowledge of context or author
Awesome Resources For Scholarly Big Data
⭐
44
Tools, datasets, corpora, and venue challenges for scholarly big data: pick up scattered pearls
Edm
⭐
44
Python package for understanding the difficulty of text classification datasets. (in CoNLL 2018)
Ua Datasets
⭐
44
A collection of datasets for Ukrainian language
Text_nn
⭐
44
Text classification models. Used as a submodule for other projects.
Wikineural
⭐
43
Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2021).
Synergy Dataset
⭐
43
SYNERGY - Open machine learning dataset on study selection in systematic reviews
Huggingartists
⭐
42
Lyrics generation with GPT2-based Transformer
Science Result Extractor
⭐
42
Machine Learning
⭐
41
This repository will contain all the material beginners need for ML and DL. Follow and star this repo for regular updates.
Odsqa
⭐
41
ODSQA: Open-Domain Spoken Question Answering Dataset
Napolab
⭐
41
A Natural Portuguese Language Benchmark (Napolab) for the evaluation of language models.
Attention_is_all_you_need
⭐
41
Crest
⭐
40
A Causal Relation Schema for Text
Bibsample
⭐
40
Example of using the Dataset API in TensorFlow
Ewiser
⭐
40
A Word Sense Disambiguation system integrating implicit and explicit external knowledge.
Cdqa Annotator
⭐
40
⛔ [NOT MAINTAINED] A web-based annotator for closed-domain question answering datasets with SQuAD format.
Ai Sentiment Analysis On Imdb Dataset
⭐
40
Sentiment Analysis using Stochastic Gradient Descent on 50,000 Movie Reviews Compiled from the IMDB Dataset
C4 Dataset Script
⭐
39
Inspired by Google's C4, a series of colossal-clean data-cleaning scripts focused on CommonCrawl data processing, including Chinese data processing and the cleaning methods in MassiveText.
Finbert Qa
⭐
39
Financial Domain Question Answering with pre-trained BERT Language Model
Toefl Qa
⭐
39
A question answering dataset for machine comprehension of spoken content
Typenet
⭐
38
A Hierarchical Type system for fine grained entity typing
Wikiwhy
⭐
38
WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000+ "why" question-answer-rationale triplets.
Smiles X
⭐
38
Autonomous characterization of molecular compounds from small datasets without descriptors
Texting
⭐
37
Tensorflow implementation of ACL2020 paper "Every Document Owns Its Structure: Inductive Text Classification via Graph Neural Networks."
Pn Summary
⭐
37
A well-structured summarization dataset for the Persian language!
Fast Annotation Tool
⭐
37
FAST is an annotation tool that focuses on mobile devices.
Medmcqa
⭐
37
A large-scale (194k) Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions.
Squirrel Datasets Core
⭐
37
Squirrel dataset hub
Yelp_challenge
⭐
37
Yelp dataset challenge: NLP & sentiment analysis
Datasets
⭐
36
A bunch of some 200 datasets. You can call it mini-kaggle :)
Pytorch Pqrnn
⭐
36
Implementation of pQRNN in PyTorch
Yenlp
⭐
36
NLP on Yelp's DataSet Challenge
Crnn Pytorch
⭐
36
✍️ Convolutional Recurrent Neural Network in Pytorch | Text Recognition
Datasetstation
⭐
36
Quickly download Chinese datasets, process them, and run data analysis and visual analysis: a one-stop solution for data problems
Okapi
⭐
36
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Acharya
⭐
35
A Data Centric annotation tool for your Named Entity Recognition projects
Pqg Pytorch
⭐
35
Paraphrase Generation model using pair-wise discriminator loss
Focused Empathy
⭐
35
🤗 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"
10kgnad
⭐
34
Ten Thousand German News Articles Dataset for Topic Classification
Persianner
⭐
34
Named-Entity Recognition in Persian Language
Dialogue
⭐
34
Open_type
⭐
33
Chinesemrc Data
⭐
33
A collection of the extractive MRC datasets available so far for Chinese
Bothub
⭐
33
Bothub is an open platform for predicting, training and sharing NLP datasets in multiple languages
Codesearch
⭐
32
Models and datasets for annotated code search.
Wice
⭐
32
This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023.
Adept Augmentations
⭐
31
A Python library aimed at dissecting and augmenting NER training data.
Msmarco
⭐
31
Machine Comprehension Train on MSMARCO with S-NET Extraction Modification
Very Deep Cnn Pytorch
⭐
30
Very deep CNN for text classification
Dnc
⭐
30
Diverse Natural Language Inference Collection - NLI dataset that can be used to evaluate how well models perform distinct types of reasoning (EMNLP 2018)
Sentence Autosegmentation
⭐
30
Deep-learning-based sentence auto-segmentation from unstructured text without punctuation
Conwea
⭐
30
Code for the paper "Contextualized Weak Supervision for Text Classification"
Extractive_rc_by_runtime_mt
⭐
30
Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation"
Bsard
⭐
30
⚖️ A Statutory Article Retrieval Dataset in French. (ACL 2022)
Tasknet
⭐
30
Easy multi-task learning with HuggingFace Datasets and Trainer
Biomedical Nlp Corpus
⭐
29
Corpus (datasets) collection about biology and medical NLP.
Dbrd
⭐
29
110k Dutch Book Reviews Dataset for Sentiment Analysis
Cfgen
⭐
29
Implementation of the EMNLP 2020 paper "Counterfactual Generator: A Weakly-Supervised Method for Named Entity Recognition".
Cnn Question Classification Keras
⭐
29
Chinese Question Classifier (Keras Implementation) on BQuLD
Num_fh
⭐
28
numeric fused-head identification and resolution
Naturallanguageprocessing
⭐
28
Natural Language Processing
Sentence Classification Pytorch
⭐
28
Sentiment analysis with variable length sequences in pytorch
Surnames
⭐
28
Distribution of surnames around the world, sorted by population
Dureader_qanet_bidaf
⭐
27
Using QANet and BiDAF on DuReader datasets
Pytorch Transformer Kor Eng
⭐
27
Transformer Implementation using PyTorch for Neural Machine Translation (Korean to English)
Noisemix
⭐
27
NoiseMix - data generation for natural language
Vihos
⭐
26
Repository for the paper "ViHOS: Vietnamese Hate and Offensive Spans Detection" (EACL2023)
Smashed
⭐
26
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batching, and more. Supports datasets from Huggingface, torchdata iterables, or simple lists of dictionaries.
Spoken Squad
⭐
26
A spoken question answering dataset based on SQuAD
Moral_stories
⭐
26
Data and code for the "Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences" (Emelin et al., 2021) paper.
Findvehicle
⭐
26
FindVehicle: A NER dataset in transportation to extract keywords describing vehicles on the road
Cosmos
⭐
26
COSMOS: Catching Out-of-Context Misinformation using Self Supervised Learning (AAAI 2023)
Datasets
⭐
25
Collections of many datasets you may need and play with.
Awesome Azeri Nlp
⭐
24
Azerbaijani language processing software, models and datasets.
20 Newsgroups_text Classification
⭐
24
"20 newsgroups" dataset - Text Classification using Multinomial Naive Bayes in Python.
Pban Pytorch
⭐
24
A Position-aware Bidirectional Attention Network for Aspect-level Sentiment Analysis, PyTorch implementation.
Nlprep
⭐
24
🍳 NLPrep - dataset tool for many natural language processing tasks
Nlp_pemdc
⭐
23
NLP Pretrained Embeddings, Models and Datasets Collections (NLP_PEMDC). The collection will keep updating.
Exams Qa
⭐
23
A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering
Spirs
⭐
23
Sarcasm dataset, 15K tweets, very high quality, both intended & perceived sarcasm, rich context
Mp Cnn Variants
⭐
23
Variants of Multi-Perspective Convolutional Neural Networks
Copyright 2018-2024 Awesome Open Source. All rights reserved.