Awesome Open Source
Search results for dataset multilingual
38 search results found
The Pile
⭐
1,048
Multilingual_text_to_speech
⭐
740
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Github Typo Corpus
⭐
289
GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors
Xl Sum
⭐
209
This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
Mtdata
⭐
115
A tool that locates, downloads, and extracts machine translation corpora
Ml Mkqa
⭐
94
We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. For details, please refer to our paper, "MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering".
Mmner
⭐
69
Massively Multilingual Transfer for NER
Anuvaad
⭐
65
State-of-the-art open-source translation for Indic languages.
Glot500
⭐
65
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages (ACL'23)
Miracl
⭐
61
A large-scale multilingual dataset for Information Retrieval. Thorough human-annotations across 18 diverse languages.
Data Preparation
⭐
59
Code used for sourcing and cleaning the BigScience ROOTS corpus
Conan
⭐
58
A repository with several curated datasets of counter-narratives to fight online hate speech.
Mlma_hate_speech
⭐
54
Dataset and code of our EMNLP 2019 paper "Multilingual and Multi-Aspect Hate Speech Analysis"
Xpersona
⭐
51
XPersona: Evaluating Multilingual Personalized Chatbot
Cross Language Dataset
⭐
50
A multilingual, multi-style and multi-granularity dataset for cross-language textual similarity detection
Glami 1m
⭐
47
The largest multilingual image-text classification dataset. It contains fashion products.
Xed
⭐
46
XED multilingual emotion datasets
Xcopa
⭐
42
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
Ewiser
⭐
40
A Word Sense Disambiguation system integrating implicit and explicit external knowledge.
Okapi
⭐
36
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Extractive_rc_by_runtime_mt
⭐
30
Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation"
Wikilingua
⭐
30
Multilingual abstractive summarization dataset extracted from WikiHow.
Fakecovid
⭐
29
FakeCovid: A Multilingual Cross-domain Fact Check News Dataset for COVID-19
Mlqe Pe
⭐
23
Multilingual Quality Estimation and Automatic Post-editing Dataset
Exams Qa
⭐
23
A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering
Meta Transfer Learning
⭐
22
Implementation of meta-transfer-learning (ACL 2020)
Ckanext Fluent
⭐
22
Multilingual fields for CKAN
Mass Dataset
⭐
21
MaSS - Multilingual corpus of Sentence-aligned Spoken utterances
Multinerd
⭐
20
Repository for the paper "MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)" (NAACL 2022).
Told Br
⭐
19
Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis
Corpus_dataset_for_chinese_nlp
⭐
18
Chinese NLP corpus datasets
Stsb Multi Mt
⭐
15
Machine translated multilingual STS benchmark dataset.
Stable Diffusion Pokemon
⭐
14
A demo of fine-tuning Stable Diffusion on Pokemon-Blip-Captions with English, Japanese, and Chinese corpora
Toxic Comments Detection In Russian
⭐
13
Toxic Comments Detection in Russian.
Babelnet Sememe Prediction
⭐
13
Code and data of the AAAI-20 paper "Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets"
Swim Ir
⭐
11
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.
Mmid
⭐
10
Words and their images in 98 languages
Ikea Dataset
⭐
10
A dataset for multimodal machine translation
Language Agnostic Contextualized Encoders
⭐
7
Korean Question Answer System
⭐
7
A project to analyze KorQuAD 2.0
Malevolent_dialogue
⭐
6
MDRDC dataset and the baselines used
Any Language Frames
⭐
6
Multilingual datasets for the paper "Any-language frame-semantic parsing"
Xstance
⭐
6
A Multilingual Multi-Target Dataset for Stance Detection
M Amr2text
⭐
6
Generate from English-Centric AMR into Multiple Languages.
Unified_multilingual_dataset_of_emotional_human_utterances
⭐
5
A unified dataset of multilingual emotional human utterances
Webcorpus
⭐
5
Generate large textual corpora for almost any language by crawling the web
All About Speech
⭐
5