Awesome Open Source
Search results for low resource languages
30 search results found
Low Resource Languages ⭐ 373
Resources for conservation, development, and documentation of low resource (human) languages.
Xl Sum ⭐ 209
This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
Banglanmt ⭐ 132
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Africanlp Public Datasets ⭐ 46
A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.
Glotlid ⭐ 35
Language Identification tool for more than 1600 languages (EMNLP 2023).
Semi Supervised Nmt For Sumerian English ⭐ 29
Exploring the Limits of Low-Resource Neural Machine Translation
Cognet ⭐ 27
CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates
Relm_unmt ⭐ 26
Python source code for EMNLP 2020 paper "Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT".
Calamancy ⭐ 26
NLP pipelines for Tagalog using spaCy
Bembaspeech ⭐ 25
This is an ASR corpus for the Bemba language. It contains read speech drawn from diverse, publicly available Bemba sources: literature books, radio/TV show transcripts, YouTube video transcripts, and online sources. The corpus has 14,438 utterances, totaling over 24 hours of speech.
Turkish Text To Speech ⭐ 22
Speech synthesis (TTS) in low-resource languages by training from scratch with FastPitch and fine-tuning with HiFi-GAN.
Indian_parallelcorpus ⭐ 20
Curated list of publicly available parallel corpora for Indian languages.
Thesis ⭐ 20
My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University
Naijasenti ⭐ 17
This is the repository for NaijaSenti, a project funded by the Lacuna Fund to develop a sentiment corpus for four Nigerian languages: Igbo, Hausa, Yoruba, and Pidgin.
Turkish Speech To Text ⭐ 13
Fine-tuning for automatic speech recognition on low-resource languages with a character-based CTC model.
Filipino Text Benchmarks ⭐ 13
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Entitytargetedactivelearning ⭐ 12
Vad Sli Asr ⭐ 11
A pipeline to isolate and transcribe one language in mixed-language speech
Zambezi Voice ⭐ 10
Repository for multilingual speech data resources for native languages of Zambia.
Interlingual Mfa ⭐ 10
Workflow for forced alignment between languages
Coppermt ⭐ 8
Cognate Prediction Per Machine Translation: code for the ACL 2021 Findings paper "Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?"
Mc2_corpus ⭐ 7
MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)
Bilatticernn Confidence ⭐ 7
Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks https://arxiv.org/abs/1910.11933 or https://ieeexplore.ieee.org/document/9053264
Openspeaks Before Ai ⭐ 7
A set of frameworks for creating the AI/ML building blocks for low-resource languages.
Verdd ⭐ 6
Veʹrdd is an open-source dictionary editing framework focused on low-resourced and endangered languages. The framework is mainly built to facilitate collecting, importing, editing, and exporting dictionaries, while making it easy for native speakers to contribute to the preservation of the language and the construction of the dictionary.
Tagalog Fake News ⭐ 5
Fake news detection in Filipino via Multitask Transfer Learning
Swahili Text Gcn ⭐ 5
Graph Convolutional Network for Swahili News Classification: https://arxiv.org/abs/2103.09325
Josa Corpus ⭐ 5
Jopara (Guarani-dominant mixed with Spanish) sentiment analysis corpus
Minangnlp ⭐ 5
Minangkabau NLP corpus. PACLIC 2020
Nertransfer ⭐ 5
[IJCNLP-AACL 2023] Investigating transfer learning in low-resourced languages, specifically in a named entity recognition (NER) task. http://arxiv.org/abs/2309.05311
Related Searches
Natural Language Processing Low Resource Languages (25)
Python Low Resource Languages (15)
Copyright 2018-2024 Awesome Open Source. All rights reserved.