Awesome Open Source
Search results for natural language processing multilingual
34 search results found
Polyglot
⭐
2,212
Multilingual text (NLP) processing toolkit
Elmoformanylangs
⭐
1,325
Pre-trained ELMo Representations for Many Languages
Contextualized Topic Models
⭐
1,141
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.
Bpemb
⭐
1,068
Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
Wit
⭐
896
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
Detoxify
⭐
774
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at [email protected].
Trankit
⭐
693
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Beto
⭐
462
BETO - Spanish version of the BERT model
Autocorrect
⭐
376
Spelling corrector in python
Text2text
⭐
268
Text2Text: Crosslingual NLP/G toolkit
Multi_rake
⭐
249
Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python
Speechtransprogress
⭐
218
Tracking the progress in end-to-end speech translation
Wn
⭐
170
A modern, interlingual wordnet interface for Python
Laserembeddings
⭐
163
LASER multilingual sentence embeddings as a pip package
Mgpt
⭐
162
Multilingual Generative Pretrained Model
Spacy Universal Sentence Encoder
⭐
156
Google USE (Universal Sentence Encoder) for spaCy
Multilingual_ner
⭐
120
Applying BERT to named entity recognition in English and Russian.
Mtdata
⭐
115
A tool that locates, downloads, and extracts machine translation corpora
Text
⭐
112
Using Transformers from HuggingFace in R
Fastrtext
⭐
97
R wrapper for fastText
Ml Mkqa
⭐
94
We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. Please refer to our paper for details, MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering
Lima
⭐
92
The Libre Multilingual Analyzer, a Natural Language Processing (NLP) C++ toolkit.
Rust Sbert
⭐
87
Rust port of sentence-transformers (https://github.com/UKPLab/sentence-transformers)
Crosslingual Nlp
⭐
83
This repo supports various cross-lingual transfer learning & multilingual NLP models.
Multilingual Latent Dirichlet Allocation Lda
⭐
73
A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.
Smaller Transformers
⭐
71
Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0.
Tupa
⭐
67
Transition-based UCCA Parser
Nllb Serve
⭐
66
Meta's "No Language Left Behind" models served as web app and REST API
Glot500
⭐
65
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages (ACL'23)
Anuvaad
⭐
65
State of the art open-source translation for Indic languages.
Multilangstructurekd
⭐
64
[ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Kwx
⭐
57
BERT, LDA, and TFIDF based keyword extraction in Python
Rakun2
⭐
56
RaKUn 2.0 - A fast keyword detection algorithm
Romanian Transformers
⭐
55
This repo is the home of Romanian Transformers.
Php Nlp Client
⭐
54
PHP Client for NLP Server
Glami 1m
⭐
47
The largest multilingual image-text classification dataset. It contains fashion products.
Masakhane Community
⭐
40
All our community docs! Start here! Let's put Africa on the NLP Map
Ewiser
⭐
40
A Word Sense Disambiguation system integrating implicit and explicit external knowledge.
Few Shot Lm
⭐
40
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)
Okapi
⭐
36
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Multilingual_nmt
⭐
36
Experiments on Multilingual NMT
Extractive_rc_by_runtime_mt
⭐
30
Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation"
Termsuite Core
⭐
28
A Java UIMA-based toolbox for multilingual and efficient terminology extraction and multilingual term alignment
Tok Tok
⭐
26
A fast, simple, multilingual tokenizer
Multilingual Nlm
⭐
25
Code for "Unsupervised Multilingual Word Embedding with Limited Resources using Neural Language Models" and "Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora"
Mlmt
⭐
24
Code for the paper "A Multi-lingual Multi-task Architecture for Low-resource Sequence Labeling" (ACL2018)
Colxlm
⭐
23
Multilingual Retrieval on Yelp Search Engine ⚡
Geomm
⭐
23
Geometry-aware Multilingual Embeddings
Exams Qa
⭐
23
A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering
Timewords
⭐
23
Multilingual library to easily parse date strings to java.util.Date objects.
Turkish Question Generation
⭐
22
Automated question generation and question answering from Turkish texts using text-to-text transformers
Fullstop Deep Punctuation Prediction
⭐
21
A model that predicts the punctuation of English, Italian, French and German texts.
Multinerd
⭐
20
Repository for the paper "MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)" (NAACL 2022).
Told Br
⭐
19
Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis
Wikirec
⭐
18
Recommendation engine framework based on Wikipedia data
Corpus_dataset_for_chinese_nlp
⭐
18
Chinese NLP corpus datasets
Polish Sentence Evaluation
⭐
16
Evaluation of Sentence Representations in Polish
Stsb Multi Mt
⭐
15
Machine translated multilingual STS benchmark dataset.
Character Eyes
⭐
14
🔤 👀 Seeing Language Through Character Level Taggers, BlackboxNLP 2019
Babelnet Sememe Prediction
⭐
13
Code and data of the AAAI-20 paper "Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets"
Mlt
⭐
12
Multilingual Neural Machine Translation using Transformers with Conditional Normalization.
Plmpapers
⭐
12
A paper list of pre-trained language models (PLMs).
Tokenizer
⭐
11
Natural Language Tokenizer
Swim Ir
⭐
11
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.
Ikea Dataset
⭐
10
A dataset for multimodal machine translation
Langdist
⭐
10
Multilingual Language Modeling Toolkit
Supersummarizeai
⭐
8
Unleash the power of AI with SuperSummarizeAI! Effortlessly extract, condense, and clip content from webpages and YouTube videos using ChatGPT. Turning endless streams of content into digestible summaries.
Doc
⭐
7
Overarching documentation and planning to build so-called instruction-following large language models aka "ChatGPT-style" models for the European language area.
Acl20 Code Switching Patterns
⭐
7
Code-switching patterns can be an effective route to improve performance of downstream NLP applications: A case study of humour, sarcasm and hate speech detection
Mc2_corpus
⭐
7
MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)
Inspiring_papers
⭐
6
Papers related to Machine Translation (continuously updating & welcome Star/Fork/PR)
Mol
⭐
5
Multilingual Offensive Lexicon consists of the first contextual lexicon for abusive language detection, which is composed of 1,000 explicit and implicit terms and expressions with any pejorative connotation annotated with contextual information
Texton
⭐
5
Text Tonsorium - a toolbox that automatically arranges NLP tools in workflows and enacts them with user's inputs
Unified_multilingual_dataset_of_emotional_human_utterances
⭐
5
A unified dataset of multilingual emotional human utterances
Webcorpus
⭐
5
Generate large textual corpora for almost any language by crawling the web
Semeval2022 Task8 Tonyx
⭐
5
Deep-learning system proposed by HFL for SemEval-2022 Task 8: Multilingual News Similarity