Awesome Open Source

Search results for natural language processing multilingual

34 search results found

Polyglot ⭐ 2,212

Multilingual text (NLP) processing toolkit

Elmoformanylangs ⭐ 1,325

Pre-trained ELMo Representations for Many Languages

Contextualized Topic Models ⭐ 1,141

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.

Bpemb ⭐ 1,068

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

Detoxify ⭐ 774

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers. For access to our API, please email us at [email protected].

Trankit ⭐ 693

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

BETO - Spanish version of the BERT model

Autocorrect ⭐ 376

Spelling corrector in Python

Text2text ⭐ 268

Text2Text: Crosslingual NLP/G toolkit

Multi_rake ⭐ 249

Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python

Speechtransprogress ⭐ 218

Tracking the progress in end-to-end speech translation

A modern, interlingual wordnet interface for Python

Laserembeddings ⭐ 163

LASER multilingual sentence embeddings as a pip package

Multilingual Generative Pretrained Model

Spacy Universal Sentence Encoder ⭐ 156

Google USE (Universal Sentence Encoder) for spaCy

Multilingual_ner ⭐ 120

Applying BERT to named entity recognition in English and Russian.

A tool that locates, downloads, and extracts machine translation corpora

Using Transformers from HuggingFace in R

Fastrtext ⭐ 97

R wrapper for fastText

We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. For details, see the paper "MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering".

The Libre Multilingual Analyzer, a Natural Language Processing (NLP) C++ toolkit.

Rust Sbert ⭐ 87

Rust port of sentence-transformers (https://github.com/UKPLab/sentence-transformers)

Crosslingual Nlp ⭐ 83

This repo supports various cross-lingual transfer learning & multilingual NLP models.

Multilingual Latent Dirichlet Allocation Lda ⭐ 73

A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.

Smaller Transformers ⭐ 71

Load What You Need: Smaller Multilingual Transformers for PyTorch and TensorFlow 2.0.

Transition-based UCCA Parser

Nllb Serve ⭐ 66

Meta's "No Language Left Behind" models served as web app and REST API

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages (ACL'23)

State of the art open-source translation for Indic languages.

Multilangstructurekd ⭐ 64

[ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling

BERT, LDA, and TFIDF based keyword extraction in Python

RaKUn 2.0 - A fast keyword detection algorithm

Romanian Transformers ⭐ 55

This repo is the home of Romanian Transformers.

Php Nlp Client ⭐ 54

PHP Client for NLP Server

Glami 1m ⭐ 47

The largest multilingual image-text classification dataset. It contains fashion products.

Masakhane Community ⭐ 40

All our community docs! Start here! Let's put Africa on the NLP map

A Word Sense Disambiguation system integrating implicit and explicit external knowledge.

Few Shot Lm ⭐ 40

The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

Multilingual_nmt ⭐ 36

Experiments on Multilingual NMT

Extractive_rc_by_runtime_mt ⭐ 30

Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation"

Termsuite Core ⭐ 28

A Java UIMA-based toolbox for efficient multilingual terminology extraction and multilingual term alignment

A fast, simple, multilingual tokenizer

Multilingual Nlm ⭐ 25

Code for "Unsupervised Multilingual Word Embedding with Limited Resources using Neural Language Models" and "Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora"

Code for the paper "A Multi-lingual Multi-task Architecture for Low-resource Sequence Labeling" (ACL2018)

Multilingual Retrieval on Yelp Search Engine ⚡

Geometry-aware Multilingual Embeddings

Exams Qa ⭐ 23

A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering

Timewords ⭐ 23

Multilingual library to easily parse date strings to java.util.Date objects.

Turkish Question Generation ⭐ 22

Automated question generation and question answering from Turkish texts using text-to-text transformers

Fullstop Deep Punctuation Prediction ⭐ 21

A model that predicts the punctuation of English, Italian, French and German texts.

Multinerd ⭐ 20

Repository for the paper "MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)" (NAACL 2022).

Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis

Recommendation engine framework based on Wikipedia data

Corpus_dataset_for_chinese_nlp ⭐ 18

Chinese NLP corpus datasets

Polish Sentence Evaluation ⭐ 16

Evaluation of Sentence Representations in Polish

Stsb Multi Mt ⭐ 15

Machine translated multilingual STS benchmark dataset.

Character Eyes ⭐ 14

🔤 👀 Seeing Language Through Character Level Taggers, BlackboxNLP 2019

Babelnet Sememe Prediction ⭐ 13

Code and data of the AAAI-20 paper "Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets"

Multilingual Neural Machine Translation using Transformers with Conditional Normalization.

Plmpapers ⭐ 12

A paper list of pre-trained language models (PLMs).

Tokenizer ⭐ 11

Natural Language Tokenizer

SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.

Ikea Dataset ⭐ 10

A dataset for multimodal machine translation

Langdist ⭐ 10

Multilingual Language Modeling Toolkit

Supersummarizeai ⭐ 8

Unleash the power of AI with SuperSummarizeAI! Effortlessly extract, condense, and clip content from webpages and YouTube videos using ChatGPT. Turning endless streams of content into digestible summaries.

Overarching documentation and planning to build so-called instruction-following large language models aka "ChatGPT-style" models for the European language area.

Acl20 Code Switching Patterns ⭐ 7

Code-switching patterns can be an effective route to improve performance of downstream NLP applications: A case study of humour, sarcasm and hate speech detection

Mc2_corpus ⭐ 7

MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)

Inspiring_papers ⭐ 6

Papers related to Machine Translation (continuously updating & welcome Star/Fork/PR)

The Multilingual Offensive Lexicon is the first contextual lexicon for abusive language detection. It comprises 1,000 explicit and implicit terms and expressions with pejorative connotations, each annotated with contextual information.

Text Tonsorium - a toolbox that automatically arranges NLP tools in workflows and enacts them with user's inputs

Unified_multilingual_dataset_of_emotional_human_utterances ⭐ 5

A unified dataset of multilingual emotional human utterances

Webcorpus ⭐ 5

Generate large textual corpora for almost any language by crawling the web

Semeval2022 Task8 Tonyx ⭐ 5

Deep-learning system proposed by HFL for SemEval-2022 Task 8: Multilingual News Similarity

Copyright 2018-2024 Awesome Open Source. All rights reserved.