Awesome Open Source

Search results for natural language processing wikipedia

22 search results found

Sling ⭐ 1,873

SLING - A natural language frame semantics parser

Wikipedia2vec ⭐ 899

A tool for learning vector representations of words and entities from Wikipedia

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

Wordninja ⭐ 648

Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies.

Simple downloader for pre-trained word vectors

Adam_qas ⭐ 298

ADAM - A question answering system inspired by IBM Watson

AraVec is an open-source project of pre-trained distributed word representations (word embeddings) that aims to provide the Arabic NLP research community with free-to-use, powerful word embedding models.

SpikeX - SpaCy Pipes for Knowledge Extraction

Nlp Data Augmentation ⭐ 215

Data augmentation for NLP.

A command-line toolkit to extract text content and category data from Wikipedia dump files

QANTA Quiz Bowl AI

Quantulum3 ⭐ 112

Library for unit extraction - fork of quantulum for Python 3

Wpcorpus ⭐ 98

wpcorpus - NLP corpus based on Wikipedia's full article dump

Quantulum ⭐ 92

Python library for information extraction of quantities from unstructured text

Doc2vec Api ⭐ 92

Document embedding and machine learning script for beginners

Knowledge extraction from web data

Original implementation of the EMNLP 2020 paper "AmbigQA: Answering Ambiguous Open-domain Questions"

Ja.text8 ⭐ 74

Japanese text8 corpus for word embedding.

Text Segmentation ⭐ 73

Implementation of the paper: Text Segmentation as a Supervised Learning Task

Wiki Split ⭐ 72

One million English sentences, each split into two sentences that together preserve the original meaning, extracted from Wikipedia edits.

Nlp Corpus ⭐ 65

Varied English texts for modern NLP testing

Wiki Atomic Edits ⭐ 47

A dataset of atomic Wikipedia edits, each an insertion or deletion of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

CoDEx: A set of knowledge graph Completion Datasets Extracted from Wikidata and Wikipedia

Jawiki Kana Kanji Dict ⭐ 44

Generate an SKK/MeCab dictionary from Wikipedia (Japanese edition)

Modern_chinese_nlp ⭐ 37

(WIP) My humble contribution to the democratization of Chinese NLP technology

Mitie_chinese_wikipedia_corpus ⭐ 35

Pre-trained Wikipedia corpus by MITIE

Chinese Wikipedia Corpus Creator ⭐ 33

Corpus creator for Chinese Wikipedia

Arabic Tagger ⭐ 31

AQMAR Arabic Tagger: Sequence tagger with cost-augmented structured perceptron training

Contextuallstm ⭐ 26

Contextual LSTM for NLP tasks like word prediction and word embedding creation for Deep Learning

Odia Nlp Resource Catalog ⭐ 26

Ml You Can Use ⭐ 24

Practical ML and NLP with examples.

Python Bot using RASA for NLP

Arabic Word Embeddings Word2vec ⭐ 19

Arabic Word Embeddings Word2vec

Recommendation engine framework based on Wikipedia data

Politbert ⭐ 18

Polish RoBERTa model trained on Polish literature, Wikipedia, and OSCAR. The major assumption is that quality text will give a good model.

Gpt2_episode_summary_generator ⭐ 18

Using web scraping and state-of-the-art NLP to generate TV show episode summaries.

Word2vec Wikification Py ⭐ 16

Disambiguation of Wikipedia article names

Text Summarization ⭐ 15

Text summarization using the spaCy and NLTK modules with the TF-IDF algorithm. The code produces a summary of the input article. You can input text directly, or from a .txt file, a .pdf file, or a Wikipedia URL.

German Wikipedia Text Corpus ⭐ 15

This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo deep contextualized word representations.

Infotabs Code ⭐ 14

Implementation of the semi-structured inference model in our ACL 2020 paper, INFOTABS: Inference on Tables as Semi-structured Data.

Pyconhk2015 Chinese Nlp ⭐ 14

Materials for the talk on Chinese NLP at PyCon HK 2015

German2vec ⭐ 13

Language model and text classification for the German language using deep learning

Reading the data from OPIEC - an Open Information Extraction corpus

English Lemmer interface for Node.js

Knowledge_infotabs ⭐ 11

Repository containing code for the NAACL 2021 paper "Incorporating External Knowledge to Enhance Tabular Reasoning"

Bilingualcorpus ⭐ 11

Wiki Dump Reader ⭐ 10

Extract corpora from Wikipedia dumps

A system for Named Entity Disambiguation based on Random Walks and Learning to Rank.

Entity_knowledge_in_bert ⭐ 10

This repository contains the code for the CoNLL 2019 paper "Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking". The code is provided as documentation for the paper and for follow-up research.

An NLP framework for large scale processing using Hadoop

Fool Me Twice ⭐ 9

Game code and data for Fool Me Twice: Entailment from Wikipedia Gamification https://arxiv.org/abs/2104.04725

Entitypedia ⭐ 9

Entitypedia is an Extended Named Entity Dictionary from Wikipedia.

Text Vectorian ⭐ 9

Wiki Text Nlp ⭐ 8

Extract 'Did you know?' facts from Wikipedia articles

11411 Project ⭐ 8

11-411 NLP Project: Wikipedia Article Q&A System

The universal integrated corpus-building environment.

Minimal parser for Wikipedia pages with zero dependencies

Opiec Pipeline ⭐ 7

Wikipedia2corpus ⭐ 7

Wikipedia text corpus for self-supervised NLP model training

Language_model_tf ⭐ 6

Language model in TensorFlow

Wikiloader ⭐ 6

A package to download and preprocess a Wikipedia dump, in any language.

Rosettepedia ⭐ 6

Augment Rosette API entity extraction results with information from Wikipedia.

Wikitextcorpusdownloader ⭐ 6

A Language Independent Wikipedia Text Corpus Downloader

Language rules for Persian texts

Tf Similar Sentences ⭐ 6

Find similar sentences using TensorFlow Hub for English Wikipedia

Wikitrivia ⭐ 5

A trivia game based on NLP-extracted Wikipedia questions

A1 Summit ⭐ 5

An All-in-1 summarizer for your news articles, blogs, YouTube videos, study materials, Wikipedia content, etc.

Nepali Nlp Resources ⭐ 5

Resources for Nepali Natural Language Processing

Copyright 2018-2024 Awesome Open Source. All rights reserved.