Awesome Open Source
Search results for corpus wikipedia
66 search results found
Wiki2vec
⭐
587
Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby
Gensim Data
⭐
492
Data repository for pretrained NLP models and NLP corpora.
Fact Extractor
⭐
413
Fact Extraction from Wikipedia Text
Rel
⭐
279
REL: Radboud Entity Linker
Wikiplots
⭐
234
A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor.
Fasttextjapanesetutorial
⭐
174
Tutorial to train fastText with a Japanese corpus
Wp2txt
⭐
160
A command-line toolkit to extract text content and category data from Wikipedia dump files
Pignlproc
⭐
160
Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.
Wiki2text
⭐
129
Extract a plain text corpus from MediaWiki XML dumps, such as Wikipedia.
Kanji Frequency
⭐
116
Kanji usage frequency data collected from various sources
Wpcorpus
⭐
98
wpcorpus - NLP corpus based on Wikipedia's full article dump
Dict2vec
⭐
88
Dict2vec is a framework to learn word embeddings using lexical dictionaries.
Ja.text8
⭐
74
Japanese text8 corpus for word embedding.
Wiki Word2vec
⭐
66
Train a gensim word2vec model on Wikipedia.
Nlp Corpus
⭐
65
Varied English texts for modern NLP testing
Wikiedits
⭐
61
Automatic extraction of edited sentences from text edition histories.
Textgrounder
⭐
60
A system for connecting language to space and time.
Japanese Words To Vectors
⭐
57
Word2vec (word to vectors) approach for the Japanese language using Gensim and MeCab.
Wikipedia Parallel Titles
⭐
53
Tools for extracting parallel corpora from article titles across languages in Wikipedia
Word2vec On Wikipedia
⭐
39
A pipeline for training word embeddings using word2vec on a Wikipedia corpus.
Mitie_chinese_wikipedia_corpus
⭐
35
Pre-trained Wikipedia corpus by MITIE
Chinese Wikipedia Corpus Creator
⭐
33
Corpus creator for Chinese Wikipedia
News Media Reliability
⭐
32
Primer_quicksilver
⭐
31
A pipeline for detecting novel information about entities from a stream of text, updating a knowledge base about the entities, and generating natural language summaries.
Odia Nlp Resource Catalog
⭐
26
Ml You Can Use
⭐
24
Practical ML and NLP with examples.
Corpusportugues
⭐
24
Corpus do Idioma Português e Modelos
Un General Debates
⭐
20
Analysis and experiments on the UN General Debate corpus
Wikicorpusextractor
⭐
19
Extracts text from Wikimedia XML dump files
Topic Modelling On Wiki Corpus
⭐
19
Uses the Latent Dirichlet Allocation algorithm to discover hidden topics in articles. It is trained on 60,000 articles from the Simple English Wikipedia corpus and can extract the topic of a given input article.
Politbert
⭐
18
Polish RoBERTa model trained on Polish literature, Wikipedia, and OSCAR. The major assumption is that quality text will give a good model.
Workbench
⭐
17
Java and Lucene based tools for BitFunnel corpus preparation
Wikipedia Stats Extractor
⭐
16
Raw Wikipedia counts for entity linking
Korkai
⭐
16
A corpus builder for Tamil that analyzes WordPress, Blogger, and Wikipedia dumps
Wikibeagle
⭐
15
Computes semantic vectors via the BEAGLE algorithm of Jones & Mewhort (2007; plus word-form representation, inspired by Cox, Kachergis, Recchia & Jones, 2011) using random Wikipedia pages as a corpus.
German Wikipedia Text Corpus
⭐
15
A German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo deep contextualized word representations.
Pretrained_doc2vec_ja
⭐
15
Wikipedia Search Engine
⭐
15
A search engine built on the 43 GB Wikipedia data dump from 2013. Search results are returned in real time.
Namuwikitext
⭐
15
Wikitext-format dataset of Namuwiki (the most famous Korean wiki)
Corpus
⭐
14
Malayalam Corpus by Swathanthra Malayalam Computing
Hanzifreq
⭐
13
Chinese Character Frequencies
Elang
⭐
13
Word Embedding utilities for Language Models (English & Indonesian)
German2vec
⭐
13
Language Model and Text Classification for the German Language using Deep Learning
Opiec
⭐
12
Reading the data from OPIEC - an Open Information Extraction corpus
Bilingualcorpus
⭐
11
Pywordfreq
⭐
10
Word frequency checker, written in Rust, based on a Wikipedia corpus
Subs2vec
⭐
10
Tools for training and evaluating word embeddings based on subtitles
Wiki Dump Reader
⭐
10
Extract corpora from Wikipedia dumps
Vecdcs
⭐
9
Koshik
⭐
9
An NLP framework for large scale processing using Hadoop
Entitypedia
⭐
9
Entitypedia is an Extended Named Entity Dictionary from Wikipedia.
Seedling
⭐
8
Building and Using A Seed Corpus for the Human Language Project
Expanda
⭐
8
The universal integrated corpus-building environment.
Scholar
⭐
8
Simple interface for Word2Vec in Python
Testreduce
⭐
8
Distributed testing server originally developed for Parsoid round-trip testing on a large corpus of Wikipedia pages
Wikipedia2corpus
⭐
7
Wikipedia text corpus for self-supervised NLP model training
Opiec Pipeline
⭐
7
Deepitalian
⭐
7
Wikiwords
⭐
7
Word frequencies in the 2012 English Wikipedia
Poio Corpus
⭐
7
The Poio Corpus is a freely available collection of language resources for lesser-used languages. The data is extracted from free sources such as Wikipedia, dictionaries, documents, and websites.
Tf Similar Sentences
⭐
6
Find similar sentences using Tensorflow Hub for English Wikipedia
Pioner
⭐
6
Named-entity datasets and GloVe models for the Armenian language
Boyd Wnut2018
⭐
6
Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018)
Jp Ner
⭐
5
[abandoned] Work on generating an NER dataset for Japanese
Nepali Nlp Resources
⭐
5
Resources for Nepali Natural Language Processing
Wcb
⭐
5
Wikipedia Corpus Builder