Wikihadoop

Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop
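The sketch below illustrates what "stream-based" means here. It reads a decompressed dump as a stream and returns one `<page>...</page>` element at a time. That element is the kind of unit a record reader would hand to each map call.

This is a plain-Java sketch, not Wikihadoop's code. The class and method names (`PageRecordReader`, `nextPage`, `readAll`) are hypothetical. It assumes that a `</page>` tag is never followed by another `<page>` on the same line, which is typical of the XML dumps. It also leaves out the Hadoop-specific parts, such as InputSplit handling and decompression codecs.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Wikihadoop's implementation): pulls one
 * <page>...</page> element at a time from an already-decompressed
 * XML dump stream, without loading the whole dump into memory.
 */
public class PageRecordReader {
    private static final String START = "<page>";
    private static final String END = "</page>";

    private final BufferedReader reader;

    public PageRecordReader(InputStream in) {
        this.reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    /** Returns the next complete page element, or null at end of stream. */
    public String nextPage() throws IOException {
        StringBuilder record = null;
        String line;
        while ((line = reader.readLine()) != null) {
            if (record == null) {
                int start = line.indexOf(START);
                if (start < 0) {
                    continue; // skip <mediawiki>, <siteinfo> and other non-page lines
                }
                record = new StringBuilder();
                line = line.substring(start); // drop indentation before <page>
            }
            int end = line.indexOf(END);
            if (end >= 0) {
                // Assumes nothing after </page> on this line needs to be kept.
                record.append(line, 0, end + END.length());
                return record.toString();
            }
            record.append(line).append('\n');
        }
        return null; // end of stream; an unterminated trailing page is dropped
    }

    /** Convenience helper: collects every page element from the stream. */
    public static List<String> readAll(InputStream in) throws IOException {
        PageRecordReader r = new PageRecordReader(in);
        List<String> pages = new ArrayList<>();
        String page;
        while ((page = r.nextPage()) != null) {
            pages.add(page);
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        String xml = "<mediawiki>\n  <siteinfo>...</siteinfo>\n"
                + "  <page>\n    <title>A</title>\n  </page>\n"
                + "  <page>\n    <title>B</title>\n  </page>\n</mediawiki>\n";
        List<String> pages = readAll(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(pages.size()); // 2
    }
}
```

Splitting on whole `<page>` elements keeps each article and its revisions together in one record. An actual Hadoop InputFormat would likely also need to locate the first `<page>` boundary after an arbitrary split offset inside the compressed file. This sketch does not attempt that.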
Alternatives To Wikihadoop
Trendingtopics (351 stars; most recent commit 13 years ago; 10 open issues; Ruby)
Rails app for tracking trends in server logs, powered by the Cloudera Hadoop Distribution on EC2

Pignlproc (160 stars; most recent commit a year ago; 6 open issues; Java)
Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.

Annotated Wikiextractor (88 stars; most recent commit 13 years ago; license gpl-3.0; Python)
Simple Wikipedia plain text extractor with article link annotations and Hadoop support.

Wikihadoop (84 stars; most recent commit 11 years ago; 5 open issues; Java)
Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop

Textgrounder (60 stars; most recent commit 8 years ago; 1 open issue; license apache-2.0; Scala)
A system for connecting language to space and time.

Wikireverse (39 stars; most recent commit 6 years ago; 2 open issues; license mit; Java)
Hadoop jobs for the WikiReverse project. Parses Common Crawl data for links to Wikipedia articles.

Wikipedia Ngrams (12 stars; most recent commit 11 years ago; Java)
Code to split/parse Wikipedia XML dumps

Hadoop_ctakes (9 stars; most recent commit 10 years ago; license apache-2.0; Java)
Hadoop integration code for working with Apache cTAKES

Subgraph Isomorphism (9 stars; most recent commit 4 years ago; license gpl-3.0; Java)
Implements common subgraph isomorphism algorithms (e.g. Ullmann, VF2) using MapReduce on Hadoop

Koshik (9 stars; most recent commit 10 years ago; Java)
An NLP framework for large scale processing using Hadoop