Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Trendingtopics | 351 | | | | 13 years ago | | | 10 | | Ruby |
Rails app for tracking trends in server logs, powered by the Cloudera Hadoop Distribution on EC2 | ||||||||||
Pignlproc | 160 | | | | a year ago | | | 6 | | Java |
Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps. | ||||||||||
Annotated Wikiextractor | 88 | | | | 13 years ago | | | | gpl-3.0 | Python |
Simple Wikipedia plain text extractor with article link annotations and Hadoop support. | ||||||||||
Wikihadoop | 84 | | | | 11 years ago | | | 5 | | Java |
Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop | ||||||||||
Textgrounder | 60 | | | | 8 years ago | | | 1 | apache-2.0 | Scala |
A system for connecting language to space and time. | ||||||||||
Wikireverse | 39 | | | | 6 years ago | | | 2 | mit | Java |
Hadoop jobs for the WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. | ||||||||||
Wikipedia Ngrams | 12 | | | | 11 years ago | | | | | Java |
Code to split and parse Wikipedia XML dumps | ||||||||||
Hadoop_ctakes | 9 | | | | 10 years ago | | | | apache-2.0 | Java |
Hadoop integration code for working with Apache cTAKES | ||||||||||
Subgraph Isomorphism | 9 | | | | 4 years ago | | | | gpl-3.0 | Java |
Implements common subgraph isomorphism algorithms (e.g., Ullmann, VF2) using MapReduce on Hadoop | ||||||||||
Koshik | 9 | | | | 10 years ago | | | | | Java |
An NLP framework for large-scale processing using Hadoop | ||||||||||