Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Commoncrawl Crawler | 208 | | | | a year ago | | | | gpl-3.0 | Java |
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) | ||||||||||
Elasticrawl | 50 | | 1 | | 7 years ago | 10 | February 15, 2017 | 1 | mit | Ruby |
Launch AWS Elastic MapReduce jobs that process Common Crawl data. | ||||||||||
Common_crawl_types | 28 | | | | 12 years ago | | | | | Ruby |
A simple Ruby example of how to process Common Crawl files using Elastic MapReduce | ||||||||||
Real_time_social_media_mining | 24 | | | | 5 months ago | | | 21 | mit | HTML |
DevOps pipeline for Real Time Social/Web Mining | ||||||||||
Cs205_ga | 16 | | | | 10 years ago | | | | | Python |
How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce | ||||||||||
Googleplay Web Crawler | 15 | | | | 7 years ago | | | | | Java |
MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, and Hive | ||||||||||
Warc Mapreduce | 11 | | | | 9 years ago | | | | | Java |
WARC and WET support for Hadoop's MapReduce API | ||||||||||
Cc Mrjob | 9 | | | | 8 years ago | | | 8 | mit | Python |
Demonstration of using Python to process the Common Crawl dataset with the mrjob framework | ||||||||||
Common_crawl | 8 | | | | 9 years ago | | | | mit | Python |
Simple Python MapReduce jobs for processing the Common Crawl plus command-line utilities | ||||||||||
Inforetrieval | 8 | | | | 5 years ago | | | | mit | HTML |
Inverted indexer, web crawler, sort, search, and Porter stemmer written in Python for information retrieval. |