Cs205_ga

How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce
Alternatives To Cs205_ga
Commoncrawl Crawler
Stars: 208 | Most recent commit: a year ago | License: gpl-3.0 | Language: Java
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)

Elasticrawl
Stars: 50 | Most recent commit: 17 years ago | Total releases: 10 | Latest release: February 15, 2017 | Open issues: 1 | License: mit | Language: Ruby
Launch AWS Elastic MapReduce jobs that process Common Crawl data.

Common_crawl_types
Stars: 28 | Most recent commit: 12 years ago | Language: Ruby
A simple Ruby example of how to process Common Crawl files using Elastic MapReduce

Real_time_social_media_mining
Stars: 24 | Most recent commit: 5 months ago | Open issues: 21 | License: mit | Language: HTML
DevOps pipeline for Real Time Social/Web Mining

Cs205_ga
Stars: 16 | Most recent commit: 10 years ago | Language: Python
How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce

Googleplay Web Crawler
Stars: 15 | Most recent commit: 7 years ago | Language: Java
Mapreduce project by Hadoop, Nutch, AWS EMR, Pig, Tez, Hive

Warc Mapreduce
Stars: 11 | Most recent commit: 9 years ago | Language: Java
warc and wet support for Hadoop's mapreduce api

Cc Mrjob
Stars: 9 | Most recent commit: 8 years ago | Open issues: 8 | License: mit | Language: Python
Demonstration of using Python to process the Common Crawl dataset with the mrjob framework

Common_crawl
Stars: 8 | Most recent commit: 9 years ago | License: mit | Language: Python
Simple Python MapReduce jobs for processing the Common Crawl plus command-line utilities

Inforetrieval
Stars: 8 | Most recent commit: 5 years ago | License: mit | Language: HTML
Inverted indexer, web crawler, sort, search, and Porter stemmer written in Python for information retrieval.

Categories: Python, Amazon, Crawler, Hadoop, Mapreduce