Cloud Computing Search Engine

A cloud-based web search engine built on Hadoop MapReduce and run on Amazon EC2, consisting of a crawler, an indexer, and a PageRank component.
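To make the PageRank stage named above concrete, here is a minimal in-memory sketch of one way PageRank can be phrased as repeated map and reduce steps. It is plain Python rather than Hadoop code and is not taken from the project itself. The function names, the damping factor of 0.85, the iteration count, and the three-page graph are all illustrative assumptions, and the sketch assumes every page has at least one out-link.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Emit (target, share) pairs: each page splits its rank evenly among its out-links."""
    for page, links in graph.items():
        for target in links:
            yield target, ranks[page] / len(links)


def reduce_phase(pairs, pages, damping=0.85):
    """Sum the shares received by each page and apply the damping factor."""
    totals = defaultdict(float)
    for target, share in pairs:
        totals[target] += share
    n = len(pages)
    return {p: (1 - damping) / n + damping * totals[p] for p in pages}


def pagerank(graph, iterations=20, damping=0.85):
    """Run a fixed number of map/reduce rounds, starting from uniform ranks."""
    pages = list(graph)
    ranks = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), pages, damping)
    return ranks


# Hypothetical link graph: keys are pages, values are the pages they link to.
# Assumption: no dangling pages (every page has at least one out-link).
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

In a real MapReduce job the map and reduce functions would run as distributed tasks over link data stored on disk, with one job per iteration; the sketch only mirrors that data flow in a single process.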
Alternatives To Cloud Computing Search Engine
Commoncrawl Crawler
Stars: 208 | Most recent commit: a year ago | License: gpl-3.0 | Language: Java
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)

Elasticrawl
Stars: 50 | Most recent commit: 17 years ago | Total releases: 10 | Latest release: February 15, 2017 | Open issues: 1 | License: mit | Language: Ruby
Launch AWS Elastic MapReduce jobs that process Common Crawl data.

Common_crawl_types
Stars: 28 | Most recent commit: 12 years ago | Language: Ruby
A simple Ruby example of how to process Common Crawl files using Elastic MapReduce

Real_time_social_media_mining
Stars: 24 | Most recent commit: 5 months ago | Open issues: 21 | License: mit | Language: HTML
DevOps pipeline for Real Time Social/Web Mining

Cs205_ga
Stars: 16 | Most recent commit: 10 years ago | Language: Python
How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce

Googleplay Web Crawler
Stars: 15 | Most recent commit: 7 years ago | Language: Java
MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive

Warc Mapreduce
Stars: 11 | Most recent commit: 9 years ago | Language: Java
WARC and WET support for Hadoop's MapReduce API

Cc Mrjob
Stars: 9 | Most recent commit: 8 years ago | Open issues: 8 | License: mit | Language: Python
Demonstration of using Python to process the Common Crawl dataset with the mrjob framework

Common_crawl
Stars: 8 | Most recent commit: 9 years ago | License: mit | Language: Python
Simple Python MapReduce jobs for processing the Common Crawl plus command-line utilities

Inforetrieval
Stars: 8 | Most recent commit: 5 years ago | License: mit | Language: HTML
Inverted indexer, web crawler, sort, search, and Porter stemmer written in Python for information retrieval.
Java
Crawler
Hadoop
Search Engine
Mapreduce