Alternatives To Cloud Computing Search Engine

Project Name	Stars	Repos Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language	Description
Commoncrawl Crawler	208	-	a year ago	-	-	-	gpl-3.0	Java	The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)
Elasticrawl	50	1	7 years ago	10	February 15, 2017	1	mit	Ruby	Launch AWS Elastic MapReduce jobs that process Common Crawl data.
Common_crawl_types	28	-	12 years ago	-	-	-	-	Ruby	A simple Ruby example of how to process Common Crawl files using Elastic MapReduce
Real_time_social_media_mining	24	-	5 months ago	-	-	21	mit	HTML	DevOps pipeline for Real Time Social/Web Mining
Cs205_ga	16	-	10 years ago	-	-	-	-	Python	How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce
Googleplay Web Crawler	15	-	7 years ago	-	-	-	-	Java	MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
Warc Mapreduce	11	-	9 years ago	-	-	-	-	Java	WARC and WET support for Hadoop's MapReduce API
Cc Mrjob	9	-	8 years ago	-	-	8	mit	Python	Demonstration of using Python to process the Common Crawl dataset with the mrjob framework
Common_crawl	8	-	9 years ago	-	-	-	mit	Python	Simple Python MapReduce jobs for processing the Common Crawl plus command-line utilities
Inforetrieval	8	-	5 years ago	-	-	-	mit	HTML	Inverted indexer, web crawler, sort, search and Porter stemmer written in Python for information retrieval.

Popular Crawler Projects

Scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

dependent packages 445 | total releases 96 | latest release September 18, 2023 | most recent commit 3 months ago

Lux ⭐ 24,752

👾 Fast and simple video download library and CLI tool written in Go

dependent packages 8 | total releases 40 | latest release November 06, 2023 | most recent commit 17 days ago

Colly ⭐ 21,902

Elegant Scraper and Crawler Framework for Golang

dependent packages 328 | total releases 22 | latest release March 08, 2022 | most recent commit a month ago

Easyspider ⭐ 20,149

A visual no-code/code-free web crawler/spider. EasySpider (易采集): visual software for browser automation testing, data collection, and crawling, usable graphically without code

most recent commit 16 days ago

Proxy_pool ⭐ 19,442

Python ProxyPool for web spider

most recent commit 3 months ago

Popular Mapreduce Projects

Data Science Ipython Notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

most recent commit 6 months ago

Redisson ⭐ 22,647

Redisson - Easy Redis Java client with features of In-Memory Data Grid. Sync/Async/RxJava/Reactive API. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache ...

dependent packages 358 | total releases 218 | latest release October 24, 2023 | most recent commit 10 days ago

Bigdata Notes ⭐ 14,872

A beginner's guide to big data

most recent commit 3 months ago

Cookbook ⭐ 12,557

The Data Engineering Cookbook

most recent commit 3 months ago

Powerjob ⭐ 6,249

Enterprise job scheduling middleware with distributed computing ability.

dependent packages 5 | total releases 13 | latest release September 03, 2023 | most recent commit 3 months ago

Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.