Crawling Infrastructure

Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.
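The description implies a queue-driven design. Short-lived serverless workers pull URLs from a queue, write fetched pages to object storage, and push newly discovered links back onto the queue. The sketch below illustrates only that general pattern and is not taken from the project's code. `InMemoryQueue`, `InMemoryObjectStore`, `handleBatch`, `pageKey` and the `pages/<url>.html` key layout are hypothetical stand-ins for a managed queue service and an S3-style store. Page fetching is injected as a synchronous function so the example runs offline, whereas a real worker would fetch asynchronously.

```typescript
// Sketch of a queue-driven crawl worker. All names are hypothetical; the
// in-memory classes stand in for a managed queue and an object store.
interface CrawlTask {
  url: string;
  depth: number;
}

// FIFO queue stand-in (a real deployment would use a managed queue service).
class InMemoryQueue<T> {
  private items: T[] = [];
  send(item: T): void {
    this.items.push(item);
  }
  // Removes and returns up to `max` items from the front of the queue.
  receive(max: number): T[] {
    return this.items.splice(0, max);
  }
  size(): number {
    return this.items.length;
  }
}

// Key/value stand-in for an object store such as S3.
class InMemoryObjectStore {
  private objects = new Map<string, string>();
  put(key: string, body: string): void {
    this.objects.set(key, body);
  }
  get(key: string): string | undefined {
    return this.objects.get(key);
  }
  keys(): string[] {
    return Array.from(this.objects.keys());
  }
}

// Fetching is injected so the sketch runs offline; a real worker would be async.
type Fetcher = (url: string) => string;

// Naive link extraction: absolute http(s) URLs in double-quoted href attributes.
function extractLinks(html: string): string[] {
  const links: string[] = [];
  const re = /href="(https?:\/\/[^"]+)"/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Storage key for a crawled page (hypothetical layout).
function pageKey(url: string): string {
  return `pages/${encodeURIComponent(url)}.html`;
}

// One simulated function invocation: take a batch of tasks, fetch and store
// each page, then enqueue links not seen before until maxDepth is reached.
// Returns the number of tasks processed in this invocation.
function handleBatch(
  queue: InMemoryQueue<CrawlTask>,
  store: InMemoryObjectStore,
  fetchPage: Fetcher,
  seen: Set<string>,
  batchSize: number,
  maxDepth: number,
): number {
  const tasks = queue.receive(batchSize);
  for (const task of tasks) {
    const html = fetchPage(task.url);
    store.put(pageKey(task.url), html);
    if (task.depth >= maxDepth) {
      continue;
    }
    for (const link of extractLinks(html)) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.send({ url: link, depth: task.depth + 1 });
      }
    }
  }
  return tasks.length;
}
```

Each `handleBatch` call plays the role of one function invocation. A driver would seed the queue with a start URL and keep invoking it until `queue.size()` reaches zero. The shared `seen` set stands in for the deduplication store, for example a database table, that a real deployment would need, because separate invocations do not share memory. `maxDepth` bounds how far the crawl spreads from the seed.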
Alternatives To Crawling Infrastructure
Crawling Infrastructure (321 stars)
Last commit: 2 years ago · Open issues: 22 · License: AGPL-3.0 · Language: TypeScript
Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.

Stopstalk Deployment (306 stars)
Last commit: 5 months ago · Open issues: 92 · License: MIT · Language: Python
Stop stalking and start StopStalking 😉

Cc Pyspark (280 stars)
Last commit: a year ago · Open issues: 4 · License: MIT · Language: Python
Process Common Crawl data with Python and Spark.

Intoli Article Materials (255 stars)
Last commit: a year ago · Open issues: 85 · License: Other · Language: JavaScript
All of the supporting materials for articles from Intoli's blog.

Awsets (184 stars)
Used by: 1 · Last commit: a year ago · Releases: 35 (latest May 19, 2022) · Open issues: 6 · License: MIT · Language: Go
A utility for crawling an AWS account and exporting all its resources for further analysis.

Serverlesscrawler Vancouverrealstate (66 stars)
Last commit: 7 years ago · Open issues: 1 · License: MIT · Language: Python
A serverless crawler for real estate data in Vancouver using AWS Lambda, DynamoDB, RDS MySQL and CloudWatch.

Serverless Web Differ (60 stars)
Last commit: 2 years ago · License: MIT · Language: Python
A serverless web browser that crawls websites and compares pages on a schedule.

Blog (59 stars)
Last commit: 3 months ago · Open issues: 99 · Language: SCSS
Your internal mediocrity is the moment when you lost the faith of being excellent. Just do it.

Elasticrawl (50 stars)
Used by: 1 · Last commit: 7 years ago · Releases: 10 (latest February 15, 2017) · Open issues: 1 · License: MIT · Language: Ruby
Launch AWS Elastic MapReduce jobs that process Common Crawl data.

Browser As A Service (43 stars)
Last commit: a year ago · Open issues: 30 · License: MIT · Language: JavaScript
A web browser 🌎 hosted as a service that renders your JavaScript web pages as HTML.

Topics: TypeScript, Amazon Web Services, Cloud Computing, Crawler, Puppeteer