Awesome Open Source

Search results for amazon web services crawler

Crawling Infrastructure ⭐ 321

Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.

Stopstalk Deployment ⭐ 306

Stop stalking and start StopStalking 😉

Cc Pyspark ⭐ 280

Process Common Crawl data with Python and Spark

Intoli Article Materials ⭐ 255

All of the supporting materials for articles from Intoli's blog.

A utility for crawling an AWS account and exporting all its resources for further analysis.

Serverlesscrawler Vancouverrealstate ⭐ 66

A Serverless Crawler for Real Estate Data in Vancouver Using AWS Lambda, DynamoDB, RDS MySQL and CloudWatch

Serverless Web Differ ⭐ 60

A serverless web browser that crawls websites and compares pages on a schedule.

Your internal mediocrity is the moment when you lost the faith of being excellent. Just do it.

Elasticrawl ⭐ 50

Launch AWS Elastic MapReduce jobs that process Common Crawl data.

Browser As A Service ⭐ 43

A web browser 🌎 hosted as a service, to render your JavaScript web pages as HTML

Wikireverse ⭐ 39

Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles.

Serverless Instagram Crawler ⭐ 29

Serverless Instagram hashtag crawler with Lambda and DynamoDB

Lambda Dynamic Prerenderer ⭐ 28

Dynamically prerender pages for bots and crawlers, with Lambda@Edge, S3 and CloudFront. No more need for isomorphic/server-side rendering!

Pokemongo Map Poc ⭐ 27

🎃 POC project for Pokemon Go map

Utsusemi ⭐ 27

A tool to generate a static website by crawling the original site.

Serverless Crawler Demo ⭐ 27

Serverless Architecture Crawler demo

Staticizer ⭐ 27

A tool to create a static version of a website for hosting on S3.

Steam_recommendation_system ⭐ 25

Recommendation System, Collaborative Filtering, Spark, Hive, Flask, Web Crawler, AWS EC2, AWS RDS

Pywren Workshops ⭐ 23

Various workshop labs that make use of pywren to massively process data in parallel with AWS Lambda

Nutch Aws ⭐ 23

Amazon S3 Step Functions Ingestion Orchestration ⭐ 19

Design pattern for orchestrating an incremental data ingestion pipeline using AWS Step Functions from an on-premises location into an Amazon S3 data lake bucket

Damons Data Lake ⭐ 18

All the code related to building my own data lake

Cc Lambda ⭐ 16

Search the Common Crawl using Lambda functions

Googleplay Web Crawler ⭐ 15

MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive

Utilities for managing AWS Glue/Athena tables and partitions stored in S3

Find websites with script URLs matching given regex

Aws Fargate Demo ⭐ 10

AWS Fargate demo for AWSKRUG-recap

Building A Data Lake With Aws Glue And Amazon S3 ⭐ 10

Fragmenty ⭐ 9

An infrastructure for crawling Fragment.com/numbers data, exposing it through an API, and visualizing it

Quicksightathena01 ⭐ 9

Amazon QuickSight and Amazon Athena workshop. The workshop focuses on ingesting data into Athena, combining it with other data sources, and visualizing it in QuickSight.

Zmon Aws Agent ⭐ 9

AWS API crawler to auto discover running services in your account

Serverlessnycparkseventssitecrawler ⭐ 8

Frontendmasters Crawler ⭐ 8

A demo of a serverless crawler built on AWS Lambda (scheduled tasks) that stores results in S3

Lastfm Scrobble Purger ⭐ 7

A tool for mass deleting last.fm scrobbles

Reinvent2018_aim416 ⭐ 6

AIM416 workshop material for AWS re:Invent 2018

Common Crawl Malayalam ⭐ 5

Useful tools to extract Malayalam text from the Common Crawl datasets

Nutchpighive ⭐ 5

Crawl GooglePlay data with Nutch, run ETL with Pig, analyze with Hive

Distributed Web Crawler With Celery ⭐ 5

Python: selenium, beautifulsoup2, celery, rabbitmq, Amazon AWS(EC2, S3)

Webcrawler ⭐ 5

A Recursive Web crawler built with Java 8, reactive streams, async queues and AWS DynamoDB.

Copyright 2018-2024 Awesome Open Source. All rights reserved.