Awesome Open Source
Search results for amazon web services emr
6 search results found
Aws Sdk Pandas
⭐
3,779
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Mrjob
⭐
2,584
Run MapReduce jobs on Hadoop or Amazon Web Services
Hudi Resources
⭐
509
A collection of Apache Hudi resources
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Beginner_de_project
⭐
276
Beginner data engineering project - batch edition
Spark Jupyter Aws
⭐
255
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Aws Glue Data Catalog Client For Apache Hive Metastore
⭐
184
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog
Hello Aws Data Services
⭐
171
AWS Data/ML Services sample code & notes for my LinkedIn Learning courses
Learning Hadoop And Spark
⭐
160
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Repo 2019
⭐
135
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Emr Serverless Samples
⭐
124
Example code for running Spark and Hive jobs on EMR Serverless.
Variantspark
⭐
121
machine learning for genomic variants
Sensu Plugins Aws
⭐
75
This plugin provides native AWS instrumentation for monitoring and metrics collection, including: health and metrics for various AWS services, such as EC2, RDS, ELB, and more, as well as handlers for EC2, SES, and SNS.
Spark_scala_ml_examples
⭐
75
Spark 2.0 Scala Machine Learning examples
Rail
⭐
70
Scalable RNA-seq analysis
Sparksteps
⭐
68
CLI tool to launch Spark jobs on AWS EMR
Terraform Aws Emr Cluster
⭐
67
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS
Aws Concurrent Data Orchestration Pipeline Emr Livy
⭐
66
This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/), which creates a concurrent data pipeline by using Amazon EMR and Apache Livy. This pipeline is orchestrated by Apache Airflow.
Sbt Lighter
⭐
55
SBT plugin for Apache Spark on AWS EMR
Elasticrawl
⭐
50
Launch AWS Elastic MapReduce jobs that process Common Crawl data.
Emr Bootstrap Spark
⭐
49
AWS bootstrap scripts for Mozilla's flavoured Spark setup.
Terraform Emr Pyspark
⭐
46
Quickstart PySpark with Anaconda on AWS/EMR using Terraform
Themis
⭐
45
Autoscaling EMR clusters and Kinesis streams on Amazon Web Services (AWS)
Emr Bootstrap Pyspark
⭐
43
Quickstart PySpark with Anaconda on AWS/EMR
Edc Mod1 Exercise Igti
⭐
42
Module 1 exercises - EDC Bootcamp - IGTI 2021
Csds Material
⭐
38
Course material for the Computer Systems for Data Science class at Columbia
Terraform Aws Emr Cluster
⭐
35
A Terraform module to create an Amazon Web Services (AWS) Elastic MapReduce (EMR) cluster.
Mastering Machine Learning On Aws
⭐
35
Mastering Machine Learning on AWS, published by Packt
Telemetry Analysis Service
⭐
33
Telemetry Analysis Service
Workshop
⭐
30
BigData-JAWS study sessions/events
Aws Auto Terminate Idle Emr
⭐
26
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
Aws Tasks
⭐
25
Ant tasks for Amazon Web Services
Localemr
⭐
20
Local AWS EMR - A local service that imitates AWS EMR
M3d Api
⭐
20
Metadata Driven Development (m3d) is a cloud and platform agnostic framework for the automated creation, management and governance of data lakes.
Terraform Infra
⭐
20
Production Grade Terraform for Provisioning Infrastructure
Spark And Mllib Projects
⭐
18
This repository contains Spark, MLlib, PySpark and Dataframes projects
Starting Bigdata Aws
⭐
16
Quickfabric
⭐
15
A one-stop shop for all management and monitoring of Amazon Elastic MapReduce (EMR) clusters across different AWS accounts and purposes.
Gptools For Aws
⭐
15
GP Tools for Amazon Web Services Elastic MapReduce (Hosted Hadoop Framework)
Terraform Emr Training
⭐
15
Terraform script for launching multiple EMR clusters for training purposes.
Googleplay Web Crawler
⭐
15
MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
Aws Emr Examples
⭐
14
Some AWS EMR examples
Emr Cost Calculator
⭐
13
EMR Cost Calculator
Pyspark S3 Parquet Example
⭐
13
This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, which runs a SQLContext to create a temporary table using a DataFrame. SQL queries can then be run against the temporary table.
Realtime Bushfire Alert With Apache Flink Cep
⭐
13
Code and documentation for the demonstration example of the real-time bushfire alerting with the Complex Event Processing (CEP) in Apache Flink on Amazon EMR and a simulated IoT sensor network as described on the AWS Big Data Blog: Real-time bushfire alerting with Complex Event Processing in Apache Flink on Amazon EMR and IoT sensor network.
Pytest Stepfunctions
⭐
12
A pytest fixture that lets you mock Lambda code during AWS Step Functions local testing
Cwlogs S3
⭐
11
Gem to download CloudWatch logs and upload to S3 for processing.
Dynamodb Emr Exporter
⭐
10
Uses EMR clusters to export DynamoDB tables to S3 and generates import steps
Aws Cdk Emr S3 Trigger
⭐
10
Mastering Parallel Programming With R
⭐
10
Code file for Mastering Parallel Programming with R by Packt Publishing
Install Emr
⭐
10
Installation script and instructions for setting up Tessera environment on Amazon Elastic MapReduce
Emr Studio Samples
⭐
10
This repo contains samples for the EMR Studio feature.
Emr Demo
⭐
10
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
Communitydetection Spark Aws
⭐
9
A Spark application, written in Python, that finds strongly connected components with the bi-directional label propagation algorithm. This project was run on a 1.3 GB Twitter network dataset on an AWS EMR cluster.
Airflow_aws_utils
⭐
9
A collection of Airflow sample workflows for data processing on AWS
Cassandra Gdelt Queries
⭐
8
A Cassandra Architecture for GDELT Database 🌍
Dynamodbdump
⭐
8
DynamoDB backups made easier (and cheaper)
Sparksnake
⭐
8
Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
Aws Etl
⭐
7
This is an ETL application on AWS that uses general open sales and customer data, which you can find here: https://github.com/camposvinicius/data/blob/main/A (a zipped file with some .csv files inside, to which we apply transformations).
Wadal
⭐
7
AWS Transient EMR Cluster Tool
Iac Terraform Emr
⭐
6
AWS Summit 2022 ASEAN --- COM203 Using IaC with Terraform to provision Big Data Platform on Amazon EMR
Amazon Emr With Juicefs
⭐
6
This is a quick start of using JuiceFS as storage backend for Amazon EMR cluster.
Aws Big Data Study
⭐
6
Study Guide for AWS Big Data Specialty Certification
Coheel
⭐
6
A library for the automatic detection and disambiguation of knowledge base entity mentions in texts.
Awesome Ec2 Spot
⭐
6
A curated list of awesome AWS EC2 Spot related updates, open source repos, guides, blogs, and other resources.
Spark Sessions
⭐
6
Examples of how to split sets of time-based events into sessions using Spark
Emr Scripts
⭐
6
Shell scripts for AWS EMR clusters
Distcomputing
⭐
6
Harvard Data Tools
⭐
6
Csds Spark Emr
⭐
6
A simple Word Count Example using pyspark on AWS EMR
Emr Hail
⭐
5
A project to bootstrap EMR clusters with Hail installed
Spark_r_ml_examples
⭐
5
Spark 2.0 R/SparkR Machine Learning examples
Lorkbong
⭐
5
Throwaway demo of heroku + wukong + emr
Nutchpighive
⭐
5
Crawl Google Play data with Nutch, run ETL with Pig, analyze with Hive
Sparkling Water Emr
⭐
5
Launch Sparkling Water on EMR