Awesome Open Source
Search results for python emr
72 search results found
Aws Sdk Pandas
⭐
3,779
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Mrjob
⭐
2,584
Run MapReduce jobs on Hadoop or Amazon Web Services
Dataset Examples
⭐
926
Samples for users of the Yelp Academic Dataset
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Beginner_de_project
⭐
276
Beginner data engineering project - batch edition
Vertical Medical
⭐
250
Open Source Healthcare System for Odoo
Health
⭐
205
Open Source Health Information System
Emr Serverless Samples
⭐
124
Example code for running Spark and Hive jobs on EMR Serverless.
Spark Knn Recommender
⭐
113
Item and User-based KNN recommendation algorithms using PySpark
Briefly
⭐
85
Briefly - A Python Meta-programming Library for Job Flow Control
Rail
⭐
70
Scalable RNA-seq analysis
Sparksteps
⭐
68
⭐ CLI tool to launch Spark jobs on AWS EMR
Aws Concurrent Data Orchestration Pipeline Emr Livy
⭐
66
This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/), which creates a concurrent data pipeline by using Amazon EMR and Apache Livy. This pipeline is orchestrated by Apache Airflow.
Deeplearning Emr
⭐
58
Scripts and instructions to facilitate running Deep Learning Tasks on Amazon EMR
Aws Emr Launch
⭐
49
Emr Bootstrap Spark
⭐
49
AWS bootstrap scripts for Mozilla's flavoured Spark setup.
Terraform Emr Pyspark
⭐
46
Quickstart PySpark with Anaconda on AWS/EMR using Terraform
Emr Bootstrap Pyspark
⭐
43
Quickstart PySpark with Anaconda on AWS/EMR
Edc Mod1 Exercise Igti
⭐
42
Module 1 exercises - EDC Bootcamp - IGTI 2021
Telemetry Analysis Service
⭐
33
Telemetry Analysis Service
Webarchive Indexing
⭐
30
Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.
Emrio
⭐
30
Elastic MapReduce instance optimizer
Aws Auto Terminate Idle Emr
⭐
26
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
Knn
⭐
23
Spark Knn Recommender
Aws Emr Apache Ranger
⭐
20
Localemr
⭐
20
Local AWS EMR - A local service that imitates AWS EMR
M3d Api
⭐
20
Metadata Driven Development (m3d) is a cloud and platform agnostic framework for the automated creation, management and governance of data lakes.
990_long
⭐
19
Generates a long-form version of every field in the IRS 990 e-file dataset based on the NOPDC "Datathon" concordance
Spark And Mllib Projects
⭐
18
This repository contains Spark, MLlib, PySpark and Dataframes projects
Tscharts
⭐
17
Django REST framework-based Digital Patient Registration and EMR backend
Hippolyte
⭐
17
Tool to automate DynamoDB backups.
Cs205_ga
⭐
16
How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce
Pyspark Emr
⭐
15
A toolset to streamline running spark python on EMR
Gptools For Aws
⭐
15
GP Tools for Amazon Web Services Elastic Map Reduce (Hosted Hadoop Framework)
Quickfabric
⭐
15
A one-stop shop for all management and monitoring of Amazon Elastic Map Reduce (EMR) clusters across different AWS accounts and purposes.
Emr Cost Calculator
⭐
13
EMR Cost Calculator
Pyspark S3 Parquet Example
⭐
13
This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, which uses a SQLContext to create a temporary table from a DataFrame. SQL queries can then be run against the temporary table.
Libsvm Hadoop
⭐
12
Dmnc
⭐
12
Dual Memory Neural Computer
Pytest Stepfunctions
⭐
12
A pytest fixture that lets you mock Lambda code during AWS StepFunctions local testing
Nyc_taxi_pipeline
⭐
12
Design/Implement stream/batch architecture on NYC taxi data | #DE
Awsflow
⭐
11
AWSFlow: Amazon EMR jobs and Lambda functions with Python
Dataplate
⭐
10
Emr Demo
⭐
10
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
Aws Cdk Emr S3 Trigger
⭐
10
Chicago Taxi Trips Analysis
⭐
10
Analysis of City Of Chicago Taxi Trip Dataset Using AWS EMR, Spark, PySpark, Zeppelin and Airbnb's Superset
Tilebrute
⭐
10
Generate map tiles with Hadoop
Communitydetection Spark Aws
⭐
9
A Spark application, written in Python, that finds strongly connected components with the Bi-directional Label Propagation algorithm. This project was run on a 1.3 GB Twitter network dataset on an AWS EMR cluster.
Airflow_aws_utils
⭐
9
A collection of airflow sample workflows for data processing on aws
Agora Proc
⭐
8
Agora is a batch analyzer of video stream logs. It heavily uses Amazon Web Services and is built using the open-source mrjob python module. It leverages AWS EMR to parallelize the processing of client video player events.
Common_crawl
⭐
8
Simple Python MapReduce jobs for processing the Common Crawl plus command-line utilities
Sparksnake
⭐
8
Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
Emr
⭐
8
Elastic Map Reduce Samples
Hail On Aws Spot Instances
⭐
7
An option to spin up cost-effective EMR clusters in AWS with Hail and JupyterNotebook installed
Hephaestus
⭐
7
🌠 Hephaestus - ETL and ML tools for OHDSI - OMOP CDM
Airflow Pyspark Emr
⭐
7
This project demonstrates how to process data stored in a data lake fashion, transforming it into an OLAP-optimized structure by using PySpark. The PySpark job runs on AWS EMR, and the data pipeline is orchestrated by Apache Airflow, including the infrastructure creation and the EMR cluster termination.
Deldash
⭐
7
Winning project in GE Precision Health Challenge
Emr
⭐
7
Code for Episodic Memory Reader (EMR) https://arxiv.org/abs/1903.06164
Sparkov
⭐
6
Markov Chain based fraud detection system in Spark.
Slippin Jimmy
⭐
6
Utils to build and deploy oozie workflows on AWS EMR
Csds Spark Emr
⭐
6
A simple Word Count Example using pyspark on AWS EMR
Ocemr
⭐
6
Open Clinic Electronic Medical Records
Diagnosisextraction_ml
⭐
6
Pipeline for building Machine Learning Classifiers for the diagnosis of EHR text-data. We used this pipeline for our study, published here: https://doi.org/10.2196/23930.
Distcomputing
⭐
6
Freshjobspipeline
⭐
5
ETL Pipeline using Spark, Airflow, & EMR
Awsutils
⭐
5
S3 and EMR utilities in python using boto3
Hadoop Mrutils
⭐
5
Utility/starter/example scripts to get started with Hadoop MapReduce
Airflow Dags
⭐
5
Flint
⭐
5
Main repository of the Flint project for Spark and Amazon EMR.
Ddapp
⭐
5
FULL stack data science project (tech currently utilized: AWS/boto3/EMR/EC2/S3, Python, PySpark (Spark SQL and MLlib), and Flask/Flask RESTPlus)
Aws Emr Basketball Tool
⭐
5
This software compares the output and running time of the EMR cluster. The architecture is designed so that users can throw a request (like shooting) and receive a result (like a score) by email, as if they were playing a basketball game.
Udacity Data Engineering Capstone
⭐
5
Capstone Project for Udacity Data Engineering Nanodegree
Geotweet
⭐
5
Store Twitter Streaming API output into Amazon S3 Buckets and process with EMR
Copyright 2018-2024 Awesome Open Source. All rights reserved.