Awesome Open Source
Search results for pyspark emr
21 search results found
Spark Nlp
⭐
3,578
State of the Art Natural Language Processing
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Spark Jupyter Aws
⭐
255
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Repo 2019
⭐
135
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Spark Knn Recommender
⭐
113
Item and User-based KNN recommendation algorithms using PySpark
Terraform Emr Pyspark
⭐
46
Quickstart PySpark with Anaconda on AWS/EMR using Terraform
Emr Bootstrap Pyspark
⭐
43
Quickstart PySpark with Anaconda on AWS/EMR
Basin
⭐
29
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Spark And Mllib Projects
⭐
18
This repository contains Spark, MLlib, PySpark and Dataframes projects
Pyspark Emr
⭐
15
A toolset to streamline running Spark Python on EMR
Pyspark S3 Parquet Example
⭐
13
This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, which uses a SQLContext to create a temporary table from a DataFrame. SQL queries can then be run against the temporary table.
Nyc_taxi_pipeline
⭐
12
Design/Implement stream/batch architecture on NYC taxi data | #DE
Emr Demo
⭐
10
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
Chicago Taxi Trips Analysis
⭐
10
Analysis of City Of Chicago Taxi Trip Dataset Using AWS EMR, Spark, PySpark, Zeppelin and Airbnb's Superset
Sparksnake
⭐
8
Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
Airflow Pyspark Emr
⭐
7
This project demonstrates how to process data stored in a data lake, transforming it into an OLAP-optimized structure using PySpark. The PySpark job runs on AWS EMR, and the data pipeline is orchestrated by Apache Airflow, including infrastructure creation and EMR cluster termination.
Aws Etl
⭐
7
An ETL application on AWS that uses general open sales and customer data, available at https://github.com/camposvinicius/data/blob/main/A (a zipped file containing several .csv files, to which transformations are applied).
Csds Spark Emr
⭐
6
A simple Word Count Example using pyspark on AWS EMR
Distcomputing
⭐
6
Datasprints Open Spaces
⭐
5
Repository for the code demoed in the talk
Ddapp
⭐
5
Full-stack data science project (tech currently utilized: AWS/boto3/EMR/EC2/S3, Python, PySpark (Spark SQL and MLlib), and Flask/Flask RESTPlus)
Related Searches
Spark Pyspark (773)
Python Pyspark (689)
Jupyter Notebook Pyspark (502)