Awesome Open Source
Search results for amazon web services pyspark
Hopsworks
⭐
1,041
Hopsworks - Data-Intensive AI platform with a Feature Store
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Cc Pyspark
⭐
280
Process Common Crawl data with Python and Spark
Spark Jupyter Aws
⭐
255
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Repo 2019
⭐
135
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Spark_python_ml_examples
⭐
81
Spark 2.0 Python Machine Learning examples
Towardsdataengineering
⭐
52
This repo contains commands that data engineers use in day-to-day work.
Terraform Emr Pyspark
⭐
46
Quickstart PySpark with Anaconda on AWS/EMR using Terraform
Emr Bootstrap Pyspark
⭐
43
Quickstart PySpark with Anaconda on AWS/EMR
Spark Social Science
⭐
36
Automated Spark Cluster Builds with RStudio or PySpark for Policy Research
Jobanalytics_and_search
⭐
22
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Terraglue
⭐
21
Providing an easy way to deploy a Glue job in any AWS account using Terraform
Covid 19 Data Engineering Pipeline
⭐
19
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
Spark And Mllib Projects
⭐
18
This repository contains Spark, MLlib, PySpark and Dataframes projects
Rheoceros
⭐
15
Cloud-based AI / ML workflow and data application development framework
Pyspark On Aws Emr
⭐
13
The goal of this project is to offer a ready-to-use AWS EMR template using Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.
Pyspark S3 Parquet Example
⭐
13
This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, which uses a SQLContext to create a temporary table from a DataFrame. SQL queries can then be run against the temporary table.
Dot Connect
⭐
12
Improve your workflow efficiency by connecting to databases and cloud systems effortlessly.
Aws Glue Test Data Generator
⭐
12
AWS Glue Configurable Test Data Generator for S3 Data Lakes and DynamoDB
Emr Demo
⭐
10
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
Sparksnake
⭐
8
Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
Aws Etl
⭐
7
This is an ETL application on AWS that uses general open sales and customer data, available here: https://github.com/camposvinicius/data/blob/main/A (a zipped file containing several CSV files, to which transformations are applied).
Aws Glue Monorepo Style
⭐
7
Example of AWS Glue jobs and workflow deployment with Terraform in monorepo style. The code here supports a miniseries of articles about AWS Glue and Python.
Distcomputing
⭐
6
Csds Spark Emr
⭐
6
A simple word count example using PySpark on AWS EMR
Terraform Dlt Public
⭐
5
Deploying Delta Live Tables Pipelines on AWS with Terraform
Copyright 2018-2024 Awesome Open Source. All rights reserved.