Awesome Open Source
Search results for amazon web services spark
Data Science Ipython Notebooks
⭐
25,668
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Dev Setup
⭐
5,802
macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Seldon Server
⭐
1,420
Machine Learning Platform and Recommendation Engine built on Kubernetes
Aws Glue Samples
⭐
1,334
AWS Glue code samples
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Flintrock
⭐
627
A command-line tool for launching Apache Spark clusters.
Aws Glue Libs
⭐
568
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Data Engineering Interview Questions
⭐
554
More than 2,000 data engineering interview questions.
Spark Redshift
⭐
514
Redshift data source for Apache Spark
Agile_data_code_2
⭐
435
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Data Engineering Projects
⭐
322
Personal Data Engineering Projects
Sagemaker Spark
⭐
285
A Spark library for Amazon SageMaker.
Cc Pyspark
⭐
280
Process Common Crawl data with Python and Spark
Beginner_de_project
⭐
276
Beginner data engineering project - batch edition
Spark Jupyter Aws
⭐
255
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Installations_mac_ubuntu_windows
⭐
233
Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services).
Aws Glue Data Catalog Client For Apache Hive Metastore
⭐
184
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog
Learning Hadoop And Spark
⭐
160
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Spark On Lambda
⭐
144
Apache Spark on AWS Lambda
Kinesis Sql
⭐
135
Kinesis Connector for Structured Streaming
Emr Serverless Samples
⭐
124
Example code for running Spark and Hive jobs on EMR Serverless.
Variantspark
⭐
121
machine learning for genomic variants
Spark Dynamodb
⭐
108
Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.
Distributed Dataset
⭐
107
A distributed data processing framework in Haskell.
Nd027 C3 Data Lakes With Spark
⭐
99
Spark Streaming Example Project
⭐
95
A Spark Streaming job reading events from Amazon Kinesis and writing event counts to DynamoDB
Spark_python_ml_examples
⭐
81
Spark 2.0 Python Machine Learning examples
Data Engineering Nanodegree
⭐
76
Projects done in the Data Engineering Nanodegree by Udacity.com
Spark_scala_ml_examples
⭐
75
Spark 2.0 Scala Machine Learning examples
Luigi Warehouse
⭐
73
A luigi powered analytics / warehouse stack
Sparksteps
⭐
68
CLI tool to launch Spark jobs on AWS EMR
Terraform Aws Emr Cluster
⭐
67
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS
Pythom
⭐
64
Code supporting Data Science articles at The Marketing Technologist, Floryn Tech Blog, and Pythom.nl
Sbt Lighter
⭐
55
SBT plugin for Apache Spark on AWS EMR
Spark Install
⭐
50
Installation guide for Apache Spark + Hadoop on Mac/Linux
Emr Bootstrap Spark
⭐
49
AWS bootstrap scripts for Mozilla's flavoured Spark setup.
Lambda Spark Executor
⭐
44
Apache Spark AWS Lambda Executor (SAMBA)
Etlflow
⭐
43
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for running complex, auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems, and more.
Edc Mod1 Exercise Igti
⭐
42
Module 1 exercises - Bootcamp EDC - IGTI 2021
Udacity Data Engineering
⭐
42
Udacity Data Engineering Nano Degree (DEND)
Roffildlibrary
⭐
40
Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS
Sageworks
⭐
36
SageWorks: An easy to use Python API for creating and deploying SageMaker Models
Spark Social Science
⭐
36
Automated Spark Cluster Builds with RStudio or PySpark for Policy Research
Codeigniter Amazon Sdk
⭐
35
[Closed/Dead Project] A CodeIgniter library for working with the Amazon SDK. It depends on the official Amazon SDK files and can also be used as a spark library.
Mastering Machine Learning On Aws
⭐
35
Mastering Machine Learning on AWS, published by Packt
Telemetry Analysis Service
⭐
33
Telemetry Analysis Service
Yaetos
⭐
32
Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified
Spark Bdp Stockticker Demo
⭐
25
A Python project that demonstrates Apache Spark on the Basho Data Platform
Steam_recommendation_system
⭐
25
Recommendation System, Collaborative Filtering, Spark, Hive, Flask, Web Crawler, AWS EC2, AWS RDS
Datapull
⭐
23
Cloud based Data Platform based on Apache Spark
Cloudtik
⭐
23
Cloud Scale Platform for Distributed Analytics and AI
Jobanalytics_and_search
⭐
22
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Aws Glue Docker
⭐
22
🐋 Docker image for AWS Glue Spark/Python
Kafka Connect Msk Demo
⭐
21
For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR
Terraglue
⭐
21
Providing an easy way to deploy a Glue job in any AWS account using Terraform
Rspark
⭐
21
This repo is for building Docker containers for RStudio, PostgreSQL, Hadoop, Spark, etc.
Snowplow Scala Analytics Sdk
⭐
20
Scala SDK for working with Snowplow enriched events in Spark, AWS Lambda, Flink et al.
Spark Athena
⭐
19
AWS Athena data source for Apache Spark
Spark Scala Eks
⭐
19
Spark Scala docker container sample for AWS testing - EKS & S3
Covid 19 Data Engineering Pipeline
⭐
19
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.
Spark And Mllib Projects
⭐
18
This repository contains Spark, MLlib, PySpark and Dataframes projects
Data Pipeline Project
⭐
18
Data pipeline project
Starting Bigdata Aws
⭐
16
Snowplow Python Analytics Sdk
⭐
16
Python SDK for working with Snowplow enriched events in Spark, AWS Lambda et al.
Rheoceros
⭐
15
Cloud-based AI / ML workflow and data application development framework
Experiments
⭐
15
Code examples for my blog posts
Aws Emr Examples
⭐
14
Some AWS EMR examples
Bootcamp_data Engineering
⭐
13
Bootcamp to learn basics in Data Engineering
Ox Clo
⭐
13
Materials for Oxford Software Engineering Programme CLO course
Pyspark On Aws Emr
⭐
13
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
Pyspark S3 Parquet Example
⭐
13
This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, which uses a SQLContext to create a temporary table from a DataFrame. SQL queries can then be run against the temporary table.
Spark Docker Swarm
⭐
11
Spark on Docker Swarm example code
Cimspark
⭐
11
Spark access to Common Information Model (CIM) files
Emr Demo
⭐
10
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
Sparknotebook
⭐
10
A fast way of getting a Spark cluster up and running on AWS with the friendly IPython interface.
Local Aws Spark Zeppelin Stack
⭐
9
AWS LocalStack + Spark Cluster + Zeppelin [Docker]
Scalable Hashtag Recommender System
⭐
9
Scalable hashtag recommender system based on mini-batch fast k-means and neural networks
Communitydetection Spark Aws
⭐
9
A Spark application, written in Python, that finds strongly connected components using the Bi-directional Label Propagation algorithm. The project was run on a 1.3 GB Twitter network dataset on an AWS EMR cluster.
Packer Aws Spark
⭐
8
Packer template to build an AWS Apache Spark AMI
Cdk Emrserverless With Delta Lake
⭐
8
This construct builds the elements you need to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to check the outcome of the EMR Serverless application.
Insight Zone Defense
⭐
8
One-click automation of big data pipeline with monitoring
Sparksnake
⭐
8
Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
Cassandra Gdelt Queries
⭐
8
A Cassandra Architecture for GDELT Database 🌍
Cars Forge
⭐
8
EC2 fleets made easy
Terraform Module Aws Spark
⭐
8
Terraform module to create an Apache Spark cluster on AWS
Data Engineering Onboarding Starter
⭐
8
This repository contains a 10-step program to enter the world of Data Engineering
Structured Streaming Cassandra Sink
⭐
7
An example of how to create and use a Cassandra sink in a Spark Structured Streaming application
Scala
⭐
7
🤓 Examples Advanced 🧐 Projects Akka 🚀 ZIO ⚡️ Algorithms 😼 Cats
Aws Etl
⭐
7
This is an ETL application on AWS that uses general open sales and customer data, available at https://github.com/camposvinicius/data/blob/main/A (a zipped file containing several .csv files, to which transformations are applied).
Mshift
⭐
6
MongoDB to Redshift data transfer using Apache Spark.
Emr Scripts
⭐
6
Shell scripts for AWS EMR clusters
Distcomputing
⭐
6
Dataengineering Youtube Project
⭐
6
Data Engineering Youtube Project
Spark Glue Data Catalog
⭐
6
Apache Spark build compatible with AWS Glue Data Catalog.
Spark Databricks
⭐
6
🔥 Master Apache Spark & Databricks! Dive into a world of big data with exclusive insights from Udemy courses, personal notes, and practical guides. Whether you're starting out or scaling new heights in data engineering, this is your ultimate resource hub! 🌟🚀
Data Engineer Portfolio
⭐
6
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
Spark Sessions
⭐
6
Examples of how to split sets of time-based events into sessions using Spark
Spark Eks
⭐
5
Examples and custom spark images for working with the spark-on-k8s operator on AWS
Sparkling Water Emr
⭐
5
Launch Sparkling Water on EMR
Geotrellis Workshop
⭐
5
GeoTrellis Workshop Material
Related Searches
Python Amazon Web Services (8,120)
Amazon Web Services Lambda Functions (7,495)
Amazon Web Services Terraform (4,243)
Amazon Web Services Serverless (4,018)
Amazon Web Services Hcl (3,473)
Scala Spark (3,279)
Golang Amazon Web Services (2,930)
Docker Amazon Web Services (2,864)
Amazon Web Services Aws Lambda (2,670)
Amazon Web Services Cloudformation (2,431)
Copyright 2018-2024 Awesome Open Source. All rights reserved.