Awesome Open Source


Search results for amazon web services spark



Data Science Ipython Notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Dev Setup ⭐ 5,802

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

Seldon Server ⭐ 1,420

Machine Learning Platform and Recommendation Engine built on Kubernetes

Aws Glue Samples ⭐ 1,334

AWS Glue code samples

Devops Python Tools ⭐ 709

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Flintrock ⭐ 627

A command-line tool for launching Apache Spark clusters.

Aws Glue Libs ⭐ 568

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Data Engineering Interview Questions ⭐ 554

More than 2,000 data engineer interview questions.

Spark Redshift ⭐ 514

Redshift data source for Apache Spark

Agile_data_code_2 ⭐ 435

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Data Engineering Projects ⭐ 322

Personal Data Engineering Projects

Sagemaker Spark ⭐ 285

A Spark library for Amazon SageMaker.

Cc Pyspark ⭐ 280

Process Common Crawl data with Python and Spark

Beginner_de_project ⭐ 276

Beginner data engineering project - batch edition

Spark Jupyter Aws ⭐ 255

A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support

Installations_mac_ubuntu_windows ⭐ 233

Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services).

Aws Glue Data Catalog Client For Apache Hive Metastore ⭐ 184

The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog

Learning Hadoop And Spark ⭐ 160

Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning

Spark On Lambda ⭐ 144

Apache Spark on AWS Lambda

Kinesis Sql ⭐ 135

Kinesis Connector for Structured Streaming

Emr Serverless Samples ⭐ 124

Example code for running Spark and Hive jobs on EMR Serverless.

Variantspark ⭐ 121

machine learning for genomic variants

Spark Dynamodb ⭐ 108

Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.

Distributed Dataset ⭐ 107

A distributed data processing framework in Haskell.

Nd027 C3 Data Lakes With Spark ⭐ 99

Spark Streaming Example Project ⭐ 95

A Spark Streaming job reading events from Amazon Kinesis and writing event counts to DynamoDB

Spark_python_ml_examples ⭐ 81

Spark 2.0 Python Machine Learning examples

Data Engineering Nanodegree ⭐ 76

Projects done in the Data Engineering Nanodegree by Udacity.com

Spark_scala_ml_examples ⭐ 75

Spark 2.0 Scala Machine Learning examples

Luigi Warehouse ⭐ 73

A luigi powered analytics / warehouse stack

Sparksteps ⭐ 68

⭐ CLI tool to launch Spark jobs on AWS EMR

Terraform Aws Emr Cluster ⭐ 67

Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS

Code supporting Data Science articles at The Marketing Technologist, Floryn Tech Blog, and Pythom.nl

Sbt Lighter ⭐ 55

SBT plugin for Apache Spark on AWS EMR

Spark Install ⭐ 50

Installation guide for Apache Spark + Hadoop on Mac/Linux

Emr Bootstrap Spark ⭐ 49

AWS bootstrap scripts for Mozilla's flavoured Spark setup.

Lambda Spark Executor ⭐ 44

Apache Spark AWS Lambda Executor (SAMBA)

EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex Auditable workflows which can interact with Google Cloud Platform, AWS, Kubernetes, Databases, SFTP servers, On-Prem Systems and more.

Edc Mod1 Exercise Igti ⭐ 42

Exercises for module 1 - Bootcamp EDC - IGTI 2021

Udacity Data Engineering ⭐ 42

Udacity Data Engineering Nano Degree (DEND)

Roffildlibrary ⭐ 40

Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS

Sageworks ⭐ 36

SageWorks: An easy to use Python API for creating and deploying SageMaker Models

Spark Social Science ⭐ 36

Automated Spark Cluster Builds with RStudio or PySpark for Policy Research

Codeigniter Amazon Sdk ⭐ 35

[Closed/Dead Project] A CodeIgniter library for working with the Amazon SDK. It depends on the official Amazon SDK files and can also be used as a spark library.

Mastering Machine Learning On Aws ⭐ 35

Mastering Machine Learning on AWS, published by Packt

Telemetry Analysis Service ⭐ 33

Telemetry Analysis Service

Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified

Spark Bdp Stockticker Demo ⭐ 25

A Python project that demonstrates Apache Spark on the Basho Data Platform

Steam_recommendation_system ⭐ 25

Recommendation System, Collaborative Filtering, Spark, Hive, Flask, Web Crawler, AWS EC2, AWS RDS

Datapull ⭐ 23

Cloud based Data Platform based on Apache Spark

Cloudtik ⭐ 23

Cloud Scale Platform for Distributed Analytics and AI

Jobanalytics_and_search ⭐ 22

JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Aws Glue Docker ⭐ 22

🐋 Docker image for AWS Glue Spark/Python

Kafka Connect Msk Demo ⭐ 21

For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR

Terraglue ⭐ 21

Providing an easy way to deploy a Glue job in any AWS account using Terraform

This repo is for building Docker containers for RStudio, PostgreSQL, Hadoop, Spark, etc.

Snowplow Scala Analytics Sdk ⭐ 20

Scala SDK for working with Snowplow enriched events in Spark, AWS Lambda, Flink et al.

Spark Athena ⭐ 19

AWS Athena data source for Apache Spark

Spark Scala Eks ⭐ 19

Spark Scala docker container sample for AWS testing - EKS & S3

Covid 19 Data Engineering Pipeline ⭐ 19

A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.

Spark And Mllib Projects ⭐ 18

This repository contains Spark, MLlib, PySpark and Dataframes projects

Data Pipeline Project ⭐ 18

Data pipeline project

Starting Bigdata Aws ⭐ 16

Snowplow Python Analytics Sdk ⭐ 16

Python SDK for working with Snowplow enriched events in Spark, AWS Lambda et al.

Rheoceros ⭐ 15

Cloud-based AI / ML workflow and data application development framework

Experiments ⭐ 15

Code examples for my blog posts

Aws Emr Examples ⭐ 14

Some AWS EMR examples

Bootcamp_data Engineering ⭐ 13

Bootcamp to learn basics in Data Engineering

Materials for Oxford Software Engineering Programme CLO course

Pyspark On Aws Emr ⭐ 13

The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.

Pyspark S3 Parquet Example ⭐ 13

This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, which uses a SQLContext to create a temporary table from a DataFrame. SQL queries can then be run against the temporary table.

Spark Docker Swarm ⭐ 11

Spark on Docker Swarm example code

Cimspark ⭐ 11

Spark access to Common Information Model (CIM) files

Emr Demo ⭐ 10

Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.

Sparknotebook ⭐ 10

A fast way of getting a Spark cluster up and running on AWS with the friendly IPython interface.

Local Aws Spark Zeppelin Stack ⭐ 9

AWS LocalStack + Spark Cluster + Zeppelin [Docker]

Scalable Hashtag Recommender System ⭐ 9

Scalable hashtag recommender system based on mini-batch fast k-means and neural networks

Communitydetection Spark Aws ⭐ 9

A Spark application, written in Python, that finds strongly connected components using the Bi-directional Label Propagation algorithm. The project was run on a 1.3 GB Twitter network dataset on an AWS EMR cluster.

Packer Aws Spark ⭐ 8

Packer template to build an AWS Apache Spark AMI

Cdk Emrserverless With Delta Lake ⭐ 8

This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to check the outcome of the EMR Serverless application.

Insight Zone Defense ⭐ 8

One-click automation of big data pipeline with monitoring

Sparksnake ⭐ 8

Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR

Cassandra Gdelt Queries ⭐ 8

A Cassandra Architecture for GDELT Database 🌍

Cars Forge ⭐ 8

EC2 fleets made easy

Terraform Module Aws Spark ⭐ 8

Terraform module to create an Apache Spark cluster on AWS

Data Engineering Onboarding Starter ⭐ 8

This repository contains a 10-step program to enter the world of Data Engineering

Structured Streaming Cassandra Sink ⭐ 7

An example of how to create and use a Cassandra sink in a Spark Structured Streaming application

🤓 Examples Advanced 🧐 Projects Akka 🚀 ZIO ⚡️ Algorithms 😼 Cats

This is an ETL application on AWS that uses general open sales and customer data, which you can find here: https://github.com/camposvinicius/data/blob/main/A (a zipped file containing some .csv files, to which we apply transformations).

MongoDB to Redshift data transfer using Apache Spark.

Emr Scripts ⭐ 6

Shell scripts for AWS EMR clusters

Distcomputing ⭐ 6

Dataengineering Youtube Project ⭐ 6

Data Engineering Youtube Project

Spark Glue Data Catalog ⭐ 6

Apache Spark build compatible with AWS Glue Data Catalog.

Spark Databricks ⭐ 6

🔥 Master Apache Spark & Databricks! Dive into a world of big data with exclusive insights from Udemy courses, personal notes, and practical guides. Whether you're starting out or scaling new heights in data engineering, this is your ultimate resource hub! 🌟🚀

Data Engineer Portfolio ⭐ 6

This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.

Spark Sessions ⭐ 6

Examples of how to split sets of time-based events into sessions using Spark

Spark Eks ⭐ 5

Examples and custom spark images for working with the spark-on-k8s operator on AWS

Sparkling Water Emr ⭐ 5

Launch Sparkling Water on EMR

Geotrellis Workshop ⭐ 5

GeoTrellis Workshop Material




Copyright 2018-2024 Awesome Open Source. All rights reserved.