Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Flintrock | 627 | 4 | 5 months ago | 14 | November 27, 2023 | 36 | apache-2.0 | Python | ||
A command-line tool for launching Apache Spark clusters. | ||||||||||
Agile_data_code_2 | 435 | | | | a year ago | | | 7 | mit | Jupyter Notebook |
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition | ||||||||||
Spark Jupyter Aws | 255 | | | | 6 years ago | | | 2 | | Jupyter Notebook |
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support | ||||||||||
Learning Hadoop And Spark | 160 | | | | 6 months ago | | | | apache-2.0 | HTML |
Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning | ||||||||||
Spark On Lambda | 144 | | | | a year ago | | | 25 | apache-2.0 | Scala |
Apache Spark on AWS Lambda | ||||||||||
Distributed Dataset | 107 | | | | 4 years ago | | | 19 | bsd-3-clause | Haskell |
A distributed data processing framework in Haskell. | ||||||||||
Mastering Machine Learning On Aws | 35 | | | | a year ago | | | 1 | mit | Jupyter Notebook |
Mastering Machine Learning on AWS, published by Packt | ||||||||||
Amazon Emr Cli | 26 | | | | 3 months ago | | | 14 | apache-2.0 | Python |
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs | ||||||||||
Covid 19 Data Engineering Pipeline | 19 | | | | 5 months ago | | | 5 | mit | Python |
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions. | ||||||||||
Spark Athena | 19 | | | | 7 years ago | | | | apache-2.0 | Scala |
AWS Athena data source for Apache Spark |