Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Ibis | 3,404 | 24 | 29 | 4 months ago | 68 | December 10, 2023 | 157 | apache-2.0 | Python | |
The flexibility of Python with the scale and performance of modern SQL. | ||||||||||
Devops Python Tools | 709 | | | | 4 months ago | | | 37 | mit | Python |
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc. | ||||||||||
Sagemaker Spark | 285 | 2 | 8 months ago | 36 | August 26, 2022 | 34 | apache-2.0 | Scala | ||
A Spark library for Amazon SageMaker. | ||||||||||
Spark Jupyter Aws | 255 | | | | 7 years ago | | | 2 | | Jupyter Notebook |
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support | ||||||||||
Aut | 128 | | | | 10 months ago | 27 | November 17, 2022 | 3 | apache-2.0 | Scala |
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives. | ||||||||||
Spark With Python | 98 | | | | 4 years ago | | | | mit | Jupyter Notebook |
Fundamentals of Spark with Python (using PySpark), code examples | ||||||||||
Apachespark | 59 | | | | 2 years ago | | | | | Python |
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-life work, using PySpark and Spark SQL for development. The course ends with a few case studies. | ||||||||||
Big_data | 55 | | | | 4 months ago | | | | mit | Jupyter Notebook |
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. | ||||||||||
Spark Training | 52 | | | | 3 years ago | | | 3 | | Jupyter Notebook |
Repository used for Spark Trainings | ||||||||||
Datapipelines Essentials Python | 45 | | | | a year ago | | | 1 | apache-2.0 | Python |
A simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations |