Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Petastorm | 1,693 | 8 | 6 months ago | 86 | February 03, 2023 | 174 | apache-2.0 | Python | ||
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code. | ||||||||||
Devops Python Tools | 709 | 4 months ago | 37 | mit | Python | |||||
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc. | ||||||||||
Nyc Transport | 144 | 7 years ago | bsd-3-clause | Jupyter Notebook | ||||||
A Unified Database of NYC transport (subway, taxi/Uber, and citibike) data. | ||||||||||
Pyspark S3 Parquet Example | 13 | 8 years ago | Python | |||||||
This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, which uses a SQLContext to create a temporary table from a DataFrame. SQL queries can then be run against the temporary table. | ||||||||||
Chicago Taxi Trips Analysis | 10 | 7 years ago | apache-2.0 | Python | ||||||
Analysis of City Of Chicago Taxi Trip Dataset Using AWS EMR, Spark, PySpark, Zeppelin and Airbnb's Superset | ||||||||||
Anaconda | 10 | 2 years ago | 1 | apache-2.0 | Python | |||||
python gift package | ||||||||||
Pyspark Dataframe Made Easy | 10 | 2 years ago | Jupyter Notebook | |||||||
PySpark DataFrame made easy | ||||||||||
Microdrill | 7 | 8 years ago | 3 | March 01, 2016 | 1 | apache-2.0 | Python | |||
Simple Apache Drill alternative using PySpark | ||||||||||
Spark For Noobs By A Noob | 7 | 6 years ago | Jupyter Notebook | |||||||
Jupyter notebooks for learning PySpark | ||||||||||
Parquettohdf5 | 6 | 8 years ago | Python | |||||||
PySpark code for exporting Parquet to HDF5 files (one per executor), then merging them into one large HDF5 file