Parquettohdf5

PySpark code for exporting Parquet data to HDF5 files (one per executor) and then merging them into one large HDF5 file.
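The merge step could look roughly like the sketch below. It is not taken from the repository: it assumes the h5py library, assumes every per-executor file holds one dataset with the same dtype and trailing dimensions, and uses a hypothetical dataset name ("data") and hypothetical helper names (plan_offsets, merge_hdf5_parts).

```python
from itertools import accumulate


def plan_offsets(row_counts):
    """Return the (start, stop) row range each part occupies in the merged dataset."""
    stops = list(accumulate(row_counts))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


def merge_hdf5_parts(part_paths, out_path, dataset="data"):
    """Concatenate one dataset from many part files into a single HDF5 file.

    Assumption: all parts share the same dtype and trailing dimensions.
    """
    import h5py  # third-party; imported here so plan_offsets has no dependencies

    if not part_paths:
        raise ValueError("no part files to merge")

    # First pass: collect shapes so the output can be allocated once.
    shapes = []
    dtype = None
    for path in part_paths:
        with h5py.File(path, "r") as part:
            shapes.append(part[dataset].shape)
            dtype = part[dataset].dtype

    ranges = plan_offsets([shape[0] for shape in shapes])
    total_rows = ranges[-1][1]

    # Second pass: copy each part into its row range of the merged dataset.
    with h5py.File(out_path, "w") as out:
        merged = out.create_dataset(
            dataset, shape=(total_rows,) + tuple(shapes[0][1:]), dtype=dtype
        )
        for path, (start, stop) in zip(part_paths, ranges):
            if stop > start:  # skip empty parts
                with h5py.File(path, "r") as part:
                    merged[start:stop] = part[dataset][...]
```

With two part files, a call such as merge_hdf5_parts(["part-0.h5", "part-1.h5"], "merged.h5") would write both parts back to back into merged.h5 (the file names are placeholders). Allocating the output once and copying by row range avoids resizing the dataset for every part.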
Alternatives To Parquettohdf5
Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language

Petastorm | 1,693 | 8 | 6 months ago | 86 | February 03, 2023 | 174 | apache-2.0 | Python
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Devops Python Tools | 709 | 4 months ago | 37 | mit | Python
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Nyc Transport | 144 | 7 years ago | bsd-3-clause | Jupyter Notebook
A Unified Database of NYC transport (subway, taxi/Uber, and citibike) data.

Pyspark S3 Parquet Example | 13 | 8 years ago | Python
This repo demonstrates how to load a sample Parquet formatted file from an AWS S3 Bucket. A Python job will then be submitted to an Apache Spark instance running on AWS EMR, which will run a SQLContext to create a temporary table using a DataFrame. SQL queries will then be possible against the temporary table.

Chicago Taxi Trips Analysis | 10 | 7 years ago | apache-2.0 | Python
Analysis of City Of Chicago Taxi Trip Dataset Using AWS EMR, Spark, PySpark, Zeppelin and Airbnb's Superset

Anaconda | 10 | 2 years ago | 1 | apache-2.0 | Python
python gift package

Pyspark Dataframe Made Easy | 10 | 2 years ago | Jupyter Notebook
pyspark dataframe made easy

Microdrill | 7 | 8 years ago | 3 | March 01, 2016 | 1 | apache-2.0 | Python
Simple Apache Drill alternative using PySpark

Spark For Noobs By A Noob | 7 | 6 years ago | Jupyter Notebook
Jupyter notebooks for learning PySpark

Parquettohdf5 | 6 | 8 years ago | Python
pyspark code for exporting parquet to hdf5 files (one per executor) then merge them into one large hdf5 file