Awesome Open Source
Search results for parquet
338 search results found
Iceberg
⭐
5,179
Apache Iceberg
Dsq
⭐
3,401
Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
Roapi
⭐
2,969
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Parquet Mr
⭐
2,296
Apache Parquet
Qsv
⭐
2,079
CSVs sliced, diced & analyzed.
Drill
⭐
1,856
Apache Drill is a distributed MPP query layer for self-describing data
Influxdb_iox
⭐
1,805
Pronounced (influxdb eye-ox), short for iron oxide. This is the new core of InfluxDB written in Rust on top of Apache Arrow.
Gaffer
⭐
1,724
A large-scale entity and relation database supporting aggregation of properties
Petastorm
⭐
1,693
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Parquet Format
⭐
1,559
Apache Parquet
Quilt
⭐
1,299
Quilt is a data mesh for connecting people with actionable data
Rill
⭐
1,145
Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
Parquet Go
⭐
1,107
Pure Go library for reading and writing Parquet files
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Cryo
⭐
862
cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Choetl
⭐
693
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Tech.ml.dataset
⭐
616
A Clojure high performance data processing system
Parquetviewer
⭐
574
Simple Windows desktop application for viewing & querying Apache Parquet files
Kglab
⭐
518
Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
Pigpen
⭐
513
Map-Reduce for Clojure
Parquet Dotnet
⭐
457
Fully managed Apache Parquet implementation
Vscode Data Preview
⭐
447
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Sparser
⭐
411
Sparser: Raw Filtering for Faster Analytics over Raw Data
Iceberg
⭐
409
Iceberg is a table format for large, slow-moving tabular data
Pystore
⭐
404
Fast data store for Pandas time-series data
Skale
⭐
398
High performance distributed data processing engine
Elasticsearch_loader
⭐
349
A tool for batch loading data files (JSON, Parquet, CSV, TSV) into Elasticsearch
Amadeus
⭐
334
Harmonious distributed data analysis in Rust.
Ratatool
⭐
333
A tool for data sampling, data generation, and data diffing
Spindle
⭐
333
Next-generation web analytics processing with Scala, Spark, and Parquet.
Parquet Dotnet
⭐
319
🏐 Apache Parquet for modern .NET
Centurion
⭐
318
Kotlin Bigdata Toolkit
Parquet Cpp
⭐
312
Apache Parquet
Parquet2
⭐
311
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow
Parquetjs
⭐
301
fully asynchronous, pure JavaScript implementation of the Parquet file format
Parquet_fdw
⭐
281
Parquet foreign data wrapper for PostgreSQL
Tutorials_cn
⭐
281
Bigdata File Viewer
⭐
269
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Parquet4s
⭐
267
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Parquet Python
⭐
263
Python implementation of the Parquet columnar file format.
Grai Core
⭐
254
Lonboard
⭐
237
Python library for fast, interactive geospatial vector data visualization in Jupyter.
Parquet Go
⭐
228
Go package to read and write Parquet files. Parquet is a file format that stores nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Amazon S3 Find And Forget
⭐
223
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Awkward 0.x
⭐
218
Manipulate arrays of complex data structures as easily as NumPy.
Pqrs
⭐
212
Command line tool for inspecting Parquet files
Parquet Wasm
⭐
203
Rust-based WebAssembly bindings to read and write Apache Parquet data
Rumble
⭐
194
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Spark Programming Guide Zh Cn
⭐
188
Spark Programming Guide (Simplified Chinese edition)
Odbc2parquet
⭐
186
A command line tool to query an ODBC data source and write the result into a parquet file.
D6tstack
⭐
166
Quickly ingest messy CSV and XLS files. Export to clean pandas, SQL, parquet
Kartothek
⭐
163
A consistent table management library in Python
Magnolify
⭐
155
A collection of Magnolia add-on modules
Bigdata Playground
⭐
154
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Impala Tpcds Kit
⭐
150
TPC-DS Kit for Impala
Sqlite Parquet Vtable
⭐
146
A SQLite vtable extension to read Parquet files
Nyc Transport
⭐
144
A Unified Database of NYC transport (subway, taxi/Uber, and citibike) data.
Parquetsharp
⭐
142
ParquetSharp is a .NET library for reading and writing Apache Parquet files.
Thermorawfileparser
⭐
141
Thermo RAW file parser that runs on Linux/Mac and all other platforms that support Mono
Eel Sdk
⭐
140
Big Data Toolkit for the JVM
Herringbone
⭐
135
Tools for working with parquet, impala, and hive
Tensorqtl
⭐
135
Ultrafast GPU-enabled QTL mapper
Parquet Tools
⭐
135
easy install parquet-tools
Hybridbackend
⭐
134
A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
Parquet Rs
⭐
129
Apache Parquet implementation in Rust
Parquet_s3_fdw
⭐
126
ParquetS3 Foreign Data Wrapper for PostgreSQL
Bdt
⭐
125
Boring Data Tool
Sergeant
⭐
124
💂♂️ Tools to Transform and Query Data with 'Apache' 'Drill'
Parquet Index
⭐
113
Spark SQL index for Parquet tables
Fhir Data Pipes
⭐
107
A collection of tools for extracting FHIR resources and analytics services on top of that data.
Datasets.jl
⭐
104
Spectrify
⭐
103
Export Redshift data and convert to Parquet for use with Redshift Spectrum or other data warehouses.
Boxball
⭐
99
Prebuilt Docker images with Retrosheet's complete baseball history data for many analytical frameworks. Includes Postgres, cstore_fdw, MySQL, SQLite, ClickHouse, Drill, Parquet, and CSV.
Warc Parquet
⭐
96
🗄️ A simple CLI for converting WARC to Parquet.
Gpu Bdb
⭐
95
RAPIDS GPU-BDB
Streamx
⭐
95
kafka-connect-s3: Ingest data from Kafka to object stores (S3)
Gpq
⭐
94
Utility for working with GeoParquet
Parquet Go Source
⭐
92
source provider for parquet-go
Json2parquet
⭐
92
Convert JSON files to Parquet using PyArrow
Schemer
⭐
89
Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.
Scientificsummarizationdatasets
⭐
88
Datasets I have created for scientific summarization, and a trained BertSum model
Parquet.jl
⭐
87
Julia implementation of a reader for the Parquet columnar file format
Parquet
⭐
82
A library for reading and writing parquet files.
Prql Query
⭐
77
Query and transform data with PRQL
Icedb
⭐
75
An in-process Parquet merge engine for better data warehousing in S3
Sparksql Protobuf
⭐
73
Read SparkSQL parquet file as RDD[Protobuf]
Openstreetmap_h3
⭐
72
High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap world/region PBF dump into a lossless PostGIS pgsnapshot OSM schema representation partitioned by H3 regions, and/or into Arrow IPC/Parquet dumps
Ml Io
⭐
71
A high performance data access library for machine learning tasks
Gcs Tools
⭐
70
GCS support for avro-tools, parquet-tools and protobuf
Dendrite
⭐
67
Dendrite is a library for querying large datasets on a single host at near-interactive speeds.
Graphique
⭐
64
GraphQL service for arrow tables and parquet data sets.
Guery
⭐
61
Distributed SQL query engine written in Go for big data
Avro Parquet Spark Example
⭐
61
An example of using Avro and Parquet in Spark SQL
Data
⭐
61
An open-data fantasy football repository, maintained by DynastyProcess.com
Rainbow
⭐
61
A data layout optimization framework for wide tables stored on HDFS. See rainbow's webpage
Faker Cli
⭐
61
Command-line interface to quickly generate fake CSV and JSON data
Csv2parquet
⭐
60
Create Parquet files from CSV
Monix Connect
⭐
60
A set of connectors for Monix. 🔛
Parquet Compatibility
⭐
59
Compatibility tests to make sure the C and Java implementations can read each other's files
1-100 of 338 search results