Awesome Open Source
Search results for hdfs parquet
28 search results found
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Spindle
⭐
333
Next-generation web analytics processing with Scala, Spark, and Parquet.
Bigdata File Viewer
⭐
269
A cross-platform (Windows, macOS, Linux) desktop application to view common big data binary formats like Parquet, ORC, Avro, etc. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Rumble
⭐
194
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Parquet Go Source
⭐
92
A source provider for parquet-go.
Rainbow
⭐
61
A data layout optimization framework for wide tables stored on HDFS. See rainbow's webpage
Monix Connect
⭐
60
A set of connectors for Monix. 🔛
Spark Compaction
⭐
52
File compaction tool that runs on top of the Spark framework.
Spark Parquet Thrift Example
⭐
44
Example Spark project using Parquet as a columnar store with Thrift objects.
Entrada
⭐
44
Entrada - A tool for DNS big data analytics
Etl Light
⭐
38
A light Kafka to HDFS/S3 ETL library based on Apache Spark
Arvo2parquet
⭐
30
Example program that writes Parquet-formatted data to plain files (i.e., not Hadoop HDFS); Parquet is a columnar storage format.
Minipipe
⭐
30
Minipipe: a minimal end-to-end data pipeline
Pucket
⭐
29
Bucketing and partitioning system for Parquet
Topnotch
⭐
29
A framework for systematically quality controlling big data.
Enceladus
⭐
28
Dynamic Conformance Engine
Kafka Parquet Writer
⭐
26
This project provides a component that reads logs from Kafka and writes them as Parquet files on HDFS.
Wasp
⭐
25
WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Pipewrench
⭐
22
Data pipeline automation tool
Spark_log_data
⭐
21
Flume-to-Spark-Streaming Log Parser
Hadoop Etl Udfs
⭐
17
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL.
Parquetplugin
⭐
11
Flink Tools
⭐
10
A collection of Flink applications for working with Pravega streams
Telecom Streaming
⭐
9
Telecom scenarios implemented with streaming techniques
Random Datagen
⭐
7
A generator of Random Data to HDFS, HBase, Hive, Kafka, Kudu, Ozone, SolR in CDP (Cloudera Data Platform)
Avrotoolbox
⭐
7
ArcGIS toolbox to process feature classes in Apache Avro and Parquet format
Bigdata Platform
⭐
6
End-to-end big data project that aims to show how to implement the different big data layers, from the infrastructure layer to the end-user one. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter]
Benchmarking Arrow
⭐
5
Benchmarking Arrow/Java
Copyright 2018-2024 Awesome Open Source. All rights reserved.