Awesome Open Source
Search results for hdfs parquet
28 search results found
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Spindle
⭐
333
Next-generation web analytics processing with Scala, Spark, and Parquet.
Bigdata File Viewer
⭐
269
A cross-platform (Windows, macOS, Linux) desktop application to view common big data binary formats like Parquet, ORC, Avro, etc. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Rumble
⭐
194
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Parquet Go Source
⭐
92
A source provider for parquet-go.
Rainbow
⭐
61
A data layout optimization framework for wide tables stored on HDFS. See rainbow's webpage
Monix Connect
⭐
60
A set of connectors for Monix. 🔛
Spark Compaction
⭐
52
File compaction tool that runs on top of the Spark framework.
Spark Parquet Thrift Example
⭐
44
Example Spark project using Parquet as a columnar store with Thrift objects.
Entrada
⭐
44
Entrada - A tool for DNS big data analytics
Etl Light
⭐
38
A light Kafka to HDFS/S3 ETL library based on Apache Spark
Arvo2parquet
⭐
30
Example program that writes Parquet-formatted data to plain files (i.e., not Hadoop HDFS); Parquet is a columnar storage format.
Minipipe
⭐
30
Minipipe: a minimal end-to-end data pipeline
Pucket
⭐
29
Bucketing and partitioning system for Parquet
Topnotch
⭐
29
A framework for systematically quality controlling big data.
Enceladus
⭐
28
Dynamic Conformance Engine
Kafka Parquet Writer
⭐
26
This project provides a component that reads logs from Kafka and writes them as Parquet files on HDFS.
Wasp
⭐
25
WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Pipewrench
⭐
22
Data pipeline automation tool
Spark_log_data
⭐
21
Flume-to-Spark-Streaming Log Parser
Hadoop Etl Udfs
⭐
17
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL.
Parquetplugin
⭐
11
Flink Tools
⭐
10
A collection of Flink applications for working with Pravega streams
Telecom Streaming
⭐
9
Telecom scenarios implemented with streaming techniques
Random Datagen
⭐
7
A generator of Random Data to HDFS, HBase, Hive, Kafka, Kudu, Ozone, SolR in CDP (Cloudera Data Platform)
Avrotoolbox
⭐
7
ArcGIS toolbox to process feature classes in Apache Avro and Parquet format
Bigdata Platform
⭐
6
End-to-end big data project that aims to show how to implement the different big data layers, from the infrastructure layer to the end-user one. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter]
Benchmarking Arrow
⭐
5
Benchmarking Arrow/Java
Copyright 2018-2024 Awesome Open Source. All rights reserved.