Awesome Open Source

Search results for scala parquet

52 search results found

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

Ratatool ⭐ 333

A tool for data sampling, data generation, and data diffing

Parquet4s ⭐ 267

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Magnolify ⭐ 155

A collection of Magnolia add-on modules

Bigdata Playground ⭐ 154

A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Eel Sdk ⭐ 140

Big Data Toolkit for the JVM

Parquet Index ⭐ 113

Spark SQL index for Parquet tables

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.

Sparksql Protobuf ⭐ 73

Read SparkSQL parquet file as RDD[Protobuf]

Gcs Tools ⭐ 70

GCS support for avro-tools, parquet-tools and protobuf

Avro Parquet Spark Example ⭐ 61

An example of using Avro and Parquet in Spark SQL

Monix Connect ⭐ 60

A set of connectors for Monix. 🔛

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample programs for Spark in Scala.

Spark Parquet Thrift Example ⭐ 44

Example Spark project using Parquet as a columnar store with Thrift objects.

Sparkwiki ⭐ 42

Etl Light ⭐ 38

A light Kafka to HDFS/S3 ETL library based on Apache Spark

Scalable knowledge graph construction from unstructured text

Simplesparkavroapp ⭐ 32

Simple Spark app that reads and writes Avro data

Bucketing and partitioning system for Parquet

Parquet Extra ⭐ 29

A collection of Apache Parquet add-on modules

Topnotch ⭐ 29

A framework for systematically quality controlling big data.

Enceladus ⭐ 28

Dynamic Conformance Engine

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Spark_log_data ⭐ 21

Flume-to-Spark-Streaming Log Parser

Cda Client ⭐ 19

Cloud Data Access client

Spark Sql Gdelt ⭐ 16

Scripts and code to import the GDELT dataset into Spark SQL for analysis

Spark Bigquery ⭐ 15

Google BigQuery data source for Apache Spark

Parquet Generator ⭐ 15

Parquet file generator

Spark Vector ⭐ 15

Repository for the Spark-Vector connector

Experiments ⭐ 15

Code examples for my blog posts

Spark Lucenerdd Examples ⭐ 15

Examples of spark-lucenerdd

Spark To Tableau ⭐ 14

Spark to Tableau Extractor library

Embulk Output S3_parquet ⭐ 13

Embulk (https://github.com/embulk/embulk/) output plugin to dump records as Apache Parquet (https://parquet.apache.org/) files on S3.

Spark S3 ⭐ 11

Spark Plugin for Amazon S3

Scalpel Flattening ⭐ 11

This repository hosts code related to SNDS database flattening

Parquet Custom Reader Writer ⭐ 11

Simple implementation of a custom parquet reader/writer

Kamu Engine Flink ⭐ 10

Apache Flink based engine for Open Data Fabric

Imooc Sparksql ⭐ 10

SparkSQL log analysis and visualization for imooc (慕课网)

Huemul Bigdatagovernance ⭐ 10

Huemul BigDataGovernance is a framework that works on top of Spark, Hive, and HDFS. It enables a corporate single-source-of-data strategy based on data governance best practices. It supports tables with Primary Key and Foreign Key control when inserting and updating data through the library, along with validation of nulls, text lengths, minimum/maximum values for numbers and dates, unique values, and default values. It also allows fields to be classified by applicability of der

Chronicles ⭐ 9

Version controlled immutable storage for Big Data

Elastic Tools ⭐ 9

Apache Spark based command line tools for ElasticSearch

Telecom Streaming ⭐ 9

Telecom scenarios implemented with streaming techniques

Example Applications ⭐ 8

Example applications for use with PNDA

Cloud Storage Extension ⭐ 8

Exasol Cloud Storage Extension for accessing data formatted as Avro, ORC, and Parquet on public cloud storage systems

Big data query console command and script for Scala

Parquet Io Java ⭐ 6

Java library to read Parquet files.

Stackexchange Parquet ⭐ 6

Spark job for converting the StackExchange Network data into parquet format.

Spark Sessions ⭐ 6

Examples of how to split sets of time-based events into sessions using Spark

Arrow Data Source ⭐ 5

Spark DataSource plugin for reading files in various formats, such as Parquet, into Arrow-compatible columnar vectors.

A simple in-memory, configuration driven, data processing pipeline for Apache Spark.

Sample code to inform a discussion around content-addressable storage

Copyright 2018-2024 Awesome Open Source. All rights reserved.