Awesome Open Source
Search results for scala parquet
52 search results found
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Ratatool
⭐
333
A tool for data sampling, data generation, and data diffing
Parquet4s
⭐
267
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Magnolify
⭐
155
A collection of Magnolia add-on modules
Bigdata Playground
⭐
154
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Eel Sdk
⭐
140
Big Data Toolkit for the JVM
Parquet Index
⭐
113
Spark SQL index for Parquet tables
Schemer
⭐
89
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Sparksql Protobuf
⭐
73
Read SparkSQL parquet file as RDD[Protobuf]
Gcs Tools
⭐
70
GCS support for avro-tools, parquet-tools and protobuf
Avro Parquet Spark Example
⭐
61
An example of using Avro and Parquet in Spark SQL
Monix Connect
⭐
60
A set of connectors for Monix. 🔛
Spark
⭐
55
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project contains sample Spark programs in Scala.
Spark Parquet Thrift Example
⭐
44
Example Spark project using Parquet as a columnar store with Thrift objects.
Sparkwiki
⭐
42
Etl Light
⭐
38
A light Kafka to HDFS/S3 ETL library based on Apache Spark
Dstlr
⭐
33
scalable knowledge graph construction from unstructured text
Simplesparkavroapp
⭐
32
Simple Spark app that reads and writes Avro data
Pucket
⭐
29
Bucketing and partitioning system for Parquet
Parquet Extra
⭐
29
A collection of Apache Parquet add-on modules
Topnotch
⭐
29
A framework for systematically quality controlling big data.
Enceladus
⭐
28
Dynamic Conformance Engine
Wasp
⭐
25
WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Daflow
⭐
24
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Spark_log_data
⭐
21
Flume-to-Spark-Streaming Log Parser
Cda Client
⭐
19
Cloud Data Access client
Spark Sql Gdelt
⭐
16
Scripts and code to import the GDELT dataset into Spark SQL for analysis
Spark Bigquery
⭐
15
Google BigQuery data source for Apache Spark
Parquet Generator
⭐
15
Parquet file generator
Spark Vector
⭐
15
Repository for the Spark-Vector connector
Experiments
⭐
15
Code examples for my blog posts
Spark Lucenerdd Examples
⭐
15
Examples of spark-lucenerdd
Spark To Tableau
⭐
14
Spark to Tableau Extractor library
Embulk Output S3_parquet
⭐
13
Embulk (https://github.com/embulk/embulk/) output plugin to dump records as Apache Parquet (https://parquet.apache.org/) files on S3.
Spark S3
⭐
11
Spark Plugin for Amazon S3
Scalpel Flattening
⭐
11
This repository hosts code related to SNDS database flattening
Parquet Custom Reader Writer
⭐
11
Simple implementation of a custom parquet reader/writer
Kamu Engine Flink
⭐
10
Apache Flink based engine for Open Data Fabric
Imooc Sparksql
⭐
10
SparkSQL log analysis and visualization for imooc.com (慕课网)
Huemul Bigdatagovernance
⭐
10
Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It supports implementing a corporate single-source-of-data strategy based on data governance best practices. With the library, you can implement tables with Primary Key and Foreign Key control when inserting and updating data, along with validation of nulls, text lengths, minimum/maximum values for numbers and dates, unique values, and default values. It also allows fields to be classified by applicability of der
Chronicles
⭐
9
Version controlled immutable storage for Big Data
Elastic Tools
⭐
9
Apache Spark based command line tools for ElasticSearch
Telecom Streaming
⭐
9
Telecom scenarios implemented with streaming techniques
Example Applications
⭐
8
Example applications for use with PNDA
Cloud Storage Extension
⭐
8
Exasol Cloud Storage Extension for accessing data formatted as Avro, ORC, and Parquet on public cloud storage systems
Query
⭐
7
big data query console command and script for scala
Parquet Io Java
⭐
6
Java library to read Parquet files.
Stackexchange Parquet
⭐
6
Spark job for converting the StackExchange Network data into parquet format.
Spark Sessions
⭐
6
Examples for how to split sets of time based events into sessions using Spark
Arrow Data Source
⭐
5
Spark DataSource plugin for reading files in various formats, such as Parquet, into Arrow-compatible columnar vectors.
Mambo
⭐
5
A simple in-memory, configuration driven, data processing pipeline for Apache Spark.
Gastore
⭐
5
Sample code to inform a discussion around content-addressable storage
Related Searches
Scala Sbt (4,178)
Scala Spark (3,279)
Scala Akka (2,120)
Java Scala (1,794)
Scala Play Framework (1,309)
Plugin Scala (1,079)
Scala Kafka (969)
Scala Functional Programming (942)
Scala Scalajs (887)
Docker Scala (728)
Copyright 2018-2024 Awesome Open Source. All rights reserved.