Awesome Open Source
Search results for spark parquet
67 search results found
Iceberg
⭐
5,179
Apache Iceberg
Gaffer
⭐
1,724
A large-scale entity and relation database supporting aggregation of properties
Petastorm
⭐
1,693
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Adam
⭐
966
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Devops Python Tools
⭐
709
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Iceberg
⭐
409
Iceberg is a table format for large, slow-moving tabular data
Spindle
⭐
333
Next-generation web analytics processing with Scala, Spark, and Parquet.
Rumble
⭐
194
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Spark Programming Guide Zh Cn
⭐
188
Spark Programming Guide, Simplified Chinese edition
Parquet Index
⭐
113
Spark SQL index for Parquet tables
Schemer
⭐
89
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Avro Parquet Spark Example
⭐
61
An example of using Avro and Parquet in Spark SQL
Iceberg
⭐
59
A temporary home for LinkedIn's changes to Apache Iceberg (incubating)
Spark Compaction
⭐
52
File compaction tool that runs on top of the Spark framework.
Spark Mail
⭐
45
Tutorial on parsing Enron email to Avro and then explore the email set using Spark.
Spark Parquet Thrift Example
⭐
44
Example Spark project using Parquet as a columnar store with Thrift objects.
Etl Light
⭐
38
A light Kafka to HDFS/S3 ETL library based on Apache Spark
Dstlr
⭐
33
scalable knowledge graph construction from unstructured text
Simplesparkavroapp
⭐
32
Simple Spark app that reads and writes Avro data
Topnotch
⭐
29
A framework for systematically quality controlling big data.
Pucket
⭐
29
Bucketing and partitioning system for Parquet
Enceladus
⭐
28
Dynamic Conformance Engine
Wasp
⭐
25
WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Daflow
⭐
24
An Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Sql Based Etl With Apache Spark On Amazon Eks
⭐
23
A solution that provides declarative data processing capability, and workflow orchestration automation to help your business users (such as analysts and data scientists) access their data and create meaningful insights without the need for manual IT processes.
Forklift
⭐
22
🚚 ETL for Spark and Airflow
Spark_log_data
⭐
21
Flume-to-Spark-Streaming Log Parser
Cda Client
⭐
19
Cloud Data Access client
Albis
⭐
17
Albis: High-Performance File Format for Big Data Systems
Spark Sql Gdelt
⭐
16
Scripts and code to import the GDELT dataset into Spark SQL for analysis
Spark Lucenerdd Examples
⭐
15
Examples of spark-lucenerdd
Parquet Generator
⭐
15
Parquet file generator
Spark Bigquery
⭐
15
Google BigQuery data source for Apache Spark
Experiments
⭐
15
Code examples for my blog posts
Spark Vector
⭐
15
Repository for the Spark-Vector connector
Spark To Tableau
⭐
14
Spark to Tableau Extractor library
Pyspark S3 Parquet Example
⭐
13
This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job is then submitted to an Apache Spark instance running on AWS EMR, which runs a SQLContext to create a temporary table from a DataFrame. SQL queries can then be run against the temporary table.
Infoflow
⭐
12
An Apache Spark implementation of the InfoMap community detection algorithm
Scalpel Flattening
⭐
11
This repository hosts code related to SNDS database flattening
Intelqatcodec
⭐
11
Spark S3
⭐
11
Spark Plugin for Amazon S3
Huemul Bigdatagovernance
⭐
10
Huemul BigDataGovernance is a framework that runs on top of Spark, Hive and HDFS. It supports implementing a corporate single-source-of-truth data strategy based on data governance best practices. It lets you implement tables with Primary Key and Foreign Key control when inserting and updating data through the library, along with validation of nulls, text lengths, minimum/maximum values for numbers and dates, unique values and default values. It also lets you classify fields by applicability of der
Pyspark Dataframe Made Easy
⭐
10
pyspark dataframe made easy
Chicago Taxi Trips Analysis
⭐
10
Analysis of City Of Chicago Taxi Trip Dataset Using AWS EMR, Spark, PySpark, Zeppelin and Airbnb's Superset
Imooc Sparksql
⭐
10
SparkSQL log analysis and visualization for imooc (慕课网)
Telecom Streaming
⭐
9
Telecom scenarios implemented with streaming techniques
Elastic Tools
⭐
9
Apache Spark based command line tools for ElasticSearch
Redditr Insight Data Engineering Project
⭐
8
RedditR for Content Engagement and Recommendation
Example Applications
⭐
8
Example applications for use with PNDA
Sempala
⭐
7
Sempala is a SPARQL-over-SQL approach to provide interactive-time SPARQL query processing on Hadoop. It stores RDF data in a columnar layout (Parquet) on HDFS and uses either Impala or Spark as the execution layer on top of it. SPARQL queries are translated into Impala/Spark SQL for execution.
Spark For Noobs By A Noob
⭐
7
Jupyter notebooks for learning PySpark
Spark Streaming Twitter
⭐
7
Building a pipeline to process real-time data using Spark and MongoDB.
Query
⭐
7
big data query console command and script for scala
Avrotoparquet
⭐
6
Command line converter for Apache Avro to Apache Parquet file formats
Strava Spark
⭐
6
Analyzing my Strava history with Spark
Tomatula
⭐
6
Iota
⭐
6
Ob Spark Shell
⭐
6
Scala spark-shell backend for Org-mode's Babel
Bigdata Platform
⭐
6
End to end big data project, that aims to show how to implement different big data layers, from the infrastructure layer to the end user one. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter
Spark Sessions
⭐
6
Examples of how to split sets of time-based events into sessions using Spark
Stackexchange Parquet
⭐
6
Spark job for converting the StackExchange Network data into parquet format.
Schema_evolution_exploration
⭐
5
Explore schema evolution using parquet and Spark or Presto
Avroparquet
⭐
5
AVRO / Parquet Demo Code
Genomic Bigdata Spark
⭐
5
Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture
Mambo
⭐
5
A simple in-memory, configuration driven, data processing pipeline for Apache Spark.
Arrow Data Source
⭐
5
Spark DataSource plugin for reading files from various formats like Parquet into Arrow-compatible columnar vectors.
Glue
⭐
5
Related Searches
Scala Spark (3,279)
Python Spark (2,053)
Java Spark (1,587)
Jupyter Notebook Spark (1,268)
Apache Spark (1,207)
Spark Hadoop (1,188)
Spark Kafka (985)
Spark Streaming (817)
Spark Pyspark (812)
Docker Spark (683)
Copyright 2018-2024 Awesome Open Source. All rights reserved.