Awesome Open Source
Search results for delta lake
61 search results found
Doris
⭐
11,243
Apache Doris is an easy-to-use, high-performance, unified analytics database.
Trino
⭐
9,118
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Starrocks
⭐
7,191
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.
Delta
⭐
6,656
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Roapi
⭐
2,969
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Delta Rs
⭐
1,634
A native Rust library for Delta Lake, with bindings into Python
Delta Sharing
⭐
654
An open protocol for secure data sharing
Learningsparkv2
⭐
570
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Onetable
⭐
531
OneTable is an omni-directional converter for table formats that facilitates interoperability across data processing systems and query engines.
Connectors
⭐
383
This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
Seafowl
⭐
323
Analytical database for data-driven Web applications 🪶
Kafka Delta Ingest
⭐
289
A highly efficient daemon for streaming data from Kafka into Delta Lake
Dbldatagen
⭐
234
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines
Amazon Sagemaker Local Mode
⭐
220
Amazon SageMaker Local Mode Examples
Mack
⭐
188
Delta Lake helper methods in PySpark
Lakehouse Engine
⭐
154
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
Streamis
⭐
96
Streaming application development and management system, based on Linkis and DSS, planning to provide the workflow-like graphical drag-and-drop development capability.
Smart Data Lake
⭐
87
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Delta Architecture
⭐
66
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Delta Sharing Rs
⭐
64
A Minimalistic Rust Implementation of Delta Sharing Server.
Faker Cli
⭐
61
Command-line interface to quickly generate fake CSV and JSON data
Apachespark
⭐
59
This repository will help you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Deltalakereader
⭐
45
Read Delta tables without any Spark
Edc Mod1 Exercise Igti
⭐
42
Module 1 exercises - EDC Bootcamp - IGTI 2021
Databricks
⭐
41
Databricks Platform - Architecture, Security, Automation and much more!!
Dask Deltatable
⭐
34
A Delta Lake reader for Dask
Building Data Lakehouse
⭐
32
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
Pysparkcheatsheet
⭐
30
PySpark Cheatsheet
Threat Detection And Visualization
⭐
30
Threat Detection and Visualization
101_upsert Delta
⭐
30
This repository exemplifies a simple ELT process that uses Delta to perform upserts and to remove data files that aren't in the latest state of the table's transaction log.
Real Time Data Warehouse
⭐
29
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Delta Oms
⭐
28
DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse
Delta Go
⭐
26
Native Delta Lake Implementation in Go
Spark Structured Streaming Examples
⭐
25
Spark Structured Streaming examples using version 3.4.0
Lhbench
⭐
24
Lakehouse storage system benchmark
Spark Movies Etl
⭐
21
Spark data pipeline that ingests and transforms movie ratings data.
Olh
⭐
19
Open source stack lakehouse
Amazon Emr With Delta Lake
⭐
17
Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR
Delta Hub Ts
⭐
14
A platform and cloud-based service for data sharing based on Delta Sharing implemented using Next.js and TypeScript.
Hitchhikers_guide_to_deltalake_streaming
⭐
12
Don't Panic. This guide will help you when it feels like the end of the world.
Delta Lake Ui
⭐
12
UI to run SQL on Delta Lake tables and visualize how the results vary across table versions
Workshop Data Lakehouse
⭐
11
Repository dedicated to the Data Lakehouse with Delta Lake workshop
Db2ixf
⭐
10
db2ixf is a python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.
Free Resources Books Papers
⭐
10
Books and papers in Mathematics, Econometrics, Machine Learning, Finance, etc., for different levels, useful for Data Scientists, Developers and everyone who is interested in STEM.
Financial Data Project In Azure
⭐
8
Free High-Quality Financial Data in Azure
Emr Serverless Spark Delta Lake 2.0
⭐
8
A quick example for Delta Lake running on AWS EMR Serverless Spark
Lighthouse
⭐
8
Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations should be performed.
Cdk Emrserverless With Delta Lake
⭐
8
This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to check the outcome of the EMR Serverless application.
Net.jgp.books.spark.ch17
⭐
7
Spark in Action, 2nd edition - chapter 16 - exporting data, using delta lake
Diane
⭐
7
Hive helper functions for Apache Spark users
Spark Databricks
⭐
6
🔥 Master Apache Spark & Databricks! Dive into a world of big data with exclusive insights from Udemy courses, personal notes, and practical guides. Whether you're starting out or scaling new heights in data engineering, this is your ultimate resource hub! 🌟🚀
Dbt On Aws
⭐
6
dbt (data build tool) projects targeting AWS analytics services (Redshift, Glue, EMR, Athena) and open table formats
Doris Sdk
⭐
5
SDK for Apache Doris
Waterbear
⭐
5
Automated provisioning of an industry Lakehouse with enterprise data model
Spark Structured Streaming Kafka
⭐
5
Spark Structured Streaming + Kafka + Delta pipeline.
Flight Fusion
⭐
5
Delta Buddy
⭐
5
Introducing Delta-Buddy: Your ultimate Delta Lake companion! 🚀 Streamline your data journey with an AI-powered chatbot. Ask Delta-Buddy anything about your Delta Lake.
Genomic Bigdata Spark
⭐
5
Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture
Delta Dotnet
⭐
5
Delta Lake native library for .NET
Lakeapi
⭐
5
API for distributing Data Lake Data
Nodejs Sharing Client
⭐
5
A Node.js connector for Delta Sharing.
Copyright 2018-2024 Awesome Open Source. All rights reserved.