Awesome Open Source
Search results for spark data lake
12 search results found
Hudi
⭐
4,901
Upserts, Deletes And Incremental Processing on Big Data.
Lakesoul
⭐
2,248
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Kyuubi
⭐
1,849
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Kylo
⭐
1,035
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Zingg
⭐
828
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Marmaray
⭐
444
Generic Data Ingestion & Dispersal Library for Hadoop
Data Engineering Projects
⭐
322
Personal Data Engineering Projects
Smart Data Lake
⭐
87
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Lighthouse
⭐
54
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
Anyscale
⭐
49
anyscale roadmap
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, along with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Real Time Data Warehouse
⭐
29
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Enceladus
⭐
28
Dynamic Conformance Engine
Data Engineer Nanodegree Projects Udacity
⭐
27
Projects done in the Data Engineer Nanodegree Program by Udacity.com
Jobanalytics_and_search
⭐
22
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Sparkprogramminginscala
⭐
18
Apache Spark Course Material
Data Mill
⭐
16
A K8s-based infrastructure for analytics
Ghcn D
⭐
14
Data Pipeline from the Global Historical Climatology Network DataSet
Kyuubi Docker
⭐
9
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Awesome Data Pipeline
⭐
6
Awesome list for datapipeline
Bigdata Platform
⭐
6
End-to-end big data project that aims to show how to implement the different big data layers, from the infrastructure layer to the end-user layer. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter
Formacao Engenheiro De Dados Cloud E Big Data Azure Databricks
⭐
6
Cloud and Big Data Engineer Training (Azure & Databricks)
Udacity Data Engineering Nanodegree
⭐
5
A repository for the files and notebooks produced during my Udacity Data Engineering Nanodegree program.
Genomic Bigdata Spark
⭐
5
Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture
Spark Streaming In Python
⭐
5
Apache Spark 3 - Structured Streaming Course Material
Microsoft Big Data Scientist And Ai
⭐
5
Microsoft Big Data, Data Scientist, and AI
Copyright 2018-2024 Awesome Open Source. All rights reserved.