Awesome Open Source

Search results for spark data lake

12 search results found

Upserts, Deletes And Incremental Processing on Big Data.

Lakesoul ⭐ 2,248

LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

Kyuubi ⭐ 1,849

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Goodreads_etl_pipeline ⭐ 593

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Marmaray ⭐ 444

Generic Data Ingestion & Dispersal Library for Hadoop

Data Engineering Projects ⭐ 322

Personal Data Engineering Projects

Smart Data Lake ⭐ 87

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Apachespark ⭐ 59

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in practice, using PySpark and Spark SQL for development. The course ends with a few case studies.

Lighthouse ⭐ 54

Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.

Anyscale ⭐ 49

anyscale roadmap

Datapipelines Essentials Python ⭐ 45

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Real Time Data Warehouse ⭐ 29

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

Enceladus ⭐ 28

Dynamic Conformance Engine

Data Engineer Nanodegree Projects Udacity ⭐ 27

Projects done in the Data Engineer Nanodegree Program by Udacity.com

Jobanalytics_and_search ⭐ 22

The JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Sparkprogramminginscala ⭐ 18

Apache Spark Course Material

Data Mill ⭐ 16

A K8s-based infrastructure for analytics

Data Pipeline from the Global Historical Climatology Network DataSet

Kyuubi Docker ⭐ 9

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Awesome Data Pipeline ⭐ 6

Awesome list for datapipeline

Bigdata Platform ⭐ 6

End-to-end big data project that aims to show how to implement the different big data layers, from the infrastructure layer to the end-user layer. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter

Formacao Engenheiro De Dados Cloud E Big Data Azure Databricks ⭐ 6

Cloud and Big Data Engineer Training (Azure & Databricks)

Udacity Data Engineering Nanodegree ⭐ 5

This repository holds the files and notebooks produced throughout my Udacity Data Engineering Nanodegree program.

Genomic Bigdata Spark ⭐ 5

Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture

Spark Streaming In Python ⭐ 5

Apache Spark 3 - Structured Streaming Course Material

Microsoft Big Data Scientist And Ai ⭐ 5

Microsoft Big Data, Data Scientist, and AI

Copyright 2018-2024 Awesome Open Source. All rights reserved.