Awesome Open Source
Search results for spark etl
102 search results found
Doris
⭐
11,243
Apache Doris is an easy-to-use, high performance and unified analytics database.
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Aws Glue Samples
⭐
1,334
AWS Glue code samples
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Zingg
⭐
828
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Aws Glue Libs
⭐
568
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Metorikku
⭐
536
A simplified, lightweight ETL Framework based on Apache Spark
Spark Excel
⭐
421
A Spark plugin for reading and writing Excel files
Zdh_web
⭐
379
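Big data collection and extraction platform. zdh_web is the visual management platform for the zdh series of services, covering data collection, scheduling, permissions, and approvals.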
Big_data_architect_skills
⭐
353
Skills a big data architect should master
Data Engineering Projects
⭐
322
Personal Data Engineering Projects
Beginner_de_project
⭐
276
Beginner data engineering project - batch edition
Butterfree
⭐
269
A tool for building feature stores.
Setl
⭐
173
A simple Spark-powered ETL framework that just works 🍺
Cobrix
⭐
131
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Easy_sql
⭐
126
A library developed to ease the data ETL development process.
Chombo
⭐
102
Big Data ETL and Utilities for Hadoop MapReduce, Spark and Storm
Flowman
⭐
85
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Gallia Core
⭐
79
A schema-aware Scala library for data transformation
Data Engineering Nanodegree
⭐
76
Projects done in the Data Engineering Nanodegree by Udacity.com
Luigi Warehouse
⭐
73
A Luigi-powered analytics / warehouse stack
Udacity Data Engineer Nanodegree
⭐
64
Classwork projects and homework done through the Udacity Data Engineering Nanodegree
Spark Etl
⭐
62
Apache Spark based ETL Engine
Apachespark
⭐
59
This repository will help you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-life work. We will be using PySpark and Spark SQL for development. At the end of the course we also cover a few case studies.
Zdh_server
⭐
56
zdh data collection platform, ETL processing service
Onetl
⭐
55
One ETL tool to rule them all
Data Engineering
⭐
55
How to build an awesome data engineering team
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, plus SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Etlflow
⭐
43
EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems and more.
Architect_big_data_solutions_with_spark
⭐
42
Code, labs, and lectures for the course
Udacity Data Engineering
⭐
42
Udacity Data Engineering Nano Degree (DEND)
Spark Ref Architecture
⭐
38
Reference Architectures for Apache Spark
Etl Light
⭐
38
A light Kafka to HDFS/S3 ETL library based on Apache Spark
Sope
⭐
37
Apache Spark ETL Utilities
Sharpetl
⭐
36
Write ETL using your favorite SQL dialects
Amazon Eks Apache Spark Etl Sample
⭐
35
Spark ETL example processing New York taxi rides public dataset on EKS
Ides
⭐
32
Intelligent Data Exploration Service (IDES): a one-stop Data + AI solution!
Yaetos
⭐
32
Write data and AI pipelines in SQL, Spark, or Pandas and deploy them to the cloud, simplified
Basin
⭐
29
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Starlake
⭐
29
Starlake is an on-premises and cloud ELT/ETL framework for batch and stream processing
Data Engineer Nanodegree Projects Udacity
⭐
27
Projects done in the Data Engineer Nanodegree Program by Udacity.com
Nebula Exchange
⭐
26
NebulaGraph Exchange is an Apache Spark application to parse data from different sources to NebulaGraph in a distributed environment. It supports both batch and streaming data in various formats and sources including other Graph Databases, RDBMS, Data warehouses, NoSQL, Message Bus, File systems, etc.
Spark Gotchas
⭐
25
A few things we ran into during our Spark-based ETL project
Wasp
⭐
25
WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Daflow
⭐
24
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Sql Based Etl With Apache Spark On Amazon Eks
⭐
23
A solution that provides declarative data processing capability, and workflow orchestration automation to help your business users (such as analysts and data scientists) access their data and create meaningful insights without the need for manual IT processes.
Whakapai
⭐
22
Various Python Data Science Projects available in PyPi
Aws Glue Docker
⭐
22
🐋 Docker image for AWS Glue Spark/Python
Forklift
⭐
22
🚚 ETL for Spark and Airflow
De 100 Days
⭐
22
data engineering 100 days 🤖 🧲 🦾 | #DE
Spark Movies Etl
⭐
21
Spark data pipeline that ingests and transforms movie ratings data.
Zephyr
⭐
21
Zephyr is a big data, platform-agnostic ETL API, with Hadoop MapReduce, Storm, and other big data bindings.
Pramen
⭐
20
Resilient data pipeline framework running on Apache Spark
Cda Client
⭐
19
Cloud Data Access client
Jun_bigdata
⭐
18
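jun_bigdata big data platform service framework. Implements real-time Kafka data filtering, cleansing, transformation, and consumption; implements Sp SQL reads and writes against non-relational databases such as Redis and MongoDB; integrates a rules engine, on top of which it can implement cust…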
Sparklanes
⭐
16
A lightweight data processing framework for Apache Spark
Telemetry Streaming
⭐
15
Spark Streaming ETL jobs for Mozilla Telemetry
Spark Etl
⭐
15
Set of ETL utils for Spark
Ghcn D
⭐
14
Data Pipeline from the Global Historical Climatology Network DataSet
Bigetl
⭐
13
This project is a unified ETL platform that supports various data processing technologies, including Spark, Hive, Hadoop, Python, Linux shell scripts, etc.
Datalink
⭐
13
A simple, easy-to-use ETL tool
Camus Compressor
⭐
12
Camus Compressor merges files created by Camus and saves them in a compressed format.
Airflowjob
⭐
11
Airflow POC demo: 1) environment setup 2) Airflow DAG 3) Spark/ML pipeline | #DE
Bigdata Etl Pipeline
⭐
10
The Data Pipeline and Analytics Stack is a comprehensive solution designed for processing, storing, and visualizing data. Explore a complete data pipeline with all components seamlessly set up and ready to use
Spark Etl Atlas
⭐
10
A small project to show how to add lineage to Atlas when using Spark as ETL tool
Dcc Release
⭐
10
Second generation of the ICGC DCC release ETL built on Spark
Diem
⭐
10
DIEM Data Integration Engine Multipurpose
Yasp
⭐
9
Yet Another SPark Framework
Restaurantinspectionssparkmlnet
⭐
9
ETL & Data Enrichment with Spark.NET and ML.NET Automated (Auto) ML
Pyspark Template
⭐
8
A Python PySpark project with Poetry
Data Engineering Onboarding Starter
⭐
8
This repository contains a 10-step program for entering the world of Data Engineering
Apache Spark Etl Pipeline Example
⭐
8
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computing.
Getl
⭐
8
An elegant way of ETL'ing
Spooq
⭐
8
Hbaseetl
⭐
8
Spark HBase ETL tools. Supports bulk
Dlt With Debug
⭐
8
A lightweight helper utility which allows developers to do interactive pipeline development by having a unified source code for both DLT run and Non-DLT interactive notebook run.
Spark Etl Demo
⭐
7
Demo of an ETL Spark Job
Meetup Spark Airflow Demo
⭐
7
Spark & Airflow demo for meetup
Greenplum Streamsets
⭐
7
Greenplum with Streamsets
Aws Etl
⭐
7
This is an ETL application on AWS using general open sales and customer data, available here: https://github.com/camposvinicius/data/blob/main/A; it is a zipped file containing several CSV files to which we will apply transformations.
Mongodb Elasticsearch Spark Etl
⭐
7
Generic template to read from MongoDB and migrate to Elasticsearch
Spark Etl Framework
⭐
7
A generic ETL framework using Spark SQL that transforms data by constructing pipelines with YAML/JSON/XML
Spark Kafka Simple Consumer Receiver
⭐
7
Pyspark Boilerplate Mehdio
⭐
7
Pyspark boilerplate for running prod ready data pipeline
Etl Processes Using Sqoop Hadoop Hive Spark And Scala
⭐
7
I implemented various ETL processes: loading data from MySQL into HDFS using Sqoop, transforming the data with Spark and Scala, performing analytics with Spark and Scala, and loading the data back into HDFS.
Openmrs Etl
⭐
7
openmrs - mysql - debezium - kafka - spark - scala
Data Engineer Portfolio
⭐
6
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
Data.engineers.lunch
⭐
6
Resources from weekly Zoom lunches revolving around Data Engineering. Hosted by Anant Corporation.
Spark Databricks
⭐
6
🔥 Master Apache Spark & Databricks! Dive into a world of big data with exclusive insights from Udemy courses, personal notes, and practical guides. Whether you're starting out or scaling new heights in data engineering, this is your ultimate resource hub! 🌟🚀
Setl Examples
⭐
6
Learn SETL with examples, lessons and exercises
Yl Spark Sql
⭐
6
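A Spark SQL dialect that extends the semantics for batch processing, machine learning, model serving, and more; based on a unified SQL syntax, it provides an ETL, machine learning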
Spark Sql Etl Framework
⭐
6
Multi-stage, config driven, SQL based ETL framework using PySpark
Kf Portal Etl
⭐
5
🏭 Extract-Transform-Load Pipeline for producing data for the Kids First Data Resource Portal
Udacity Data Engineering Nanodegree
⭐
5
This is a repository holding the files and notebooks produced throughout my Udacity Data Engineering Nanodegree program.
Doris Sdk
⭐
5
SDK for Apache Doris
Datafastlane
⭐
5
Data in the Fast Lane is a powerful and extensible ETL tool that leverages Apache Spark.
Example
⭐
5
HbaseETL
Spark Structured Streaming Kafka
⭐
5
Spark Structured Streaming + Kafka + Delta pipeline.
1-100 of 102 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.