Awesome Open Source
Search results for etl data pipeline
data-pipeline
etl
58 search results found
Dagster
⭐
7,585
An orchestration platform for the development, production, and observation of data assets.
Mage Ai
⭐
4,790
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Orchest
⭐
3,876
Build data pipelines, the easy way 🛠️
Kestra
⭐
3,484
Kestra is an infinitely scalable orchestration and scheduling platform, creating, running, scheduling, and monitoring millions of complex pipelines.
Go Streams
⭐
1,344
A lightweight stream processing library for Go
Vdp
⭐
822
💧 Versatile Data Pipeline (VDP) is an open-source tool to seamlessly integrate AI for unstructured data into the modern data stack
Optimus
⭐
700
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Dataform
⭐
674
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
Data Engineering Wiki
⭐
625
The best place to learn data engineering. Built and maintained by the data engineering community.
Covalent
⭐
459
Pythonic tool for running machine-learning, high-performance, and quantum-computing workflows in heterogeneous environments.
Versatile Data Kit
⭐
338
Build, run and manage your data pipelines with Python or SQL on any cloud
Zdh_web
⭐
282
Big data collection and extraction platform
Conduit
⭐
276
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Cuelake
⭐
266
Use SQL to build ELT pipelines on a data lakehouse.
Recap
⭐
239
Recap tracks and transforms schemas across your whole application.
Dataplane
⭐
129
Dataplane is an Airflow-inspired data platform with additional data mesh capability to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Datacater
⭐
78
The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
Beneath
⭐
64
Beneath is a serverless real-time data platform ⚡️
Udacity Data Engineer Nanodegree
⭐
52
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Pandas To Postgres
⭐
33
Copy Pandas DataFrames and HDF5 files to PostgreSQL database
Blast
⭐
31
Blast is a data orchestration tool that can run SQL and Python against Google BigQuery and Snowflake. It supports templating with Jinja, data quality tests, query validation, environment management and more.
Alto
⭐
26
Alto is a versatile data integration tool that allows you to easily run Singer plugins, build and cache PEX files encapsulating those plugins, and create a data reservoir whereby you can extract once and replay to as many destinations as you want.
Covalent Slurm Plugin
⭐
24
Executor plugin interfacing Covalent with Slurm
Spark Movies Etl
⭐
20
Spark data pipeline that ingests and transforms movie ratings data.
Enrich
⭐
16
Snowplow Enrichment jobs and library
Airflowetl
⭐
16
Blog post on ETL pipelines with Airflow
Rivery_cli
⭐
16
Rivery CLI
Airflowdatapipeline
⭐
15
Example of an ETL Pipeline using Airflow
Pramen
⭐
15
Resilient data pipeline framework running on Apache Spark
Barnard59
⭐
11
An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.
Greatex
⭐
10
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
Covalent Ssh Plugin
⭐
8
Executor plugin interfacing Covalent with remote backends using SSH
Smartpipeline
⭐
8
A framework for rapid development of robust data pipelines following a simple design pattern
Datacrafter
⭐
6
NoSQL extract, transform, load (ETL) toolkit with Python
Cargo
⭐
6
Blockchain sync for the cult of Cardano
Scribe Data
⭐
6
Wikidata and Wikipedia data extraction for Scribe applications
Goomba
⭐
5
A workflow based data pipeline framework for golang.
Udacity Data Engineering Nanodegree
⭐
5
This is a repository to hold the files and notebooks produced throughout my Udacity Data Engineering Nanodegree program.
Dataflow Cookiecutter
⭐
4
Create production-ready Dataflow projects in a zap! ⚡️
Prensio
⭐
4
Simple yet versatile "change data capture" tool for MySQL and Kafka. It taps into MySQL binlog events, runs the transformation logic on them, and produces Kafka events
Dataplane Python Package
⭐
3
The data engineering library to build robust, reliable and on time data pipelines in Python. Integrates with Dataplane Data Platform.
Smol Elt
⭐
3
a smol elt (not etl) pipeline for smol tasks
Justconveyor
⭐
3
Easy-to-use in-app micro-ETL framework for building data processing pipelines.
Battetl
⭐
3
A module for extracting, transforming, and loading battery cycler data to a database.
Workflow
⭐
3
Goomba workflows
Twitter Data Pipeline Using Airflow And Aws S3
⭐
2
An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3.
Ml_sentiment_analysis_etl
⭐
2
ETL data pipeline that downloads tweets using Tweepy and the twitter-streaming-api, saves them in a MySQL database, and analyzes tweet sentiment.
Helm Charts
⭐
1
💧 The Helm charts of Versatile Data Pipeline (VDP)
Panditas
⭐
1
Data pipelines using Pandas
Nosql Challenge
⭐
1
NU Bootcamp Module 12
Wikihow Data Pipeline
⭐
1
This project aims to collect data from the WikiHow website, extract some funny information, and expose it through a REST API.
Aws Data Pipeline
⭐
1
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):
Udacity Dataengineer Nanodegree
⭐
1
Learn to design models, build data warehouses and data lakes, automate data pipelines and work with massive datasets.
Data Model And Etl Project
⭐
1
ETL project with data modeling, using Python, SQL, PostgreSQL, and a DBMS
Heart Disease Analysis
⭐
1
This project includes data warehouse (DWH) development and an ML pipeline to predict heart disease.
Stock Market Real Time Data Pipeline With Apache Kafka And Cassandra
⭐
1
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
Copyright 2018-2023 Awesome Open Source. All rights reserved.