Awesome Open Source
Search results for pipeline data engineering (tags: data-engineering, pipeline)
37 search results found
Prefect ⭐ 14,603
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
Argo Workflows ⭐ 14,264
Workflow Engine for Kubernetes
Airbyte ⭐ 12,918
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Dagster ⭐ 9,467
An orchestration platform for the development, production, and observation of data assets.
Great_expectations ⭐ 9,179
Always know what to expect from your data.
Mage Ai ⭐ 6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Kestra ⭐ 5,257
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Taipy ⭐ 4,311
Turns data and AI algorithms into production-ready web applications in no time.
Ploomber ⭐ 3,318
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Meltano ⭐ 1,460
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Mlrun ⭐ 1,177
Machine Learning automation and tracking
Neumai ⭐ 693
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Goodreads_etl_pipeline ⭐ 593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Versatile Data Kit ⭐ 389
One framework to develop, deploy and operate data workflows with Python and SQL.
Data Engineering With Python ⭐ 302
Data Engineering with Python, published by Packt
Dataengineering Roadmap ⭐ 297
Yet another repository with basic concepts, technical challenges, and resources on data engineering, in Spanish 🧙✨
Yuniql ⭐ 292
Free and open source schema versioning and database migration made natively with .NET 6. NEW THIS MAY 2022! v1.3.15 released!
Butterfree ⭐ 269
A tool for building feature stores.
Cuelake ⭐ 266
Use SQL to build ELT pipelines on a data lakehouse.
Pipelinex ⭐ 212
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Pureml ⭐ 174
Developer platform for production ML.
Setl ⭐ 173
A simple Spark-powered ETL framework that just works 🍺
Dataplane ⭐ 171
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capability to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Procfwk ⭐ 140
A cross-tenant, metadata-driven processing framework for Azure Data Factory and Azure Synapse Analytics, achieved by coupling orchestration pipelines with a SQL database and a set of Azure Functions.
Patterns Devkit ⭐ 101
Data pipelines from re-usable components
Dataflow Ops ⭐ 97
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Bulker ⭐ 92
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
Airbyte_serverless ⭐ 83
Airbyte made simple (no UI, no database, no cluster)
Data Engineering Nanodegree ⭐ 76
Projects done in the Data Engineering Nanodegree by Udacity.com
Prism ⭐ 70
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Drivers ⭐ 53
Low-code Python library enabling access to APIs, tools, data sources in seconds.
Prefect Deployment Patterns ⭐ 48
Code examples showing flow deployment to various types of infrastructure
Prodmodel ⭐ 39
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
Ml In Production ⭐ 39
Practical use cases showing how to make machine learning pipelines robust and reliable using Apache Airflow.
Preprocessy ⭐ 36
Python package for Customizable Data Preprocessing Pipelines
Stairs ⭐ 35
A framework that helps you run parallel/distributed calculations using data pipelines
Didact Engine ⭐ 34
The REST API and execution engine for the Didact Platform.
Spark Ai ⭐ 31
Toolbox for building Generative AI applications on top of Apache Spark.
Verified Sources ⭐ 31
Contribute to dlt verified sources 🔥
Benthos Captain ⭐ 30
A Kubernetes Operator to orchestrate Benthos pipelines
Awesome Data Engineering ⭐ 29
📒(GitBook) A curated list of awesome Data Engineering resources
Spark Movies Etl ⭐ 21
Spark data pipeline that ingests and transforms movie ratings data.
Neon Workshop ⭐ 20
A Pachyderm deep learning tutorial for conference workshops
Panda_patrol ⭐ 18
Serpytor ⭐ 18
A distributed, low-code, end-to-end data collection and analysis tool for data folks. Take the pain out of data collection from your pipeline!
Airflowetl ⭐ 16
Blog post on ETL pipelines with Airflow
Airflowdatapipeline ⭐ 15
Example of an ETL Pipeline using Airflow
Classifybot ⭐ 15
Automate building ML classification pipelines in .NET
Kedro Static Viz ⭐ 15
Kedro CLI plugin for generating a static Kedro-Viz site (HTML, CSS, JS) that can be deployed on many serverless tools.
Thepipelinetool ⭐ 13
A pipeline orchestration tool
Pippin ⭐ 13
Go library to create and manage data pipelines on your machine
Wafers Fault Detection ⭐ 12
End-to-end machine learning project that detects faults in wafers based on sensor data
Airflow Rbac Roles Cli ⭐ 12
A tool to create Airflow RBAC roles with DAG-level permissions from the CLI.
Moroccanhousing Etl ⭐ 11
Moroccan housing data pipeline using Scrapy, MongoDB, Zyte, and DigitalOcean Cloud
Greatex ⭐ 10
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
Business_closures_de_pipeline ⭐ 10
Data engineering pipeline hosted entirely in the AWS ecosystem, using DocumentDB as the database
Airflow ⭐ 8
This set of code and instructions is intended to instantiate a compiled environment with a set of Docker images (Airflow webserver, Airflow scheduler, PostgreSQL, PySpark) and a data pipeline that consumes data from a weather API, processes it with PySpark, and stores it in PostgreSQL.
Docker Streamsets Kafka Minecraft ⭐ 7
Visualize Apache logs in Minecraft using Docker, StreamSets Data Collector, Spigot, and Kafka.
Didact Ui ⭐ 7
The VueJS, Flowbite-powered single-page app dashboard for the Didact Platform.
Realtime Market Data Pipeline ⭐ 6
A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.
Stock Market Real Time Data Pipeline With Apache Kafka And Cassandra ⭐ 5
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
Reading Resources ⭐ 5
Software engineering reading resources
Related Searches
Python Pipeline (4,255)
Javascript Pipeline (1,369)
Pipeline Jenkins (1,150)
Shell Pipeline (1,143)
Docker Pipeline (1,018)
Jupyter Notebook Pipeline (976)
Java Pipeline (868)
Golang Pipeline (682)
Copyright 2018-2024 Awesome Open Source. All rights reserved.