Awesome Open Source
Search results for pipeline data engineering
69 search results found
Prefect
⭐
12,087
The easiest way to orchestrate and observe your data pipelines
Airbyte
⭐
10,828
Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.
Great_expectations
⭐
8,428
Always know what to expect from your data.
Dagster
⭐
7,585
An orchestration platform for the development, production, and observation of data assets.
Mage Ai
⭐
4,790
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Kestra
⭐
3,484
Kestra is an infinitely scalable orchestration and scheduling platform, creating, running, scheduling, and monitoring millions of complex pipelines.
Ploomber
⭐
3,083
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Meltano
⭐
982
Extract & Load with joy — CLI & version control for ELT without limitations. No more black box. Let your creativity flow.
Mlrun
⭐
981
Machine Learning automation and tracking
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Versatile Data Kit
⭐
338
Build, run and manage your data pipelines with Python or SQL on any cloud
Data Engineering With Python
⭐
302
Data Engineering with Python, published by Packt
Yuniql
⭐
292
Free and open source schema versioning and database migration made natively with .NET/6. NEW THIS MAY 2022! v1.3.15 released!
Taipy
⭐
267
Turns Data and AI algorithms into full web applications in no time.
Cuelake
⭐
266
Use SQL to build ELT pipelines on a data lakehouse.
Butterfree
⭐
249
A tool for building feature stores.
Pipelinex
⭐
198
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Pureml
⭐
174
Developer platform for production ML.
Setl
⭐
169
A simple Spark-powered ETL framework that just works 🍺
Procfwk
⭐
140
A cross-tenant, metadata-driven processing framework for Azure Data Factory and Azure Synapse Analytics, built by coupling orchestration pipelines with a SQL database and a set of Azure Functions.
Dataplane
⭐
129
Dataplane is an Airflow inspired data platform with additional data mesh capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Dataflow Ops
⭐
81
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Data Engineering Nanodegree
⭐
76
Projects done in the Data Engineering Nanodegree by Udacity.com
Prefect Deployment Patterns
⭐
48
Code examples showing flow deployment to various types of infrastructure
Drivers
⭐
46
🏎 The python library enabling access to tools and data sources in minutes, with Naas low-code formulas.
Prodmodel
⭐
39
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
Ml In Production
⭐
39
The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.
Preprocessy
⭐
36
Python package for Customizable Data Preprocessing Pipelines
Stairs
⭐
35
Framework which helps you to make parallel/distributed calculations using data pipelines
Awesome Data Engineering
⭐
21
📒(GitBook) A curated list of awesome Data Engineering resources
Neon Workshop
⭐
20
A Pachyderm deep learning tutorial for conference workshops
Spark Movies Etl
⭐
20
Spark data pipeline that ingests and transforms movie ratings data.
Bulker
⭐
19
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
Serpytor
⭐
18
A distributed, low-code, end-to-end data collection and analysis tool for data folks. Take the pain out of data collection from your pipeline!
Airflowetl
⭐
16
Blog post on ETL pipelines with Airflow
Kedro Static Viz
⭐
15
kedro cli plugin for generating a static kedro viz site (html, css, js) that can be deployed on many serverless tools.
Airflowdatapipeline
⭐
15
Example of an ETL Pipeline using Airflow
Classifybot
⭐
15
Automate building ML classification pipelines in .NET
Benthos Captain
⭐
14
A Kubernetes Operator to orchestrate Benthos pipelines
Verified Sources
⭐
11
Contribute to dlt verified sources 🔥
Moroccanhousing Etl
⭐
11
Moroccan housing data pipeline using Scrapy, MongoDB, Zyte, and DigitalOcean cloud
Business_closures_de_pipeline
⭐
10
Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database
Greatex
⭐
10
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
Wafers Fault Detection
⭐
9
End-to-end machine learning project to detect faults in wafers based on sensor data
Prism
⭐
8
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Docker Streamsets Kafka Minecraft
⭐
7
Visualize Apache logs in Minecraft using Docker, StreamSets Data Collector, Spigot, and Kafka.
Realtime Market Data Pipeline
⭐
6
A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.
Weather_pipeline
⭐
5
Code and instructions for standing up an environment from a set of Docker images (Airflow webserver, Airflow scheduler, PostgreSQL, PySpark), with a data pipeline that consumes data from a weather API, processes it with PySpark, and stores it in PostgreSQL.
Tuberia
⭐
4
Data engineering meets software engineering
Datatools
⭐
4
DataTools in Python
Tempus_de_challenge
⭐
4
A solution to the Tempus Data Engineer challenge
Adc Pipeline
⭐
4
A pipeline for a structured way of working
Reading Resources
⭐
4
software engineering reading resources
Alyeska
⭐
3
Alyeska /al-ee-EHS-kah/ n. A Data Pipeline Toolkit
Smol Elt
⭐
3
a smol elt (not etl) pipeline for smol tasks
Benthos Captain
⭐
3
A Kubernetes Operator to orchestrate Benthos pipelines
Primadatapipeline
⭐
3
Demo project for the talk "La mia prima data pipeline" ("My first data pipeline")
Big_data_learning
⭐
3
big data
Didact Engine
⭐
3
The decoupled, atomic REST API and orchestration engine for the Didact .NET Standard scheduler.
Data_check
⭐
2
data_check is a simple data validation tool
Gcp Stock Data Pipeline
⭐
2
This project fetches crypto/stock data through Tiingo API, processes it using Dataproc(PySpark), and stores it in BigQuery.
Content Dataengineering Monday
⭐
2
Collect all the resources posted in data engineering Monday series
Short Term Rentals Warehouse
⭐
2
Pipeline, warehouse, and visualization tools for investigating the impact of Airbnb short-term rentals on world cities.
Open Fda Data Pipeline
⭐
2
A repeatable data pipeline to extract data from open.fda.gov, transform the data, and make the data available for advanced analytics in Amazon Web Services (AWS).
Data Zero To Cloud
⭐
2
Slides and code for my talk 'Data pipelines. From zero to cloud scale'
Sapient Data Engineer Challenge
⭐
2
Created a data pipeline to stream data and generate real-time alerts using NiFi, Kafka and Spark
Engineering_project
⭐
2
A real-time dashboard for Virginia COVID-19 case reports and vaccinations with end-to-end data collection, storage, and processing pipeline
Kedro Action
⭐
2
A GitHub Action to lint, test, build-docs, package, and run your kedro pipelines. Supports any Python version you'll give it (that is also supported by pyenv).
Copyright 2018-2023 Awesome Open Source. All rights reserved.