Awesome Open Source
Search results for python data pipeline
62 search results found
Airflow
⭐
34,299
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Airbyte
⭐
12,918
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Orchest
⭐
3,876
Build data pipelines, the easy way 🛠️
Whylogs
⭐
2,533
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Doit
⭐
1,590
task management & automation tool
Mleap
⭐
1,479
MLeap: Deploy ML Pipelines to Production
Meltano
⭐
1,460
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Klio
⭐
822
Smarter data pipelines for audio.
Dataengineeringproject
⭐
644
Example end to end data engineering project.
Covalent
⭐
608
Pythonic tool for running machine-learning/high performance/quantum-computing workflows in heterogeneous environments.
Piperider
⭐
443
Code review for data in dbt
Tributary
⭐
424
Streaming reactive and dataflow graphs in Python
Versatile Data Kit
⭐
389
One framework to develop, deploy and operate data workflows with Python and SQL.
Nonechucks
⭐
315
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Dbt Data Reliability
⭐
304
Data anomalies monitoring as dbt tests and dbt artifacts uploader.
Recap
⭐
292
Work with your web service, database, and streaming schemas in a single format.
Augraphy
⭐
258
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Gusty
⭐
202
Making DAG construction easier
Flupy
⭐
182
Fluent data pipelines for python and your shell
Pureml
⭐
174
Developer platform for production ML.
Datajoint Python
⭐
158
Relational data pipelines for the science lab
Atom
⭐
137
Automated Tool for Optimized Modelling
Public Datasets Pipelines
⭐
131
Cloud-native, data onboarding architecture for Google Cloud Datasets
Watchmen Matryoshka Doll
⭐
124
Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Datajob
⭐
99
Build and deploy a serverless data pipeline on AWS with no effort.
Premier League
⭐
88
A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard.
Tensorpipe
⭐
86
High Performance Tensorflow Data Pipeline with State of Art Augmentations and low level optimizations.
Datacater
⭐
80
The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
Pansori
⭐
74
Tools for ASR Corpus Generation from Online Video
Beneath
⭐
64
Beneath is a serverless real-time data platform ⚡️
Serverless Data Pipeline Sam
⭐
50
Serverless Data Pipeline powered by Kinesis Firehose, API Gateway, Lambda, S3, and Athena
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Streams Explorer
⭐
43
Explore Apache Kafka data pipelines in Kubernetes.
Ml In Production
⭐
39
The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.
Conductor Python
⭐
38
Conductor OSS SDK for Python programming language
Datatap Python
⭐
37
Focus on Algorithm Design, Not on Data Wrangling
Stairs
⭐
35
Framework which helps you to make parallel/distributed calculations using data pipelines
Feagen
⭐
33
(deprecated) A fast and memory-efficient Python data engineering framework for machine learning.
Pandas To Postgres
⭐
33
Copy Pandas DataFrames and HDF5 files to PostgreSQL database
Blast
⭐
31
Blast is a data orchestration tool that can run SQL and Python against Google BigQuery and Snowflake. It supports templating with Jinja, data quality tests, query validation, environment management and more.
Cogstack Nifi
⭐
29
Building data processing pipelines for documents processing with NLP using Apache NiFi and related services
Debussy_concert
⭐
29
Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and pipelines.
Nostradamus
⭐
29
Backtesting an algorithmic trading strategy using Machine Learning and Sentiment Analysis.
Covalent Slurm Plugin
⭐
26
Executor plugin interfacing Covalent with Slurm
Alto
⭐
26
Alto is a versatile data integration tool that allows you to easily run Singer plugins, build and cache PEX files encapsulating those plugins, and create a data reservoir whereby you can extract once and replay to as many destinations as you want.
Kedro Pandera
⭐
25
A kedro plugin to use pandera in your kedro projects
Network Pipeline
⭐
23
Network traffic data pipeline for real-time predictions and building datasets for deep neural networks
Bruin
⭐
22
Bruin is a data pipeline tool that is designed to be easy-to-use. It allows building data pipelines using SQL and Python, and has built-in data quality checks.
Jobanalytics_and_search
⭐
22
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Arakat
⭐
22
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
Spark Movies Etl
⭐
21
Spark data pipeline that ingests and transforms movie ratings data.
Saisoku
⭐
21
Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.
Nyt Entity Service
⭐
20
A web service for disambiguating and canonically storing entities.
Neon Workshop
⭐
20
A Pachyderm deep learning tutorial for conference workshops
Udacity Data Eng Proj3
⭐
20
Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary.
Rivery_cli
⭐
17
Rivery CLI
Dpex
⭐
17
Distributed DataLoader For Pytorch Based On Ray
Smartpipeline
⭐
16
A framework for rapid development of robust data pipelines following a simple design pattern
Stepist
⭐
16
Framework for data processing
Airflowetl
⭐
16
Blog post on ETL pipelines with Airflow
Opentrials Airflow
⭐
15
Configuration and definitions of Airflow for OpenTrials
Kedro Static Viz
⭐
15
kedro cli plugin for generating a static kedro viz site (html, css, js) that can be deployed on many serverless tools.
Online_store
⭐
15
End to end data engineering project
Airflowdatapipeline
⭐
15
Example of an ETL Pipeline using Airflow
Data Engineering Mta Turnstile
⭐
14
Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
Data Pipelines With Airflow
⭐
14
Skooldio: Data Pipelines with Airflow
Marshmallow Pyspark
⭐
12
Marshmallow serializer integration with pyspark
Data Paths
⭐
11
Rpi
⭐
10
RPJiOS: RPJ's RPi OS, a sensor data platform for the Raspberry Pi built with python2.7 and redis.
Greatex
⭐
10
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
Dagger
⭐
9
Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).
Scribe Data
⭐
9
Wikidata and Wikipedia data extraction for Scribe applications
Pydag
⭐
9
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
Tsdat
⭐
9
Time series data utilities for declaratively applying standardization, Q/C, and transformations to datastreams.
Serverless Datapipeline Aws Sam
⭐
8
Pandemic Knowledge
⭐
8
A fully-featured multi-source data pipeline for continuously extracting knowledge from COVID-19 data.
Covalent Ssh Plugin
⭐
8
Executor plugin interfacing Covalent with remote backends using SSH
Cwas
⭐
8
Category-wide association study (CWAS) (Werling et al., 2018; An et al., 2018)
Code First Pipelines
⭐
7
A code-first way to define Ploomber pipelines
Batchout
⭐
7
Framework for building data pipelines
Data Engineer Challenge
⭐
7
Challenge Data Engineer
Final Project End To End Banking Campaign Pipeline
⭐
7
Final Project for IYKRA Data Fellowship 8 Program, creating an end-to-end banking campaign pipeline using lambda architecture (providing access to batch and stream processing)
Datacrafter
⭐
6
NoSQL extract, transform, load (ETL) toolkit with Python
Airflow4ds
⭐
6
Using Apache Airflow to author, run and monitor complex data pipelines.
Pydwt
⭐
6
Modeling tool like DBT to use SQL Alchemy core with a DataFrame interface like
Tap News
⭐
6
A real-time news scraping and recommendation system
Fhir_connector
⭐
6
Connector that loads FHIR r4 USCDIv3 JSON data from local file storage into the Tuva common data model in Snowflake.
Gtfs Data Pipeline Tfnsw Bus
⭐
6
GTFS Data Pipeline for TfNSW Bus Datasets
Battetl
⭐
6
A module for extracting, transforming, and loading battery cycler data to a database.
Ds001 Scraping To Analysis Extra Store
⭐
5
✨ The current project is a basic process pipeline for extraction, transformation, loading, analysis and presentation. All of this was done using appropriate web scraping, data analysis/presentation and database tools.
Gcp Airflow Foundations
⭐
5
Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery data warehouse
Final Project Level3 Cv 01
⭐
5
HEY-I (HElp Your Interview)
Log Aggregator
⭐
5
Log aggregation pipeline with kafka and ELK stack
Codepack
⭐
5
CodePack - A Python package to easily make, run, and manage workflows
Chariots
⭐
5
versioned machine learning pipelines
Stock Market Real Time Data Pipeline With Apache Kafka And Cassandra
⭐
5
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
Jobs
⭐
5
Job openings at Quod AI
Copyright 2018-2024 Awesome Open Source. All rights reserved.