Awesome Open Source
Search results for python data pipeline
125 search results found
Dagster
⭐
7,527
An orchestration platform for the development, production, and observation of data assets.
Mage Ai
⭐
4,722
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Orchest
⭐
3,867
Build data pipelines, the easy way 🛠️
Whylogs
⭐
2,233
The open standard for data logging
Doit
⭐
1,590
task management & automation tool
Mleap
⭐
1,454
MLeap: Deploy ML Pipelines to Production
Meltano
⭐
967
Extract & Load with joy — CLI & version control for ELT without limitations. No more black box. Let your creativity flow.
Klio
⭐
700
Smarter data pipelines for audio.
Dataengineeringproject
⭐
644
Example end to end data engineering project.
Covalent
⭐
454
Pythonic tool for running machine-learning/high performance/quantum-computing workflows in heterogeneous environments.
Tributary
⭐
407
Streaming reactive and dataflow graphs in Python
Piperider
⭐
342
Code review for data in dbt
Versatile Data Kit
⭐
336
Build, run and manage your data pipelines with Python or SQL on any cloud
Nonechucks
⭐
315
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Recap
⭐
239
Recap tracks and transforms schemas across your whole application.
Dbt Data Reliability
⭐
196
Data anomalies monitoring as dbt tests and dbt artifacts uploader.
Augraphy
⭐
188
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Flupy
⭐
177
Fluent data pipelines for python and your shell
Pureml
⭐
174
Developer platform for production ML.
Datajoint Python
⭐
147
Relational data pipelines for the science lab
Gusty
⭐
142
Making DAG construction easier
Atom
⭐
126
Automated Tool for Optimized Modelling
Watchmen Matryoshka Doll
⭐
124
Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management
Public Datasets Pipelines
⭐
121
Cloud-native, data onboarding architecture for Google Cloud Datasets
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Datajob
⭐
99
Build and deploy a serverless data pipeline on AWS with no effort.
Tensorpipe
⭐
86
High-performance TensorFlow data pipeline with state-of-the-art augmentations and low-level optimizations.
Datacater
⭐
77
The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
Pansori
⭐
74
Tools for ASR Corpus Generation from Online Video
Beneath
⭐
64
Beneath is a serverless real-time data platform ⚡️
Serverless Data Pipeline Sam
⭐
50
Serverless Data Pipeline powered by Kinesis Firehose, API Gateway, Lambda, S3, and Athena
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Pipeline
⭐
40
OONI data processing pipeline
Streams Explorer
⭐
39
Explore Apache Kafka data pipelines in Kubernetes.
Ml In Production
⭐
39
The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.
Datatap Python
⭐
37
Focus on Algorithm Design, Not on Data Wrangling
Stairs
⭐
35
Framework which helps you to make parallel/distributed calculations using data pipelines
Feagen
⭐
33
(deprecated) A fast and memory-efficient Python data engineering framework for machine learning.
Pandas To Postgres
⭐
33
Copy Pandas DataFrames and HDF5 files to PostgreSQL database
Blast
⭐
31
Blast is a data orchestration tool that can run SQL and Python against Google BigQuery and Snowflake. It supports templating with Jinja, data quality tests, query validation, environment management and more.
Nostradamus
⭐
29
Backtesting an algorithmic trading strategy using Machine Learning and Sentiment Analysis.
Debussy_concert
⭐
29
Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and pipelines.
Alto
⭐
26
Alto is a versatile data integration tool that allows you to easily run Singer plugins, build and cache PEX files encapsulating those plugins, and create a data reservoir whereby you can extract once and replay to as many destinations as you want.
Cogstack Nifi
⭐
25
Building data processing pipelines for document processing with NLP using Apache NiFi and related services
Network Pipeline
⭐
23
Network traffic data pipeline for real-time predictions and building datasets for deep neural networks
Covalent Slurm Plugin
⭐
22
Executor plugin interfacing Covalent with Slurm
Jobanalytics_and_search
⭐
22
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Arakat
⭐
22
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
Saisoku
⭐
21
Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.
Nyt Entity Service
⭐
20
A web service for disambiguating and canonically storing entities.
Udacity Data Eng Proj3
⭐
20
Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary.
Neon Workshop
⭐
20
A Pachyderm deep learning tutorial for conference workshops
Spark Movies Etl
⭐
20
Spark data pipeline that ingests and transforms movie ratings data.
Premier League
⭐
18
A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard.
Dpex
⭐
17
Distributed DataLoader For Pytorch Based On Ray
Airflowetl
⭐
16
Blog post on ETL pipelines with Airflow
Rivery_cli
⭐
16
Rivery CLI
Stepist
⭐
16
Framework for data processing
Airflowdatapipeline
⭐
15
Example of an ETL Pipeline using Airflow
Kedro Static Viz
⭐
15
kedro cli plugin for generating a static kedro viz site (html, css, js) that can be deployed on many serverless tools.
Opentrials Airflow
⭐
15
Configuration and definitions of Airflow for OpenTrials
Online_store
⭐
15
End to end data engineering project
Data Paths
⭐
11
Rpi
⭐
10
RPJiOS: RPJ's RPi OS, a sensor data platform for the Raspberry Pi built with Python 2.7 and Redis.
Marshmallow Pyspark
⭐
10
Marshmallow serializer integration with pyspark
Greatex
⭐
10
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
Pydag
⭐
9
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
Dagger
⭐
9
Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).
Smartpipeline
⭐
8
A framework for rapid development of robust data pipelines following a simple design pattern
Pandemic Knowledge
⭐
8
A fully-featured multi-source data pipeline for continuously extracting knowledge from COVID-19 data.
Serverless Datapipeline Aws Sam
⭐
8
Tsdat
⭐
8
Time series data utilities for declaratively applying standardization, Q/C, and transformations to datastreams.
Batchout
⭐
7
Framework for building data pipelines
Data Engineering Mta Turnstile
⭐
7
Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
Final Project End To End Banking Campaign Pipeline
⭐
7
Final Project for IYKRA Data Fellowship 8 Program, creating an end-to-end banking campaign pipeline using lambda architecture (providing access to batch and stream processing)
Data Engineer Challenge
⭐
7
Challenge Data Engineer
Code First Pipelines
⭐
7
A code-first way to define Ploomber pipelines
Data Pipelines With Airflow
⭐
6
Skooldio: Data Pipelines with Airflow
Airflow4ds
⭐
6
Using Apache Airflow to author, run and monitor complex data pipelines.
Scribe Data
⭐
6
Wikidata and Wikipedia data extraction for Scribe applications
Pydwt
⭐
6
Modeling tool like DBT to use SQL Alchemy core with a DataFrame interface like
Datacrafter
⭐
6
NoSQL extract, transform, load (ETL) toolkit with Python
Tap News
⭐
6
A real-time news scraping and recommendation system
Gtfs Data Pipeline Tfnsw Bus
⭐
6
GTFS Data Pipeline for TfNSW Bus Datasets
Udacity Data Engineering Nanodegree
⭐
5
This is a repository to hold the files and notebooks produced throughout my Udacity Data Engineering Nanodegree program.
Gcp Airflow Foundations
⭐
5
Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery data warehouse
Codepack
⭐
5
CodePack - A Python package to easily make, run, and manage workflows
Chariots
⭐
5
versioned machine learning pipelines
Cwas
⭐
5
Category-wide association study (CWAS) (Werling et al., 2018; An et al., 2018)
Jobs
⭐
5
Job openings at Quod AI
Final Project Level3 Cv 01
⭐
5
HEY-I (HElp Your Interview)
Nyt Entity Uploader
⭐
4
A Python wrapper for making requests to the NYT Entity Service API
Dattasa
⭐
4
Python project for the dattasa package on PyPI
Today News
⭐
4
Comprehensive news application aggregating various sources, with a personal recommendation system, implemented in a service-oriented architecture.
Lenna
⭐
4
Latency Estimation for Neural Network Architecture
Pach Neon
⭐
4
An example Pachyderm ML pipeline using Nervana Neon
Divvy Bikeshare De Project
⭐
4
An end-to-end data pipeline that extracts Divvy bikeshare data from the web, loads it into a data lake and data warehouse, transforms it using dbt, and visualizes it in a Looker Studio dashboard. The pipeline is orchestrated using Prefect.
Dataflow Cookiecutter
⭐
4
Create production-ready Dataflow projects in a zap! ⚡️
Nlp Datasets
⭐
4
A dataset utils repository based on tf.data API.
Gfw_pixetl
⭐
4
GFW ETL for raster tiles
1-100 of 125 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.