Awesome Open Source
Search results for data pipeline
212 search results found
Cogstack Nifi
⭐
29
Building data processing pipelines for document processing with NLP using Apache NiFi and related services
Nostradamus
⭐
29
Backtesting an algorithmic trading strategy using Machine Learning and Sentiment Analysis.
Awesome Data Engineering
⭐
29
📒(GitBook) A curated list of awesome Data Engineering resources
Nebula Exchange
⭐
26
NebulaGraph Exchange is an Apache Spark application for parsing data from different sources into NebulaGraph in a distributed environment. It supports both batch and streaming data in various formats and from various sources, including other graph databases, RDBMSs, data warehouses, NoSQL stores, message buses, file systems, etc.
Covalent Slurm Plugin
⭐
26
Executor plugin interfacing Covalent with Slurm
Alto
⭐
26
Alto is a versatile data integration tool that allows you to easily run Singer plugins, build and cache PEX files encapsulating those plugins, and create a data reservoir whereby you can extract once and replay to as many destinations as you want.
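The "extract once, replay to many destinations" idea can be sketched in plain Python: records are pulled from a source a single time, cached as JSON Lines on disk, and the cache is then read back for each destination. This is a minimal sketch under stated assumptions; the helper names (`extract_once`, `replay`, `load_to_all`) and the JSON Lines cache format are hypothetical and are not Alto's or Singer's actual interface.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Iterator, List


def extract_once(records: Iterable[dict], cache: Path) -> Path:
    """Hypothetical helper: write extracted records to a JSON Lines cache exactly once."""
    with cache.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return cache


def replay(cache: Path) -> Iterator[dict]:
    """Stream records back out of the cache without touching the original source."""
    with cache.open(encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)


def load_to_all(cache: Path, destinations: List[Callable[[Iterator[dict]], None]]) -> None:
    """Replay the same cached records into every destination."""
    for load in destinations:
        load(replay(cache))


if __name__ == "__main__":
    source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    with tempfile.TemporaryDirectory() as tmp:
        cache = extract_once(source, Path(tmp) / "reservoir.jsonl")
        warehouse: List[dict] = []
        lake: List[dict] = []
        # Two stand-in destinations; each receives its own replay of the cache.
        load_to_all(cache, [warehouse.extend, lake.extend])
        print(warehouse == lake == source)  # True
```

Caching to disk is what decouples the number of destinations from the number of source reads; adding a third destination costs one more replay, not one more extraction.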
Kedro Pandera
⭐
25
A Kedro plugin for using Pandera in your Kedro projects
Scala Datapipeline Dsl
⭐
25
Domain-specific language to help build and maintain AWS Data Pipelines
Network Pipeline
⭐
23
Network traffic data pipeline for real-time predictions and building datasets for deep neural networks
Ordered Concurrently
⭐
22
Ordered-concurrently is a library for concurrent processing with ordered output in Go. It processes work concurrently and returns output on a channel in the order of the input. It is useful for concurrently processing items in a queue while getting output in the order provided by the queue.
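The ordering technique described above can be illustrated in Python rather than Go: work is submitted to a thread pool, results are collected as they finish, and a small reorder buffer releases each result only once every earlier input has completed. This is a sketch of the general idea; `ordered_concurrently` and `slow_square` are hypothetical names, not the Go library's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def ordered_concurrently(
    fn: Callable[[T], R], items: Iterable[T], workers: int = 4
) -> Iterator[R]:
    """Run fn over items concurrently, yielding results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the position of its input.
        futures = {pool.submit(fn, item): idx for idx, item in enumerate(items)}
        pending: Dict[int, R] = {}
        next_idx = 0
        for fut in as_completed(futures):
            pending[futures[fut]] = fut.result()
            # Release every result whose predecessors have all arrived.
            while next_idx in pending:
                yield pending.pop(next_idx)
                next_idx += 1


def slow_square(x: int) -> int:
    # Later inputs sleep less, so they tend to finish first.
    time.sleep((5 - x) * 0.01)
    return x * x


if __name__ == "__main__":
    print(list(ordered_concurrently(slow_square, range(5))))  # [0, 1, 4, 9, 16]
```

For brevity, all inputs are submitted up front; a bounded input queue would be needed to cap memory on very large streams. Python's own `ThreadPoolExecutor.map` also preserves input order, but the explicit buffer makes the reordering mechanism visible.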
Bruin
⭐
22
Bruin is a data pipeline tool designed to be easy to use. It allows building data pipelines using SQL and Python, and has built-in data quality checks.
Jobanalytics_and_search
⭐
22
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Bsp Geth
⭐
22
Ethereum client written in Go, modified for full-hierarchy data exports and block specimen production
Arakat
⭐
22
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
Saisoku
⭐
21
Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.
Spark Movies Etl
⭐
21
Spark data pipeline that ingests and transforms movie ratings data.
Barnard59
⭐
20
An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.
Neon Workshop
⭐
20
A Pachyderm deep learning tutorial for conference workshops
Xvc
⭐
20
A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)
Console
⭐
20
⛅ Versatile Data Pipeline (VDP) console website
Pramen
⭐
20
Resilient data pipeline framework running on Apache Spark
Nyt Entity Service
⭐
20
A web service for disambiguating and canonically storing entities.
Sparkplug
⭐
20
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
Udacity Data Eng Proj3
⭐
20
Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary.
Enrich
⭐
20
Snowplow Enrichment jobs and library
Slurpee
⭐
19
A GUI frontend to manage blockchain ingestion with slurp
Data Pipeline Project
⭐
18
Data pipeline project
Kahpp Oss
⭐
18
Kafka Streams made easy with a YAML file
Rivery_cli
⭐
17
Rivery CLI
Cli
⭐
17
📺 Instill AI's official command line tool
Dpex
⭐
17
Distributed DataLoader for PyTorch based on Ray
Airflowetl
⭐
16
Blog post on ETL pipelines with Airflow
Aws Data Pipeline Developer Guide
⭐
16
The open source version of the AWS Data Pipeline documentation. To provide feedback & requests for changes, submit issues in this repository, or make proposed changes & submit a pull request.
Smartpipeline
⭐
16
A framework for rapid development of robust data pipelines following a simple design pattern
Stepist
⭐
16
Framework for data processing
Kedro Static Viz
⭐
15
A Kedro CLI plugin for generating a static Kedro-Viz site (HTML, CSS, JS) that can be deployed on many serverless platforms.
Airflowdatapipeline
⭐
15
Example of an ETL Pipeline using Airflow
Opentrials Airflow
⭐
15
Configuration and definitions of Airflow for OpenTrials
Online_store
⭐
15
End to end data engineering project
Awesome Apache Pulsar
⭐
14
A curated list of resources about Apache Pulsar.
Data Pipelines With Airflow
⭐
14
Skooldio: Data Pipelines with Airflow
Data Engineering Mta Turnstile
⭐
14
Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
Pippin
⭐
13
Go library to create and manage data pipelines on your machine
Andyx
⭐
13
Buildersoft Andy X Project
Richflow
⭐
13
A Node.js/JavaScript library for synchronous data pipeline processing, data sharing, and stream processing, offering actionable and transformable pipeline data processing.
Fsharp Data Processing Pipeline
⭐
13
Provides an extensible solution for creating Data Processing Pipelines in F#.
Medicare_cclf_connector
⭐
12
This connector is a dbt project that maps Medicare CCLF claims data to the Tuva Input Layer.
Marshmallow Pyspark
⭐
12
Marshmallow serializer integration with PySpark
Data Toolkit
⭐
11
Data Pipeline Toolkit for Early-Stage Startups
Data Paths
⭐
11
Rpi
⭐
10
RPJiOS: RPJ's RPi OS, a sensor data platform for the Raspberry Pi built with Python 2.7 and Redis.
Fake Data Pipeline
⭐
10
Data Generators -> Kafka -> Spark Streaming -> PostgreSQL -> Grafana
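The stage chain above (generators feeding a message broker, a streaming job, a database, and a dashboard) can be mimicked in a single process to show how data moves between stages. In this hedged sketch, a `queue.Queue` stands in for Kafka, a plain consumer loop stands in for Spark Streaming, and an in-memory SQLite database stands in for PostgreSQL; the Grafana step is reduced to an aggregate query. None of this is the project's actual code or configuration, and the function and table names are hypothetical.

```python
import queue
import random
import sqlite3
import threading

SENTINEL = None  # Marks the end of the stream.


def generate(broker: "queue.Queue", n: int) -> None:
    """Data generator stage: publish fake sensor readings to the broker."""
    rng = random.Random(42)
    for i in range(n):
        broker.put({"sensor": f"s{i % 3}", "value": rng.uniform(0, 100)})
    broker.put(SENTINEL)


def stream_process(broker: "queue.Queue", db: sqlite3.Connection) -> None:
    """Streaming stage: consume readings, transform them, and write them to the database."""
    while (event := broker.get()) is not SENTINEL:
        # Example transformation: round values before storing.
        db.execute(
            "INSERT INTO readings (sensor, value) VALUES (?, ?)",
            (event["sensor"], round(event["value"], 2)),
        )
    db.commit()


def main() -> list:
    broker: "queue.Queue" = queue.Queue()
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    # Producer runs on its own thread; the consumer (and all DB access) stays on this one.
    producer = threading.Thread(target=generate, args=(broker, 30))
    producer.start()
    stream_process(broker, db)
    producer.join()
    # Dashboard stage: the kind of aggregate a tool like Grafana might chart.
    return db.execute(
        "SELECT sensor, COUNT(*) FROM readings GROUP BY sensor ORDER BY sensor"
    ).fetchall()


if __name__ == "__main__":
    print(main())  # [('s0', 10), ('s1', 10), ('s2', 10)]
```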
Greatex
⭐
10
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
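Validating a batch inside a pipeline usually means running a set of expectations against the batch and failing the step if any of them do not hold, so bad data never reaches downstream tasks. The sketch below shows that gate in plain Python; it deliberately does not use the Great Expectations API, and the check builders (`not_null`, `unique`, `between`) and the `validate_batch` helper are hypothetical. In an Airflow DAG, an unhandled exception like this one raised inside a task marks that task as failed.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Check = Callable[[List[Row]], bool]


class BatchValidationError(Exception):
    """Raised when a batch fails one or more expectations."""


def not_null(column: str) -> Check:
    return lambda rows: all(row.get(column) is not None for row in rows)


def unique(column: str) -> Check:
    return lambda rows: len({row.get(column) for row in rows}) == len(rows)


def between(column: str, low: float, high: float) -> Check:
    # Null values are left to the not_null check.
    return lambda rows: all(
        low <= row[column] <= high for row in rows if row.get(column) is not None
    )


def validate_batch(rows: List[Row], checks: Dict[str, Check]) -> List[Row]:
    """Run every check; raise if any fail, otherwise pass the batch through unchanged."""
    failed = [name for name, check in checks.items() if not check(rows)]
    if failed:
        raise BatchValidationError(f"failed expectations: {', '.join(failed)}")
    return rows


CHECKS = {
    "id is not null": not_null("id"),
    "id is unique": unique("id"),
    "amount in [0, 10000]": between("amount", 0, 10_000),
}

if __name__ == "__main__":
    good = [{"id": 1, "amount": 12.5}, {"id": 2, "amount": 99.0}]
    bad = [{"id": 1, "amount": 12.5}, {"id": 1, "amount": -3.0}]
    print(len(validate_batch(good, CHECKS)))  # 2
    try:
        validate_batch(bad, CHECKS)
    except BatchValidationError as err:
        print(err)  # failed expectations: id is unique, amount in [0, 10000]
```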
Gfw_forest_loss_geotrellis
⭐
10
Global Tree Cover Loss Analysis using GeoTrellis and Spark
Dataquest_eng
⭐
10
Here's how to get DataQuest's Data Engineering Track missions' content to work on your localhost. Using data from my Valenbisi ARIMA modeling project, I document my steps using PostgreSQL, Postico, and the Command Line to get our DataQuest exercises running out of a Jupyter Notebook.
Airflow Kubernetes
⭐
9
Simple Airflow on Kubernetes (GKE)
Tuva_demo
⭐
9
A starter dbt project and synthetic claims dataset for trying out the Tuva Project.
Tsdat
⭐
9
Time series data utilities for declaratively applying standardization, Q/C, and transformations to datastreams.
Dagger
⭐
9
Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).
Pydag
⭐
9
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
Stream Accumulator
⭐
9
Accumulate all the data flowing through a stream and emit it as a single chunk or as a promise
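Accumulating a stream into a single chunk amounts to collecting every piece as it arrives and joining them once the stream ends; the "promise" variant hands back something that resolves to the full result later. A rough Python analogue is shown below, with an asyncio coroutine playing the role of the promise. It illustrates the idea only and is not the project's API; `accumulate`, `accumulate_async`, and `fake_stream` are hypothetical names.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator, Iterable, List


def accumulate(chunks: Iterable[bytes]) -> bytes:
    """Collect every chunk from a synchronous stream and return one buffer."""
    # Joining once at the end avoids repeated (quadratic) concatenation.
    return b"".join(chunks)


async def accumulate_async(chunks: AsyncIterable[bytes]) -> bytes:
    """Await every chunk from an async stream; the coroutine resolves to one buffer."""
    parts: List[bytes] = []
    async for chunk in chunks:
        parts.append(chunk)
    return b"".join(parts)


async def fake_stream() -> AsyncIterator[bytes]:
    # Simulated source that yields control to the event loop between chunks.
    for piece in (b"data ", b"flowing ", b"through"):
        await asyncio.sleep(0)
        yield piece


if __name__ == "__main__":
    print(accumulate([b"ab", b"cd"]))  # b'abcd'
    print(asyncio.run(accumulate_async(fake_stream())))  # b'data flowing through'
```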
Twitter Pipeline
⭐
9
In this project, you will be building a Twitter Scheduler using Apache Airflow on Docker.
Dbt Documentor
⭐
9
✍️ dbt doc generator for advanced data teams
Snowflake Json Datapipeline
⭐
9
Building a JSON data pipeline within Snowflake using Streams and Tasks
Scribe Data
⭐
9
Wikidata and Wikipedia data extraction for Scribe applications
Serverless Datapipeline Aws Sam
⭐
8
Covalent Ssh Plugin
⭐
8
Executor plugin interfacing Covalent with remote backends using SSH
Pandemic Knowledge
⭐
8
A fully-featured multi-source data pipeline for continuously extracting knowledge from COVID-19 data.
Teleporter
⭐
8
A distributed data pipeline based on Reactive Streams for data processing. It currently supports Kafka, JDBC, Kudu, Elasticsearch, HDFS, etc.
Kulay
⭐
8
High speed message passing between various queues and services
Cwas
⭐
8
Category-wide association study (CWAS) (Werling et al., 2018; An et al., 2018)
Terraform Provider Montecarlo
⭐
7
This open-source Terraform provider enables users to seamlessly integrate the Monte Carlo data reliability platform into their infrastructure as code (IaC) workflows.
Code First Pipelines
⭐
7
A code-first way to define Ploomber pipelines
Batchout
⭐
7
Framework for building data pipelines
Data Engineer Challenge
⭐
7
Challenge Data Engineer
Final Project End To End Banking Campaign Pipeline
⭐
7
Final project for the IYKRA Data Fellowship 8 program, creating an end-to-end banking campaign pipeline using the lambda architecture (providing access to both batch and stream processing)
Medicare_lds_connector
⭐
7
Maps Medicare LDS claims data to the Tuva Input Layer so you can easily run the Tuva Project.
Didact Ui
⭐
7
The VueJS, Flowbite-powered single-page app dashboard for the Didact Platform.
Kolle
⭐
7
Business model representation automation
Community
⭐
6
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Examples
⭐
6
Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services
Battetl
⭐
6
A module for extracting, transforming, and loading battery cycler data to a database.
Fhir_connector
⭐
6
Connector that loads FHIR R4 USCDI v3 JSON data from local file storage into the Tuva common data model in Snowflake.
Kanjiapp
⭐
6
Go Tfdata
⭐
6
Go library that provides easy-to-use interfaces and tools for TensorFlow users, in particular allowing to train existing TF models on .tar and .tgz datasets
Airflow4ds
⭐
6
Using Apache Airflow to author, run and monitor complex data pipelines.
Gtfs Data Pipeline Tfnsw Bus
⭐
6
GTFS Data Pipeline for TfNSW Bus Datasets
Pydwt
⭐
6
A modeling tool like dbt that uses SQLAlchemy Core with a DataFrame interface like
Opensnowcat Collector
⭐
6
OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License)
Cargo
⭐
6
Blockchain sync for the cult of Cardano
Awesome Data Pipeline
⭐
6
Awesome list for data pipelines
Datacrafter
⭐
6
NoSQL extract, transform, load (ETL) toolkit with Python
Cribl Knowledge Pack
⭐
6
Examples of best-in-class use cases curated from community members and Cribl Solutions Engineers.
Cli
⭐
6
This is the STRM Privacy Command Line Interface, to define and manage your privacy streams, data schemas, event contracts and much more.
Cribl Syslog Input
⭐
6
This Pack enables a variety of functions when LogStream is used to receive data from Syslog senders.
Tap News
⭐
6
A real-time news scraping and recommendation system
Jobs
⭐
5
Job openings at Quod AI
Codepack
⭐
5
CodePack - A Python package to easily make, run, and manage workflows
Final Project Level3 Cv 01
⭐
5
HEY-I (HElp Your Interview)
Oh My Github Pipeline
⭐
5
🔄 A flexible open-source data pipeline for seamlessly syncing data from any github user to your database.
101-200 of 212 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.