Awesome Open Source

Search results for data pipeline

212 search results found

Cogstack Nifi ⭐ 29

Building data processing pipelines for document processing with NLP using Apache NiFi and related services

Nostradamus ⭐ 29

Backtesting an algorithmic trading strategy using Machine Learning and Sentiment Analysis.

Awesome Data Engineering ⭐ 29

📒(GitBook) A curated list of awesome Data Engineering resources

Nebula Exchange ⭐ 26

NebulaGraph Exchange is an Apache Spark application to parse data from different sources to NebulaGraph in a distributed environment. It supports both batch and streaming data in various formats and sources including other Graph Databases, RDBMS, Data warehouses, NoSQL, Message Bus, File systems, etc.

Covalent Slurm Plugin ⭐ 26

Executor plugin interfacing Covalent with Slurm

Alto is a versatile data integration tool that allows you to easily run Singer plugins, build and cache PEX files encapsulating those plugins, and create a data reservoir whereby you can extract once and replay to as many destinations as you want.

Kedro Pandera ⭐ 25

A kedro plugin to use pandera in your kedro projects

Scala Datapipeline Dsl ⭐ 25

Domain-specific language to help build and maintain AWS Data Pipelines

Network Pipeline ⭐ 23

Network traffic data pipeline for real-time predictions and building datasets for deep neural networks

Ordered Concurrently ⭐ 22

Ordered-concurrently is a library for concurrent processing with ordered output in Go. It processes work concurrently and returns output in a channel in the order of input. It is useful for concurrently processing items in a queue and getting output in the order provided by the queue.

Bruin is a data pipeline tool that is designed to be easy-to-use. It allows building data pipelines using SQL and Python, and has built-in data quality checks.

Jobanalytics_and_search ⭐ 22

JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Bsp Geth ⭐ 22

Ethereum client written in Go, modified for full-hierarchy data exports and block specimen production

ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform

Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.

Spark Movies Etl ⭐ 21

Spark data pipeline that ingests and transforms movie ratings data.

Barnard59 ⭐ 20

An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.

Neon Workshop ⭐ 20

A Pachyderm deep learning tutorial for conference workshops

A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

⛅ Versatile Data Pipeline (VDP) console website

Resilient data pipeline framework running on Apache Spark

Nyt Entity Service ⭐ 20

A web service for disambiguating and canonically storing entities.

Sparkplug ⭐ 20

Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌

Udacity Data Eng Proj3 ⭐ 20

Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary.

Snowplow Enrichment jobs and library

A GUI frontend to manage blockchain ingestion with slurp

Data Pipeline Project ⭐ 18

Data pipeline project

Kahpp Oss ⭐ 18

Kafka Streams made easy with a YAML file

Rivery_cli ⭐ 17

📺 Instill AI's official command line tool

Distributed DataLoader For Pytorch Based On Ray

Airflowetl ⭐ 16

Blog post on ETL pipelines with Airflow

Aws Data Pipeline Developer Guide ⭐ 16

The open source version of the AWS Data Pipeline documentation. To provide feedback & requests for changes, submit issues in this repository, or make proposed changes & submit a pull request.

Smartpipeline ⭐ 16

A framework for rapid development of robust data pipelines following a simple design pattern

Framework for data processing

Kedro Static Viz ⭐ 15

kedro cli plugin for generating a static kedro viz site (html, css, js) that can be deployed on many serverless tools.

Airflowdatapipeline ⭐ 15

Example of an ETL Pipeline using Airflow

Opentrials Airflow ⭐ 15

Configuration and definitions of Airflow for OpenTrials

Online_store ⭐ 15

End to end data engineering project

Awesome Apache Pulsar ⭐ 14

A curated list of resources about Apache Pulsar.

Data Pipelines With Airflow ⭐ 14

Skooldio: Data Pipelines with Airflow

Data Engineering Mta Turnstile ⭐ 14

Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis

Go library to create and manage data pipelines on your machine

Buildersoft Andy X Project

Richflow ⭐ 13

A Node.js and JavaScript library for synchronous data pipeline processing, data sharing, and stream processing. Actionable and transformable pipeline data processing.

Fsharp Data Processing Pipeline ⭐ 13

Provides an extensible solution for creating Data Processing Pipelines in F#.

Medicare_cclf_connector ⭐ 12

This connector is a dbt project that maps Medicare CCLF claims data to the Tuva Input Layer.

Marshmallow Pyspark ⭐ 12

Marshmallow serializer integration with pyspark

Data Toolkit ⭐ 11

Data Pipeline Toolkit for Early-Stage Startups

Data Paths ⭐ 11

RPJiOS: RPJ's RPi OS, a sensor data platform for the Raspberry Pi built with python2.7 and redis.

Fake Data Pipeline ⭐ 10

Data Generators -> Kafka -> Spark Streaming -> PostgreSQL -> Grafana

A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.

Gfw_forest_loss_geotrellis ⭐ 10

Global Tree Cover Loss Analysis using GeoTrellis and Spark

Dataquest_eng ⭐ 10

Here's how to get DataQuest's Data Engineering Track missions' content to work on your localhost. Using data from my Valenbisi ARIMA modeling project, I document my steps using PostgreSQL, Postico, and the Command Line to get our DataQuest exercises running out of a Jupyter Notebook.

Airflow Kubernetes ⭐ 9

Simple Airflow on Kubernetes (GKE)

Tuva_demo ⭐ 9

A starter dbt project and synthetic claims dataset for trying out the Tuva Project.

Time series data utilities for declaratively applying standardization, Q/C, and transformations to datastreams.

Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).

Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

Stream Accumulator ⭐ 9

Accumulate all the data flowing through a stream and emit it as a single chunk or as a promise

Twitter Pipeline ⭐ 9

In this project, you will be building a Twitter Scheduler using Apache Airflow on Docker.

Dbt Documentor ⭐ 9

✍️ dbt doc generator for advanced data teams

Snowflake Json Datapipeline ⭐ 9

Building a JSON data pipeline within Snowflake using Streams and Tasks

Scribe Data ⭐ 9

Wikidata and Wikipedia data extraction for Scribe applications

Serverless Datapipeline Aws Sam ⭐ 8

Covalent Ssh Plugin ⭐ 8

Executor plugin interfacing Covalent with remote backends using SSH

Pandemic Knowledge ⭐ 8

A fully-featured multi-source data pipeline for continuously extracting knowledge from COVID-19 data.

Teleporter ⭐ 8

Reactive Streams distributed data pipeline for data processing. Currently supports Kafka, JDBC, Kudu, Elasticsearch, HDFS, etc.

High speed message passing between various queues and services

Category-wide association study (CWAS) (Werling et al., 2018; An et al., 2018)

Terraform Provider Montecarlo ⭐ 7

This open-source Terraform provider enables users to seamlessly integrate the Monte Carlo data reliability platform into their infrastructure as code (IaC) workflows.

Code First Pipelines ⭐ 7

A code-first way to define Ploomber pipelines

Framework for building data pipelines

Data Engineer Challenge ⭐ 7

Challenge Data Engineer

Final Project End To End Banking Campaign Pipeline ⭐ 7

Final Project for IYKRA Data Fellowship 8 Program, creating an end-to-end banking campaign pipeline using lambda architecture (providing access to batch and stream processing)

Medicare_lds_connector ⭐ 7

Maps Medicare LDS claims data to the Tuva Input Layer so you can easily run the Tuva Project.

Didact Ui ⭐ 7

The VueJS, Flowbite-powered single-page app dashboard for the Didact Platform.

Business model representation automation

Community ⭐ 6

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services

A module for extracting, transforming, and loading battery cycler data to a database.

Fhir_connector ⭐ 6

Connector that loads FHIR r4 USCDIv3 JSON data from local file storage into the Tuva common data model in Snowflake.

Go Tfdata ⭐ 6

Go library that provides easy-to-use interfaces and tools for TensorFlow users, in particular allowing to train existing TF models on .tar and .tgz datasets

Airflow4ds ⭐ 6

Using Apache Airflow to author, run and monitor complex data pipelines.

Gtfs Data Pipeline Tfnsw Bus ⭐ 6

GTFS Data Pipeline for TfNSW Bus Datasets

Modeling tool like DBT to use SQL Alchemy core with a DataFrame interface like

Opensnowcat Collector ⭐ 6

OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License)

Blockchain sync for the cult of Cardano

Awesome Data Pipeline ⭐ 6

Awesome list for datapipeline

Datacrafter ⭐ 6

NoSQL extract, transform, load (ETL) toolkit with Python

Cribl Knowledge Pack ⭐ 6

Examples of best-in-class use cases curated from community members and Cribl Solutions Engineers.

This is the STRM Privacy Command Line Interface, to define and manage your privacy streams, data schemas, event contracts and much more.

Cribl Syslog Input ⭐ 6

This Pack enables a variety of functions when LogStream is used to receive data from Syslog senders.

A real-time news scraping and recommendation system

Job openings at Quod AI

CodePack - A Python package to easily make, run, and manage workflows

Final Project Level3 Cv 01 ⭐ 5

HEY-I (HElp Your Interview)

Oh My Github Pipeline ⭐ 5

🔄 A flexible open-source data pipeline for seamlessly syncing data from any github user to your database.

101-200 of 212 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.