Awesome Open Source
Search results for etl pipeline
91 search results found
Orchest
⭐
3,876
Build data pipelines, the easy way 🛠️
Incubator Streampark
⭐
3,604
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
Hamilton
⭐
1,538
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Udacity Data Engineering Projects
⭐
1,335
A few projects related to data engineering, including data modeling, infrastructure setup on the cloud, data warehousing, and data lake development.
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Hamilton
⭐
877
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Tech.ml.dataset
⭐
616
A Clojure high performance data processing system
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Metorikku
⭐
536
A simplified, lightweight ETL Framework based on Apache Spark
Flow
⭐
290
Flow PHP - strongly typed data processing framework
Etlbox
⭐
226
A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Setl
⭐
177
A simple Spark-powered ETL framework that just works 🍺
Watchmen Matryoshka Doll
⭐
124
Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management.
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Violet_rails
⭐
95
An app engine for your business. Seamlessly implement business logic with a powerful API. Out-of-the-box CMS, blog, forum, and email functionality. Developer-friendly and easily extendable for your next SaaS/XaaS project. Built with Rails 6, Devise, Sidekiq, and PostgreSQL.
Bulker
⭐
92
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
Prism
⭐
70
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Jayvee
⭐
68
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
Csvplus
⭐
67
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Dig Etl Engine
⭐
65
Download DIG to run on your laptop or server.
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Seo Dashboard
⭐
57
SEO dashboard built from Search Console data using the Google Search API, a MySQL database, a Node.js REST API (Express.js), and a React dashboard.
Onetl
⭐
55
One ETL tool to rule them all
Dataligo
⭐
47
A library to accelerate ML and ETL pipelines by connecting all data sources
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Etlflow
⭐
43
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for running complex auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems, and more.
Data Science Regular Bootcamp
⭐
39
Regular practice in data science, machine learning, and deep learning: solving ML project problems and analytical issues to regularly boost my knowledge. The goal is to help learners with learning resources in the data science field.
Vixtract
⭐
38
Conductor Python
⭐
38
Conductor OSS SDK for Python programming language
Redis Connect Dist
⭐
38
Real-Time Event Streaming & Change Data Capture
Uber Expenses Tracking
⭐
35
The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift, and Power BI.
Stellar Etl Airflow
⭐
31
Airflow DAGs for the Stellar ETL project
Bitcoinmonitor
⭐
31
Near real time ETL to populate a dashboard.
Azuredatafactoryhol
⭐
28
Azure Data Factory Hands On Lab - Step by Step - A Comprehensive Azure Data Factory and Mapping Data Flow step by step tutorial
Stellar Etl
⭐
27
Stellar ETL will enable real-time analytics on the Stellar network
Ethereum_analytical_db
⭐
25
Ethereum Analytical Database - Ethereum data access solution that can be used for analytics and application development. The solution works on a fast DB - Clickhouse.
Tweetsolaping
⭐
24
Implementing an end-to-end tweets ETL/analysis pipeline.
Daflow
⭐
24
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Seatunnel Example
⭐
23
SeaTunnel plugin development examples.
Udacity Data Eng Proj2
⭐
23
A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract data from S3, apply a series of transformations and load into S3 and Redshift.
Jira Database Etl
⭐
20
🚹 💾 Script to import issues from a JIRA instance into a database.
Aws Youtube Analytics
⭐
20
It aims to securely manage, streamline, and analyze structured and semi-structured YouTube video data based on video categories and trending metrics.
Data Refinery
⭐
19
Data transformation
Cobol On K8s
⭐
18
Running an ETL pipeline with COBOL on Kubernetes
Airflowetl
⭐
16
Blog post on ETL pipelines with Airflow
Greenish
⭐
15
Data monitoring tool, monitors the result, not the run
Automated_etl_google_cloud Social_dashboard
⭐
15
A dashboard is worth a thousand words => https://datastudio.google.com/reporting/755f3183-d
Noleme Flow
⭐
14
A library enabling DAG structuring of data processing programs such as ETLs
Kafka Connect Datagen
⭐
14
A Kafka Connect source connector that generates data for tests
Nyc_taxi_pipeline
⭐
12
Design/Implement stream/batch architecture on NYC taxi data | #DE
Singer Working Group
⭐
12
Working group for ongoing development and iteration of the Singer Spec, the de-facto protocol for open source data connectors. Please use "Issues" to create discussion items - or use "Discussions" for general questions.
Spotify Etl
⭐
12
Spotify ETL Pipeline
Azure Data Factory
⭐
11
Learn ETL/ELT data management
Airflowjob
⭐
11
Airflow POC demo : 1) env set up 2) airflow DAG 3) Spark/ML pipeline | #DE
Serverless Python Workflow With Aws Lambda
⭐
11
A tutorial to set up and deploy a simple serverless Python workflow with REST API endpoints in AWS Lambda.
Dappboard Etl
⭐
11
ETL pipeline for the Ethereum blockchain
Gfw Data Api
⭐
10
GFW Data API
Pstl
⭐
10
Parallel Streaming Transformation Loader
Yasp
⭐
9
Yet Another SPark Framework
Exampadata
⭐
8
A container for data sets to help actuaries who are practicing predictive analytics
Pyspark Template
⭐
8
A Python PySpark project with Poetry
Spotify_etl
⭐
8
Using an ETL pipeline to investigate the change in hip-hop/rap genre over time
Data Graph
⭐
8
Flow and event based data processing
Spooq
⭐
8
Dlt With Debug
⭐
8
A lightweight helper utility which allows developers to do interactive pipeline development by having a unified source code for both DLT run and Non-DLT interactive notebook run.
Etl Pipeline Using Airflow
⭐
7
ETL pipeline to extract data from AWS S3 and transform it and load it to AWS Redshift
Etlast
⭐
7
ETL (extract, transform, and load) library for .NET
Disaster Response Pipeline
⭐
7
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
Etl Pipeline Runner
⭐
7
A package to run ETL Pipelines for your datascience projects
Source Watcher Core
⭐
7
This is a PHP project which combines ETL with different strategies to extract data from multiple databases, files, and services, transform it and load it into multiple destinations.
Wiredflow
⭐
7
Lightweight library for creating services using just Python
Documentation
⭐
6
Documentation for the TriplyDB and TriplyETL products
Thomasnet Scraper
⭐
6
Scraping USA Hardware Suppliers Data
Valves
⭐
6
General functions for your data .pipe()-lines.
Amazon Sagemaker Predict Accessibility
⭐
6
Build end-to-end Machine Learning pipeline to predict accessibility of playgrounds in NYC
Chromosomedna
⭐
6
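"DNA Meta-Base Catalysis and Peptide Computing": in evolutionary computation, the PDE metabolic optimization approach, in which software function files undergo DNA semantic meta-base index encoding, is an effective evolutionary method.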
Pyemits
⭐
5
Sugar candy for data scientists. Easy manipulation for time-series data analytics work.
Dados Censup
⭐
5
Automated ingestion of data published by INEP on the Brazilian higher education census.
Pipes
⭐
5
Complex data processing flows in Go
Stock_streaming_pipeline_project
⭐
5
Built a real-time streaming pipeline to extract stock data, using Apache Nifi, Debezium, Kafka, and Spark Streaming. Loaded the transformed data into Glue database and created real-time dashboards using Power BI and Tableau with Athena. The pipeline is orchestrated using Airflow.
Ted Rdf Conversion Pipeline
⭐
5
TED Semantic Web Services
Celo Etl
⭐
5
Python scripts for ETL (extract, transform and load) jobs for Celo blockchain blocks, transactions and more coming.
Rabbit In A Blender
⭐
5
An ETL pipeline to transform your EMP data to OMOP
Trusted Data Pipeline
⭐
5
Building 3D Trusted Data Pipelines With Dagster, Dbt, and Duckdb
Lightflow
⭐
5
A flexible, light, easy to use, automation framework for typical data manipulation with terminal commands.
Gcp Airflow Foundations
⭐
5
Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery data warehouse
Cryptodatapy
⭐
5
CryptoDataPy is a python library that makes it easy to build high quality data pipelines for the analysis of cryptoassets
Build Etl Using Ssis
⭐
5
Starter project for building an ETL pipeline using SSIS in Visual Studio 2019
Stock Market Real Time Data Pipeline With Apache Kafka And Cassandra
⭐
5
An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
Aws Data Pipelines For Azure Storage
⭐
5
Copy data from Azure Blob Storage to Amazon S3 using code. View Azure costs using Amazon QuickSight
Otokuna
⭐
5
A system and web app to discover good deals of rental properties, built and automated on a serverless architecture.