Awesome Open Source
Search results for etl
964 search results found
Bellboy
⭐
90
Highly performant JavaScript data stream ETL engine.
Awesome Rails
⭐
90
A curated list of amazingly awesome open source rails related resources inspired by Awesome PHP.
Airbyte Connectors
⭐
90
Airbyte connectors (sources & destinations) + Airbyte CDK for JavaScript/TypeScript
Adflab
⭐
89
Azure Data Factory hands-on lab, self-paced. Learn how to lift & shift SSIS packages to the Cloud with ADF. Build new ETL pipelines in ADF, transform data at scale, load Azure Data Warehouse data marts. Also walks through operationalizing ADF pipelines with scheduling and monitoring modules.
Open Data Etl Utility Kit
⭐
87
Use Pentaho's open source data integration tool (Kettle) to create Extract-Transform-Load (ETL) processes to update a Socrata open data portal. Documentation is available at http://open-data-etl-utility-kit.readthedocs.io/en
Mars
⭐
87
The powerful analysis platform to explore and visualize data from blockchain.
Flowman
⭐
85
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Php Etl
⭐
85
Extract-Transform-Load library
Kettleinaction100
⭐
85
100 blog posts on Kettle in practice
Sling Cli
⭐
84
Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
Etl Synthea
⭐
83
Conversion from Synthea CSV to OMOP CDM
Airbyte_serverless
⭐
83
Airbyte made simple (no UI, no database, no cluster)
Go Etl
⭐
83
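go-etl is a toolkit for extracting, transforming, and loading data from data sources, providing powerful offline data synchronization capabilities.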
Sycamore
⭐
82
🍁 Sycamore is an LLM-powered semantic data preparation system for building search applications.
Logstash Test Runner
⭐
82
Logstash configuration testing framework
Stetl
⭐
81
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Etlhelper
⭐
81
ETL Helper is a Python ETL library to simplify data transfer into and out of databases.
Thain
⭐
81
Thain is a distributed flow schedule platform.
Datacater
⭐
80
The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
Dataengineeringpilipinas
⭐
80
Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in the Philippines. Data Engineering Pilipinas is a PyData group.
Gallia Core
⭐
79
A schema-aware Scala library for data transformation
Rocket Bi
⭐
79
A free, open-source, web-based self-service BI tool tailor-made for ClickHouse, Google BigQuery, MySQL, PostgreSQL, and Vertica
Data Engineering Nanodegree
⭐
76
Projects done in the Data Engineering Nanodegree by Udacity.com
Projects
⭐
76
Sample projects using Ploomber.
Scriptella Etl
⭐
75
Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java
Reactiveetl
⭐
74
Reactive ETL is a rewrite of Rhino ETL using the reactive extensions for .Net.
Luigi Warehouse
⭐
73
A Luigi-powered analytics / warehouse stack
Etl Cms
⭐
72
Work products to ETL CMS datasets into the OMOP Common Data Model
Dflib
⭐
71
In-memory Java DataFrame library
Etw2json
⭐
71
Tool and library to convert ETW logs to JSON files
Discreetly
⭐
70
ETLy is an add-on dashboard service on top of Apache Airflow.
Openrefine Batch
⭐
70
Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that communicates with the OpenRefine API.
Prism
⭐
70
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Dataexpress
⭐
69
[NOT MAINTAINED] DataExpress is a simple, Scala-based cross database ETL toolkit supporting Postgres, MySql, Oracle, SQLServer, and Sqlite
Mongosyphon
⭐
68
A tool for transferring data from a Relational Database to MongoDB
Sqlbucket
⭐
67
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
Csvplus
⭐
67
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Data Wrangling With Python
⭐
66
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Spark
⭐
65
Open Source D-APM (Data-Application Performance Monitoring) for Apache Spark
Beneath
⭐
64
Beneath is a serverless real-time data platform ⚡️
Udacity Data Engineer Nanodegree
⭐
64
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
Django Calaccess Raw Data
⭐
63
A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
Blockchain Etl
⭐
63
Blockchain follower that follows and stores the Helium blockchain
Bigquery Etl Dataflow Sample
⭐
62
Dataduck
⭐
62
DataDuck ETL - the extract-transform-load framework for data warehousing
Spark Etl
⭐
62
Apache Spark based ETL Engine
Zeus
⭐
61
Zeus + SciFi = the power of the gods, meets the power of science fiction. Designing wisdom into intelligence, through intelligent design.
Mbrainz Importer
⭐
60
Clojure/Transducers/Datomic ETL example
Yaetl
⭐
60
Yet Another ETL in PHP
Bentools Etl
⭐
60
PHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependency.
Dbt Sqlite
⭐
59
A SQLite adapter plugin for dbt (data build tool)
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in practice, using PySpark and Spark SQL for development. The course ends with a few case studies.
Steampipe Plugin Github
⭐
58
Use SQL to instantly query repositories, users, gists and more from GitHub. Open source CLI. No DB required.
Etl Parser
⭐
57
Event Trace Log file parser in pure Python
Openrefine Client
⭐
57
The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
Etl_with_python
⭐
57
ETL with Python - Taught at DWH course 2017 (TAU)
Zdh_server
⭐
56
zdh data collection platform, ETL processing service
Etl
⭐
56
Embedded Template Library
Rony
⭐
56
Data Engineering made simple - An opinionated Data Engineering framework
Krawler
⭐
55
A minimalist (geospatial) ETL
Data Engineering
⭐
55
How to build an awesome data engineering team
Skaetl
⭐
55
Open Source ETL designed for and dedicated to Log processing and transformation
Onetl
⭐
55
One ETL tool to rule them all
Drivers
⭐
53
Low-code Python library enabling access to APIs, tools, data sources in seconds.
Sqlpipe
⭐
52
SQLpipe makes it easy to move the result of one query from one database to another.
Pyetl
⭐
51
Python ETL framework
Getl
⭐
51
A tool for developing and testing ETL and ELT processes for automating the capture, delivery and processing of information in data warehouses on the MicroFocus Vertica platform.
Dswarm
⭐
51
an open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wi
Mydataharbor
⭐
50
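🇨🇳 MyDataHarbor is dedicated to moving data from any data source to any data source: a distributed, highly scalable, high-performance, transaction-level data…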
Checkbook
⭐
49
Source code, data, and instructions for Checkbook
Uptasticsearch
⭐
48
An Elasticsearch client tailored to data science workflows.
Unihan Etl
⭐
48
Export UNIHAN's database to csv, json or yaml
Etl Engine
⭐
47
ETL engine: a lightweight, cross-platform ETL engine with unified stream and batch processing for data extraction, transformation, and loading
Amaxa
⭐
47
A multi-object ETL tool for Salesforce.
Datapipelines Essentials Python
⭐
45
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Smartetl
⭐
45
A lightweight ETL engine and smart transformation framework
Legis Graph
⭐
45
ETL scripts for loading US Congressional data from govtrack.us into Neo4j
Etl
⭐
45
ETL Workflow
Virtual Data Warehouse
⭐
44
The Virtual Data Warehouse is a code generation and template management tool. It is part of the data solution automation ecosystem - the 'engine' for data solution automation.
Configs
⭐
43
Public, free to use, repository with diggers configs for scraping / extracting data from various e-commerce websites and online stores
Etlflow
⭐
43
EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex Auditable workflows which can interact with Google Cloud Platform, AWS, Kubernetes, Databases, SFTP servers, On-Prem Systems and more.
Odoo Etl
⭐
43
Odoo data manipulation, like a small ELT (Extract, Load, Transform) for Odoo databases.
Bigmetadata
⭐
43
Udacity Data Engineering
⭐
42
Udacity Data Engineering Nano Degree (DEND)
Architect_big_data_solutions_with_spark
⭐
42
code, labs and lectures for the course
Ruby For Pentaho Kettle
⭐
42
Ruby scripting for pentaho-kettle
Functions
⭐
42
Serverless ETL using cloud functions https://fivetran.com/docs/functions
Data Solution Framework
⭐
42
A library for data warehouse and data integration pattern and architecture documentation.
Steampipe Plugin Kubernetes
⭐
41
Use SQL to instantly query Kubernetes API resources. Open source CLI. No DB required.
Terminatooor
⭐
40
Brute force your OpenERP data integration with OOOR inside the Kettle ETL (aka Pentaho Data Integration - PDI)
Steampipe Sqlite
⭐
39
Steampipe SQLite is a zero-ETL engine for SQLite. Virtual tables translate queries into live API calls for cloud services and APIs. Hundreds of plugins with thousands of documented examples.
Koza
⭐
38
Data transformation framework for LinkML data models
Fhir River
⭐
38
Live ETL pipeline to standardize Health Data into FHIR.
Sluice
⭐
38
A Ruby toolkit for cloud-friendly ETL
Redis Connect Dist
⭐
38
Real-Time Event Streaming & Change Data Capture
Datasphere Integration
⭐
38
A data-centric integration platform
Csv Cruncher
⭐
38
Treats CSV and JSON files as SQL tables, and exports SQL SELECTs back to CSV or JSON.
Projeto_etl_rfb_ibge_anp
⭐
38
Python and PostgreSQL - Extract Transform Load (ETL) - public data from the Receita Federal do Brasil (RFB), the Instituto Brasileiro de Geografia e Estatística (IBGE), and the Agência Nacional do Petróleo, Gás Natural e Biocombustíveis (ANP)
Vixtract
⭐
38
Etl Light
⭐
38
A light Kafka to HDFS/S3 ETL library based on Apache Spark
201-300 of 964 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.