Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description
---|---|---|---|---|---|---|---|---|---|---|---
Prql | 7,175 | | | | a day ago | | | 147 | apache-2.0 | Rust | PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
Beam | 6,907 | 12 | | | 17 hours ago | 532 | August 17, 2022 | 4,264 | apache-2.0 | Java | Apache Beam is a unified programming model for Batch and Streaming data processing.
Mage Ai | 4,790 | | | | a day ago | 9 | June 27, 2022 | 83 | apache-2.0 | Python | 🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Sqlmesh | 504 | | | | a day ago | | | 43 | apache-2.0 | Python | SQLMesh is a DataOps framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
Versatile Data Kit | 338 | | | | 18 hours ago | 121 | June 24, 2022 | 200 | apache-2.0 | Python | Build, run and manage your data pipelines with Python or SQL on any cloud
Yuniql | 292 | 1 | 7 | | a year ago | 25 | May 25, 2022 | 65 | apache-2.0 | C# | Free and open source schema versioning and database migration made natively with .NET 6. NEW THIS MAY 2022! v1.3.15 released!
Cuelake | 266 | | | | a year ago | | | 11 | apache-2.0 | JavaScript | Use SQL to build ELT pipelines on a data lakehouse.
Bulk Writer | 218 | | | | a year ago | | | 4 | mit | C# | Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Analytics Readings | 155 | | | | 6 months ago | | | 1 | | | Readings for Analytics Engineers
Pythoncrawler Scrapy Mysql File Template | 153 | | | | 6 years ago | | | 2 | mit | Python | A Scrapy crawler framework template that saves scraped data to a MySQL database or to files.
Versatile Data Kit (VDK) is a data framework that enables Data Engineers to develop, deploy, run, and manage data processing workloads.
Its Lego-like design consists of lightweight Python modules installed via the pip package manager. All VDK plugins are easy to combine.
VDK CLI can generate a data job and run your Python code and SQL queries.
VDK SDK makes your code shorter, more readable, and faster to create.
Ready-to-use data ETL/ELT patterns make Data Engineering with VDK efficient.
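As an illustration of the SDK's shape, a data job step is a Python file that exposes a `run()` function, which VDK calls with an `IJobInput` object. A minimal sketch (the step file name is a hypothetical example):

```python
# 10_example_step.py -- one step of a VDK data job (hypothetical file name).
# VDK discovers the step files in a job directory, runs them in
# alphanumeric order, and calls each file's run() function.
from vdk.api.job_input import IJobInput


def run(job_input: IJobInput):
    # IJobInput is the SDK entry point: it provides SQL execution,
    # data ingestion, job arguments, and properties.
    job_input.execute_query("SELECT 1")
```

You can execute such a step locally with `vdk run <job-directory>`.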
Data Engineers use VDK to implement automatic pull ingestion (the E in ELT) into a database or any other data storage, and batch data transformation (the T in ELT).
VDK creates data processing workflows that:
- ingest data from various sources into a database or other data storage (the E and L in ELT);
- transform data inside the database using SQL or Python (the T in ELT).
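A minimal sketch of both halves in a single step, assuming a hypothetical REST source and illustrative table names:

```python
# 20_ingest_and_transform.py -- hypothetical step combining pull
# ingestion (E in ELT) with an in-database transformation (T in ELT).
import requests

from vdk.api.job_input import IJobInput


def run(job_input: IJobInput):
    # E: pull records from an illustrative REST endpoint.
    records = requests.get("https://example.com/api/users").json()
    for record in records:
        job_input.send_object_for_ingestion(
            payload=record, destination_table="raw_users"
        )

    # T: reshape the ingested rows inside the configured database.
    job_input.execute_query(
        "CREATE TABLE IF NOT EXISTS clean_users AS "
        "SELECT id, lower(email) AS email FROM raw_users"
    )
```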
```shell
pip install quickstart-vdk
```
This installs the core vdk packages and the vdk command line interface. You can use them to run jobs in your local shell environment.
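As an illustration, a typical local flow looks like the following (the job name "hello-world" and the team name are placeholders, and flags may differ between versions; check `vdk create --help`):

```shell
vdk create -n hello-world -t my-team --local
vdk run hello-world
```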
See also the Getting Started section of the wiki.
Using Kubernetes for your data jobs workflow provides additional benefits, such as continuous delivery, easier collaboration, streamlined data job orchestration, high availability, security, and job runtime isolation. For more information, see https://kubernetes.io/docs/concepts/overview/.
Prerequisites
```shell
vdk server --install
```
You can then use the vdk CLI to create and deploy jobs, and the UI to manage them.
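For instance, a job created locally could be deployed with something like the following (job name, team, path, and reason string are placeholders; consult `vdk deploy --help` for the exact flags in your version):

```shell
vdk deploy -n hello-world -t my-team -p ./hello-world -r "initial deploy"
```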
See also:
- Getting started with the VDK Operations UI
- Use case examples that show how VDK fits into the data workflow
- VDK with Trino DB
- Get to know us and ask questions at our community meeting
- Running in production
- Documentation for VDK
- VDK Operations UI Overview
Create an issue or pull request on GitHub to submit suggestions or changes. If you are interested in contributing as a developer, visit the contributing page.
Everyone working on the project's source code or participating in the issue trackers, Slack channels, and mailing lists is expected to be familiar with and follow the Code of Conduct.