Versatile Data Kit

Build, run and manage your data pipelines with Python or SQL on any cloud
Alternatives To Versatile Data Kit
Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language
Prql | Stars: 7,175 | Most Recent Commit: a day ago | Open Issues: 147 | License: apache-2.0 | Language: Rust
PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
Beam | Stars: 6,907 | 1217 hours ago | Total Releases: 532 | Latest Release: August 17, 2022 | Open Issues: 4,264 | License: apache-2.0 | Language: Java
Apache Beam is a unified programming model for Batch and Streaming data processing.
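Mage Ai | Stars: 4,790 | Most Recent Commit: a day ago | Total Releases: 9 | Latest Release: June 27, 2022 | Open Issues: 83 | License: apache-2.0 | Language: Python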
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Sqlmesh | Stars: 504 | Most Recent Commit: a day ago | Open Issues: 43 | License: apache-2.0 | Language: Python
SQLMesh is a DataOps framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
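Versatile Data Kit | Stars: 338 | Most Recent Commit: 18 hours ago | Total Releases: 121 | Latest Release: June 24, 2022 | Open Issues: 200 | License: apache-2.0 | Language: Python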
Build, run and manage your data pipelines with Python or SQL on any cloud
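Yuniql | Stars: 292 | 17 | Most Recent Commit: a year ago | Total Releases: 25 | Latest Release: May 25, 2022 | Open Issues: 65 | License: apache-2.0 | Language: C#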
Free and open source schema versioning and database migration made natively with .NET/6. NEW THIS MAY 2022! v1.3.15 released!
Cuelake | Stars: 266 | Most Recent Commit: a year ago | Open Issues: 11 | License: apache-2.0 | Language: JavaScript
Use SQL to build ELT pipelines on a data lakehouse.
Bulk Writer | Stars: 218 | Most Recent Commit: a year ago | Open Issues: 4 | License: mit | Language: C#
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Analytics Readings | Stars: 155 | Most Recent Commit: 6 months ago | Open Issues: 1
Readings for Analytics Engineers
Pythoncrawler Scrapy Mysql File Template | Stars: 153 | Most Recent Commit: 6 years ago | Open Issues: 2 | License: mit | Language: Python
A Scrapy crawler framework template that saves data to a MySQL database or to files.
Readme

Versatile Data Kit


Overview

Versatile Data Kit (VDK) is a data framework that enables Data Engineers to

  • develop,
  • run,
  • and manage data workloads, aka data jobs

Its Lego-like design consists of lightweight Python modules installed via the pip package manager. All VDK plugins are easy to combine.

VDK CLI can generate a data job and run your Python code and SQL queries.

VDK SDK makes your code shorter, more readable, and faster to create.
Ready-to-use data ETL/ELT patterns make Data Engineering with VDK efficient.
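
As a rough illustration of what a Python step in a data job can look like, the sketch below defines a `run(job_input)` function, which is the entry point VDK calls for Python steps. `execute_query` and `send_object_for_ingestion` are methods of VDK's job input object. The table name `hello_world`, the payload fields, and the `FakeJobInput` stand-in (used only so the snippet runs without VDK installed) are assumptions made for this example.

```python
"""Sketch of a VDK-style Python step; table and payload names are illustrative."""


def run(job_input):
    # In a real data job, VDK passes in its job input object.
    # Create the (hypothetical) target table with a SQL query.
    job_input.execute_query(
        "CREATE TABLE IF NOT EXISTS hello_world (id INTEGER, message TEXT)"
    )
    # Queue one record for ingestion into that table.
    job_input.send_object_for_ingestion(
        payload={"id": 1, "message": "Hello, VDK"},
        destination_table="hello_world",
    )


class FakeJobInput:
    """Hypothetical stand-in that records calls, so the sketch runs without VDK."""

    def __init__(self):
        self.queries = []
        self.ingested = []

    def execute_query(self, sql):
        self.queries.append(sql)

    def send_object_for_ingestion(self, payload, destination_table):
        self.ingested.append((destination_table, payload))


fake = FakeJobInput()
run(fake)
print(fake.ingested)
```

In an actual job, only the `run` function would live in the step file, and the VDK CLI would supply the real job input when the job runs.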

Data Engineers use VDK to implement automatic pull ingestion (E in ELT) and batch data transformation (T in ELT) into a database or any other data storage.

Data Journey and Versatile Data Kit

VDK creates data processing workflows to:

  • Ingest data (extract)
  • Transform data (transform)
  • Export data (load)
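
The three stages above can be sketched with the Python standard library alone. This is not VDK's API; it only illustrates the ingest, transform, export shape of such a workflow. The sample data, the `weather` table, and the function names are assumptions made for the example, and an in-memory SQLite database stands in for the target storage.

```python
import csv
import io
import json
import sqlite3

# Hypothetical inputs: one CSV source and one JSON source.
CSV_TEXT = "id,city,temp_c\n1,Sofia,20.0\n2,Palo Alto,15.0\n"
JSON_TEXT = '[{"id": 3, "city": "Bangalore", "temp_c": 30.0}]'


def ingest():
    """Extract: read rows from a CSV string and a JSON string."""
    rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
    rows.extend(json.loads(JSON_TEXT))
    return rows


def transform(rows):
    """Transform: normalize types and add a Fahrenheit column."""
    return [
        (int(r["id"]), r["city"], float(r["temp_c"]), float(r["temp_c"]) * 9 / 5 + 32)
        for r in rows
    ]


def export(records, conn):
    """Load: write the transformed records into a database table."""
    conn.execute(
        "CREATE TABLE weather (id INTEGER, city TEXT, temp_c REAL, temp_f REAL)"
    )
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
export(transform(ingest()), conn)
result = conn.execute("SELECT city, temp_f FROM weather ORDER BY id").fetchall()
print(result)
```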


Solve common data engineering problems

  • Ingest data from different sources, including CSV files, JSON objects, and data from REST API services.
  • Use Python/SQL and VDK templates to transform data.
  • Ensure data applications are packaged, versioned, and deployed correctly while dealing with credentials, retries, and reconnects.
  • Provide built-in monitoring and smart notification capabilities.
  • Track both code and data modifications and the relationship between them, allowing quicker troubleshooting and version rollback.
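
To make the "retries and reconnects" point concrete, here is a minimal sketch of retrying a flaky ingestion call with exponential backoff. It is a generic pattern written with the standard library, not a description of how VDK implements retries. The helper `with_retries`, its parameters, and the `flaky_source` function are hypothetical names chosen for the example.

```python
import time


def with_retries(fetch, attempts=3, base_delay=0.01, retry_on=(ConnectionError,)):
    """Call fetch(); on a retryable error wait and try again, doubling the delay."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_source():
    """Hypothetical source that fails twice before returning data."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network problem")
    return [{"id": 1, "status": "ok"}]


rows = with_retries(flaky_source)
print(rows, calls["count"])
```

Limiting the retry to specific exception types keeps programming errors from being silently retried; only failures that are plausibly transient are attempted again.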


What VDK can do

Getting Started

Create and run data jobs locally

pip install quickstart-vdk

This installs the core vdk packages and the vdk command line interface. You can use them to run jobs in your local shell environment.

See also the Getting Started section of the wiki

Run the Control Service locally with Docker and Kubernetes

Using Kubernetes for your data jobs workflow provides additional benefits, such as continuous delivery, easier collaboration, streamlined data job orchestration, high availability, security, and job runtime isolation.

More info: https://kubernetes.io/docs/concepts/overview/

Prerequisites

vdk server --install

You can then use the vdk CLI to create and deploy jobs, and the UI to manage them.

Next Steps

  • Getting started with VDK Operations UI
  • Use case examples that show how VDK fits into the data workflow
  • VDK with Trino DB
  • Get to know us and ask questions at our community meeting

Additional Resources

  • Running in production
  • Documentation for VDK
  • VDK Operations UI Overview

Contributing

Create an issue or pull request on GitHub to submit suggestions or changes. If you are interested in contributing as a developer, visit the contributing page.

Contacts

Code of Conduct

Everyone involved in working on the project's source code, or engaging in any issue trackers, Slack channels, and mailing lists is expected to be familiar with and follow the Code of Conduct.
