Recap

Recap is a metadata toolkit written in Python

About

Recap reads and converts schemas in dozens of formats, including Parquet, Protocol Buffers, Avro, JSON Schema, BigQuery, Snowflake, and PostgreSQL.

Features

  • Read schemas from filesystems, object stores, and databases.
  • Convert schemas between Parquet, Protocol Buffers, Avro, and JSON Schema.
  • Generate CREATE TABLE DDL from schemas for popular database SQL dialects.
  • Infer schemas from data that carries no schema of its own, such as CSV, TSV, and JSON.
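Schema inference from CSV or TSV usually works by classifying every cell and widening the per-column type as more rows are seen. The sketch below shows that idea with the standard library only. It is not Recap's implementation: the function names (`infer_csv_schema`, `_infer_value_type`, `_merge`), the type-widening rules, and the choice to emit a JSON Schema-style dict are all assumptions made for illustration.

```python
import csv
import io


def _infer_value_type(value: str) -> str:
    """Classify one cell, trying the most specific type first."""
    if value == "":
        return "null"  # empty cell: treated as a missing value
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    if value.lower() in ("true", "false"):
        return "boolean"
    return "string"


def _merge(a: str, b: str) -> str:
    """Widen two observed types to one type that fits both."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"integer", "number"}:
        return "number"
    return "string"  # incompatible types fall back to string


def infer_csv_schema(text: str, delimiter: str = ",") -> dict:
    """Hypothetical helper: infer a JSON Schema-like dict from CSV/TSV text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    types = {name: "null" for name in reader.fieldnames or []}
    nullable = set()
    for row in reader:
        for name, value in row.items():
            if name is None:
                continue  # extra cells beyond the header row are ignored
            observed = _infer_value_type(value or "")
            if observed == "null":
                nullable.add(name)
            types[name] = _merge(types[name], observed)
    return {
        "type": "object",
        "properties": {
            # columns that were always empty default to string
            name: {"type": "string" if t == "null" else t}
            for name, t in types.items()
        },
        "required": [name for name in types if name not in nullable],
    }


print(infer_csv_schema("id,name,score\n1,alice,9.5\n2,,7\n"))
```

Here `score` is widened from integer to number because it holds both `9.5` and `7`, and `name` is left out of `required` because one row has an empty cell. Passing `delimiter="\t"` handles TSV the same way.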

Compatibility

Installation

pip install recap-core

Examples

Read schemas from Python objects, such as a Protocol Buffers message:

s = from_proto(message)

Or from files, whether local or in object stores:

s = schema("s3://corp-logs/2022-03-01/0.json")

Or databases:

s = schema("snowflake://ycbjbzl-ib10693/TEST_DB/PUBLIC/311_service_requests")
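The `schema()` examples above pass a single URL whose scheme selects the source. The sketch below shows how such URLs can be split into their parts with `urllib.parse.urlparse`. It is not how Recap resolves URLs internally: the helper `describe_schema_url` and its return shape are hypothetical, and the Snowflake layout `snowflake://<account>/<database>/<schema>/<table>` is only an assumption read off the example URL (`TEST_DB` and `PUBLIC` look like a database and a schema).

```python
from urllib.parse import urlparse


def describe_schema_url(url: str) -> dict:
    """Hypothetical helper: break a schema URL into the parts a reader would need."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.scheme == "snowflake":
        # Assumed layout: snowflake://<account>/<database>/<schema>/<table>
        if len(parts) != 3:
            raise ValueError(f"expected /<database>/<schema>/<table>, got {parsed.path!r}")
        database, schema, table = parts
        return {
            "kind": "database",
            "dialect": "snowflake",
            "account": parsed.netloc,
            "database": database,
            "schema": schema,
            "table": table,
        }
    if parsed.scheme in ("s3", "gs"):
        # Object stores: the netloc is the bucket, the rest of the path is the key.
        return {"kind": "object_store", "bucket": parsed.netloc, "key": "/".join(parts)}
    if parsed.scheme in ("", "file"):
        # Plain paths and file:// URLs refer to the local filesystem.
        return {"kind": "filesystem", "path": parsed.path}
    raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")


print(describe_schema_url("snowflake://ycbjbzl-ib10693/TEST_DB/PUBLIC/311_service_requests"))
print(describe_schema_url("s3://corp-logs/2022-03-01/0.json"))
```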

And convert them to other formats:

to_json_schema(s)
{
  "type": "object",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "properties": {
    "id": {
      "type": "integer"
    },
    "name": {
      "type": "string"
    }
  },
  "required": [
    "id"
  ]
}
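To make the conversion step concrete, here is a minimal sketch that produces the JSON Schema document above from a flat list of fields. The `Field` dataclass, its type names (`int64`, `string`, ...), and the mapping table are hypothetical stand-ins, not Recap's schema model; non-optional fields become entries in `required`, which is why only `id` appears there.

```python
import json
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical flat field description (not Recap's own schema type)."""
    name: str
    type: str
    optional: bool = False


# Hypothetical mapping from simple type names to JSON Schema types.
JSON_TYPES = {
    "int32": "integer",
    "int64": "integer",
    "float64": "number",
    "bool": "boolean",
    "string": "string",
}


def fields_to_json_schema(fields: list[Field]) -> dict:
    """Build a draft 2020-12 object schema; non-optional fields are required."""
    return {
        "type": "object",
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "properties": {f.name: {"type": JSON_TYPES[f.type]} for f in fields},
        "required": [f.name for f in fields if not f.optional],
    }


fields = [Field("id", "int64"), Field("name", "string", optional=True)]
print(json.dumps(fields_to_json_schema(fields), indent=2))
```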

Or even CREATE TABLE statements:

s = schema("/tmp/data/file.json")
to_ddl(s, "my_table", dialect="snowflake")
CREATE TABLE "my_table" (
  "col1" BIGINT,
  "col2" STRUCT<"col3" VARCHAR>
)
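DDL generation boils down to mapping each field type to a dialect-specific column type and quoting identifiers. The sketch below covers flat columns only; nested types such as the `STRUCT` column above are out of scope. The function `create_table_ddl`, the `DDL_TYPES` table, and its type names are hypothetical, and Recap's real per-dialect mappings may differ. Identifiers are quoted with standard SQL double quotes, doubling any embedded quote.

```python
# Hypothetical per-dialect type names; Recap's actual mappings may differ.
DDL_TYPES = {
    "postgresql": {"int64": "BIGINT", "float64": "DOUBLE PRECISION", "bool": "BOOLEAN", "string": "VARCHAR"},
    "snowflake": {"int64": "BIGINT", "float64": "FLOAT", "bool": "BOOLEAN", "string": "VARCHAR"},
}


def quote_ident(name: str) -> str:
    """Standard SQL identifier quoting: wrap in double quotes, double embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_ddl(table: str, columns: list[tuple[str, str]], dialect: str) -> str:
    """Render a CREATE TABLE statement for flat (name, type) columns."""
    type_names = DDL_TYPES[dialect]  # raises KeyError for an unknown dialect
    column_lines = [f"  {quote_ident(name)} {type_names[type_name]}" for name, type_name in columns]
    return f"CREATE TABLE {quote_ident(table)} (\n" + ",\n".join(column_lines) + "\n)"


print(create_table_ddl("my_table", [("col1", "int64"), ("col2", "string")], dialect="snowflake"))
```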

Getting Started

See the Quickstart page to get started.
