| Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Airbyte | 10,071 | | | | 10 hours ago | 90 | June 23, 2022 | 4,443 | other | Python | Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. |
| Dagster | 6,947 | | 2 | 89 | 12 hours ago | 495 | July 06, 2022 | 1,588 | apache-2.0 | Python | An orchestration platform for the development, production, and observation of data assets (see the first sketch after this table). |
| Benthos | 5,865 | | | 4 | a day ago | 518 | August 10, 2022 | 338 | mit | Go | Fancy stream processing made operationally mundane. |
| Cloudquery | 4,244 | | | 6 | 6 hours ago | 241 | August 14, 2022 | 181 | mpl-2.0 | Go | The open-source, high-performance data integration platform built for developers. |
| Mage Ai | 3,691 | | | | 20 hours ago | 9 | June 27, 2022 | 54 | apache-2.0 | Python | 🧙 The modern replacement for Airflow: build, run, and manage data pipelines for integrating and transforming data. |
| Aws Sdk Pandas | 3,374 | | | 34 | a day ago | 125 | June 28, 2022 | 53 | apache-2.0 | Python | pandas on AWS - easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server, and S3 (Parquet, CSV, JSON, and Excel); see the second sketch after this table. |
| Kestra | 3,244 | | | | a day ago | 28 | August 30, 2022 | 142 | apache-2.0 | Java | Kestra is an infinitely scalable orchestration and scheduling platform for creating, running, scheduling, and monitoring millions of complex pipelines. |
| Incubator Devlake | 1,976 | | | | 13 hours ago | 79 | August 26, 2022 | 122 | apache-2.0 | Go | Apache DevLake is an open-source dev data platform that ingests, analyzes, and visualizes fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth. |
| Pyspark Example Project | 1,034 | | | | 4 months ago | | | 11 | | Python | Example project implementing best practices for PySpark ETL jobs and applications. |
| Hamilton | 894 | | | | a month ago | 21 | July 03, 2022 | 12 | bsd-3-clause-clear | Python | A scalable, general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton |
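Two of the Python entries above are API-first libraries, so short sketches may help. First, Dagster's software-defined assets: a minimal sketch assuming only the `dagster` package; the asset names and values are invented for illustration, not taken from Dagster's docs.

```python
from dagster import asset, materialize


@asset
def raw_numbers() -> list:
    # Source asset; a real pipeline would pull from an API, file, or table.
    return [1, 2, 3]


@asset
def total(raw_numbers: list) -> int:
    # Dagster infers the dependency on raw_numbers from the parameter name.
    return sum(raw_numbers)


if __name__ == "__main__":
    # Materialize both assets in dependency order.
    materialize([raw_numbers, total])
```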
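Second, Aws Sdk Pandas (imported as `awswrangler`): a minimal sketch of the S3-and-Athena round trip its description advertises. The bucket, Glue database, and table names are placeholders for your own resources.

```python
import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["foo", "bar"]})

# Write the frame to S3 as Parquet and register it in the Glue catalog.
# "my-bucket" and "my_db" are placeholders.
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/my_table/",
    dataset=True,
    database="my_db",
    table="my_table",
)

# Read it back through Athena as a regular pandas DataFrame.
result = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")
```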
Recap is a metadata toolkit written in Python. It reads and converts schemas in dozens of formats, including Parquet, Protocol Buffers, Avro, JSON Schema, BigQuery, Snowflake, and PostgreSQL, and it can generate `CREATE TABLE` DDL from schemas for popular database SQL dialects.

Install it with pip:

```bash
pip install recap-core
```
Read schemas from objects:

```python
s = from_proto(message)  # message is a Protocol Buffers message instance
```

Or files:

```python
s = schema("s3://corp-logs/2022-03-01/0.json")
```

Or databases:

```python
s = schema("snowflake://ycbjbzl-ib10693/TEST_DB/PUBLIC/311_service_requests")
```

And convert them to other formats:

```python
to_json_schema(s)
```
```json
{
  "type": "object",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "properties": {
    "id": {
      "type": "integer"
    },
    "name": {
      "type": "string"
    }
  },
  "required": [
    "id"
  ]
}
```
Or even generate `CREATE TABLE` statements:

```python
s = schema("/tmp/data/file.json")
to_ddl(s, "my_table", dialect="snowflake")
```

which produces:

```sql
CREATE TABLE "my_table" (
  "col1" BIGINT,
  "col2" STRUCT<"col3" VARCHAR>
)
```
See the Quickstart page to get started.
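Putting the pieces together, a minimal end-to-end sketch. Note that the import line is an assumption: the snippets above use `schema`, `to_json_schema`, and `to_ddl` as if they were top-level helpers but never show where they come from, so this sketch imports them from the package root; check the Quickstart for the actual import paths.

```python
# Assumed import path; the snippets above imply top-level helpers.
from recap import schema, to_ddl, to_json_schema

# Read a schema from a JSON file (the path is a placeholder)...
s = schema("/tmp/data/file.json")

# ...convert it to JSON Schema...
print(to_json_schema(s))

# ...and emit Snowflake CREATE TABLE DDL for the same structure.
print(to_ddl(s, "my_table", dialect="snowflake"))
```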