Awesome Open Source

Serverless Data Lake Framework (SDLF)

An AWS Professional Services open source initiative | [email protected]

The Serverless Data Lake Framework (SDLF) is a collection of reusable artifacts aimed at accelerating the delivery of enterprise data lakes on AWS, shortening the deployment time to production from several months to a few weeks. It can be used by AWS teams, partners and customers to implement the foundational structure of a data lake following best practices.

Motivation

A data lake gives your organization agility. It provides a repository where consumers can quickly find the data they need and use it in their business projects. However, building a data lake can be complex; there’s a lot to think about beyond the storage of files. For example, how do you catalog the data so you know what you’ve stored? What ingestion pipelines do you need? How do you manage data quality? How do you keep the code for your transformations under source control? How do you manage development, test and production environments? Building a solution that addresses these use cases can take many weeks, and this time is better spent innovating with data and achieving business goals. The SDLF is a collection of production-hardened, best-practice templates that accelerate your data lake implementation on AWS, so that you can focus on use cases that generate value for the business.

Public References

AWS Serverless Data Lake Framework

Workshop

To quickly get started with SDLF, follow our workshop:

https://sdlf.workshop.aws/

Read The Docs

Ingestion/Processing Library

