Data Engineering Howto

A list of useful resources to learn Data Engineering from scratch

How To Become a Data Engineer

Useful articles

Talks

Algorithms & Data Structures

SQL

Programming

Databases

Distributed Systems

Books

Courses

Blogs

  • Martin Kleppmann, author of Designing Data-Intensive Applications
  • BaseDS by Vaidehi Joshi, about distributed systems

Tools

  • Apache Airflow is a platform to programmatically author, schedule and monitor workflows in Python.
  • Apache Spark is a unified analytics engine for large-scale data processing.
  • Apache Kafka is a distributed streaming platform.
  • Luigi is a Python package that helps you build complex pipelines of batch jobs.
  • Dagster.io is a system for building modern data applications.
  • Prefect includes everything you need to create and run data applications.
  • Metaflow helps you build and manage real-life data science projects with ease.
  • lakeFS lets you build repeatable, atomic and versioned data lake operations, from complex ETL jobs to data science and analytics.
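Workflow tools such as Airflow and Luigi model a pipeline as a graph of tasks. Each task runs only after the tasks it depends on have finished. The following is a minimal sketch of that idea using Python's standard-library `graphlib` (Python 3.9+). It is not the API of Airflow, Luigi, or any other tool listed here. The `run_pipeline` helper, the shared `context` dict, and the extract/transform/load tasks are all hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# A hypothetical task: a function that reads from and writes to a shared dict.
Task = Callable[[dict], None]


def run_pipeline(tasks: Dict[str, Task],
                 deps: Dict[str, List[str]]) -> Tuple[List[str], dict]:
    """Run every task only after all of its upstream tasks have finished.

    `deps` maps a task name to the names of the tasks it depends on.
    Returns the execution order and the shared context the tasks wrote to.
    """
    # Build the graph over every task, including ones with no dependencies.
    graph = {name: deps.get(name, []) for name in tasks}
    # static_order() yields nodes with predecessors first;
    # it raises graphlib.CycleError if the dependencies form a cycle.
    order = list(TopologicalSorter(graph).static_order())
    context: dict = {}  # shared state handed from task to task
    for name in order:
        tasks[name](context)
    return order, context


# Hypothetical extract/transform/load tasks for illustration.
def extract(ctx: dict) -> None:
    ctx["raw"] = [" alice ", "BOB", " alice "]


def transform(ctx: dict) -> None:
    # Normalize names and drop duplicates.
    ctx["clean"] = sorted({name.strip().lower() for name in ctx["raw"]})


def load(ctx: dict) -> None:
    # Stand-in for writing to a warehouse: just count the rows.
    ctx["loaded_rows"] = len(ctx["clean"])


order, ctx = run_pipeline(
    {"extract": extract, "transform": transform, "load": load},
    {"transform": ["extract"], "load": ["transform"]},
)
print(order)         # ['extract', 'transform', 'load']
print(ctx["clean"])  # ['alice', 'bob']
```

Real orchestrators add scheduling, retries, persistence of task outputs, and distributed execution on top of this ordering step. The sketch covers only the dependency ordering.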

Cloud Platforms

Communities

Data Engineering Jobs

Other

Newsletters & Digests
