Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Beam | 7,355 | 14 | 2 months ago | 568 | November 13, 2023 | 4,327 | apache-2.0 | Java | ||
Apache Beam is a unified programming model for Batch and Streaming data processing. | ||||||||||
Pachyderm | 6,035 | 1 | 2 months ago | 613 | December 04, 2023 | 897 | apache-2.0 | Go | ||
Data-Centric Pipelines and Data Versioning | ||||||||||
Dataflowjavasdk | 853 | 249 | 14 | 3 years ago | 38 | June 26, 2018 | 54 | |||
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines. | ||||||||||
Tez | 446 | 2 months ago | 67 | apache-2.0 | Java | |||||
Apache Tez | ||||||||||
Smooks | 377 | 14 | 2 months ago | 5 | June 19, 2023 | 19 | other | Java | ||
Extensible data integration Java framework for building XML and non-XML fragment-based applications | ||||||||||
Dataengineering Roadmap | 297 | 3 months ago | mit | |||||||
Yet another repository with basic concepts, technical challenges, and resources on data engineering, in Spanish 🧙✨ | ||||||||||
Big Data Rosetta Code | 283 | 5 months ago | 5 | apache-2.0 | Scala | |||||
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code | ||||||||||
Shifu | 235 | 1 | 2 | a year ago | 9 | April 03, 2019 | 237 | apache-2.0 | Java | |
An end-to-end machine learning and data mining framework on Hadoop | ||||||||||
Mobydq | 175 | 2 years ago | 5 | apache-2.0 | Vue | |||||
🐳 Tool to automate data quality checks on data pipelines | ||||||||||
Setl | 173 | 4 months ago | 4 | August 21, 2020 | 5 | apache-2.0 | Scala | |||
A simple Spark-powered ETL framework that just works 🍺 |