Awesome Open Source
Search results for data engineering
515 search results found
Superset ⭐ 58,051
Apache Superset is a Data Visualization and Data Exploration Platform
Made With Ml ⭐ 35,496
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Airflow ⭐ 34,299
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Applied Ml ⭐ 24,828
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Data Engineering Zoomcamp ⭐ 19,461
Free Data Engineering course!
Prefect ⭐ 14,339
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
Argo Workflows ⭐ 14,264
Workflow Engine for Kubernetes
Airbyte ⭐ 12,918
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Cookbook ⭐ 12,557
The Data Engineering Cookbook
Dagster ⭐ 9,467
An orchestration platform for the development, production, and observation of data assets.
Great_expectations ⭐ 9,179
Always know what to expect from your data.
Data Engineer Roadmap ⭐ 9,131
Roadmap to becoming a data engineer in 2021
Benthos ⭐ 7,407
Fancy stream processing made operationally mundane
Mage Ai ⭐ 6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Risingwave ⭐ 5,799
The distributed streaming database. Engineered to offer the simplest and most cost-efficient way for stream processing and management.
Data Engineer Handbook ⭐ 5,650
This is a repo with links to everything you'd ever want to learn about data engineering
Cloudquery ⭐ 5,380
The open source high performance data integration platform built for developers.
Growthbook ⭐ 5,285
Open Source Feature Flagging and A/B Testing Platform
Kestra ⭐ 5,257
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Feast ⭐ 5,053
Feature Store for Machine Learning
Taipy ⭐ 4,311
Turns Data and AI algorithms into production-ready web applications in no time.
Lakefs ⭐ 3,900
lakeFS - Data version control for your data lake | Git for data
Sql Translator ⭐ 3,842
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.
Aws Sdk Pandas ⭐ 3,779
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Openmetadata ⭐ 3,512
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
God Level Data Science Ml Full Stack ⭐ 3,384
A collection of scientific methods, processes, algorithms, and systems to build stories & models, for everyone from newcomers to the field to experienced professionals who want to transition into Data Science & AI.
Ploomber ⭐ 3,318
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Memphis ⭐ 3,078
Memphis.dev is a highly scalable and effortless data streaming platform
Data Engineering Howto ⭐ 2,949
A list of useful resources to learn Data Engineering from scratch
Evidence ⭐ 2,776
Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown.
Data Diff ⭐ 2,707
Compare tables within or across databases
Quadratic ⭐ 2,485
Quadratic | Data Science Spreadsheet with Python & SQL
Data Science Roadmap ⭐ 2,445
Data Science Roadmap from A to Z
Mlops Course ⭐ 2,427
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Incubator Devlake ⭐ 2,322
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
Qsv ⭐ 2,079
CSVs sliced, diced & analyzed.
Metarank ⭐ 1,949
A low-code machine learning service for personalized ranking of articles, listings, search results, and recommendations that boosts user engagement. A friendly Learn-to-Rank engine.
Feathr ⭐ 1,886
Feathr – A scalable, unified data and AI engineering platform for enterprise
Soda Core ⭐ 1,644
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Datart ⭐ 1,593
Datart is a next generation Data Visualization Open Platform
Just Dashboard ⭐ 1,489
📊 📋 Dashboards using YAML or JSON files
Meltano ⭐ 1,460
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Udacity Data Engineering Projects ⭐ 1,335
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Awesome Opensource Data Engineering ⭐ 1,331
An Awesome List of Open-Source Data Engineering Projects
Quilt ⭐ 1,299
Quilt is a data mesh for connecting people with actionable data
Hamilton ⭐ 1,272
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Pyjanitor ⭐ 1,267
Clean APIs for data cleaning. A Python implementation of the R package janitor.
Data Science On Gcp ⭐ 1,249
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Mlrun ⭐ 1,177
Machine Learning automation and tracking
Yt Channels Ds Ai Ml Cs ⭐ 1,084
A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Dlt ⭐ 1,069
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Odd Platform ⭐ 1,047
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Pyspark Example Project ⭐ 1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Data Engineering ⭐ 1,012
Getting Started with Data Engineering
Daft ⭐ 1,012
Distributed DataFrame for Python designed for the cloud, powered by Rust
Bytewax ⭐ 957
Python Stream Processing
Data Engineering Wiki ⭐ 934
The best place to learn data engineering. Built and maintained by the data engineering community.
Sqlmesh ⭐ 931
SQLMesh is a data transformation framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
Around Dataengineering ⭐ 926
A Data Engineering & Machine Learning Knowledge Hub
Data Centric Ai ⭐ 892
A curated, but incomplete, list of data-centric AI resources.
Hamilton ⭐ 877
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Zingg ⭐ 828
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Awesome Dbt ⭐ 813
A curated list of awesome dbt resources
Yobulkdev ⭐ 786
🔥 🔥 🔥 Open Source & AI-driven Data Onboarding Platform: Free flatfile.com alternative
Blaze ⭐ 784
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
Dataform ⭐ 757
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift.
Data Engineering Book ⭐ 744
Accumulated knowledge and experience in the field of Data Engineering
Egeria ⭐ 742
Egeria core
Awesome Billing ⭐ 711
💰 Billing & Payments knowledge for cloud platforms
Neumai ⭐ 693
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Active_workflow ⭐ 675
Polyglot workflows without leaving the comfort of your technology stack.
Flyfish ⭐ 651
FlyFish is a data visualization coding platform. It lets you create a data model quickly and simply, then generate a set of data visualization solutions by dragging and dropping.
Dataengineeringproject ⭐ 644
Example end to end data engineering project.
Goodreads_etl_pipeline ⭐ 593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Failed Ml ⭐ 585
Compilation of high-profile real-world examples of failed machine learning projects
Vectorflow ⭐ 566
VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
Bacalhau ⭐ 555
Compute over Data framework for public, transparent, and optionally verifiable computation
Data Engineering Interview Questions ⭐ 554
More than 2,000 data engineer interview questions.
Bigfunctions ⭐ 490
Supercharge BigQuery with BigFunctions
Redun ⭐ 464
Yet another redundant workflow engine
Awesome Data Catalogs ⭐ 441
📙 Awesome Data Catalogs and Observability Platforms.
Automate Dv ⭐ 435
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Mlcraft ⭐ 418
Synmetrix – open source semantic layer / Boost your LLM precision
Learn Something Every Day ⭐ 409
📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General. Read the rendered results here ->
Versatile Data Kit ⭐ 389
One framework to develop, deploy and operate data workflows with Python and SQL.
Aws Serverless Data Lake Framework ⭐ 379
Enterprise-grade, production-hardened, serverless data lake on AWS
Ethereum Etl Airflow ⭐ 378
Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethe
Everything Tech ⭐ 372
A collection of online resources to help you on your Tech journey.
Gspread Pandas ⭐ 371
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Bitcoin Etl ⭐ 350
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Data Engineering On Gcp Cheatsheet ⭐ 330
Etl ⭐ 327
PHP - ETL (Extract Transform Load) data processing library
Data Engineering Projects ⭐ 322
Personal Data Engineering Projects
Awesome Bigquery Views ⭐ 322
Useful SQL queries for Blockchain ETL datasets in BigQuery.
Conduit ⭐ 321
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Cascading ⭐ 321
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on various cluster computing platforms. Please see https://github.com/cwensel/cascading for access to all WIP branches.
Every Single Day I Tldr ⭐ 311
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Data Engineering With Python ⭐ 302
Data Engineering with Python, published by Packt
Dataengineering Roadmap ⭐ 297
Yet another repository of basic concepts, technical challenges, and resources on data engineering, in Spanish 🧙✨
Tum ⭐ 297
Notes, material, and various other resources collected while attending a Master's degree program at TUM.
1-100 of 515 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.