Awesome Open Source
Search results for python data engineering
246 search results found
Superset
⭐
58,051
Apache Superset is a Data Visualization and Data Exploration Platform
Made With Ml
⭐
35,496
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Airflow
⭐
34,299
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Prefect
⭐
14,339
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
Airbyte
⭐
12,918
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
Great_expectations
⭐
9,179
Always know what to expect from your data.
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Feast
⭐
5,053
Feature Store for Machine Learning
Taipy
⭐
4,311
Turns Data and AI algorithms into production-ready web applications in no time.
Aws Sdk Pandas
⭐
3,779
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
God Level Data Science Ml Full Stack
⭐
3,384
A collection of scientific methods, processes, algorithms, and systems to build stories & models. Whether you are a fresher in the field or an experienced professional who wants to transition into Data Science & AI
Ploomber
⭐
3,318
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Data Diff
⭐
2,707
Compare tables within or across databases
Quadratic
⭐
2,485
Quadratic | Data Science Spreadsheet with Python & SQL
Data Science Roadmap
⭐
2,445
Data Science Roadmap from A to Z
Mlops Course
⭐
2,427
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Soda Core
⭐
1,644
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Meltano
⭐
1,460
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Quilt
⭐
1,299
Quilt is a data mesh for connecting people with actionable data
Hamilton
⭐
1,272
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage and metadata. Runs and scales everywhere python does.
Mlrun
⭐
1,177
Machine Learning automation and tracking
Yt Channels Ds Ai Ml Cs
⭐
1,084
A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Dlt
⭐
1,069
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Daft
⭐
1,012
Distributed DataFrame for Python designed for the cloud, powered by Rust
Data Engineering
⭐
1,012
Getting Started with Data Engineering
Bytewax
⭐
957
Python Stream Processing
Sqlmesh
⭐
931
SQLMesh is a data transformation framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
Around Dataengineering
⭐
926
A Data Engineering & Machine Learning Knowledge Hub
Hamilton
⭐
877
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Neumai
⭐
693
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Dataengineeringproject
⭐
644
Example end to end data engineering project.
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Vectorflow
⭐
566
VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
Bigfunctions
⭐
490
Supercharge BigQuery with BigFunctions
Redun
⭐
464
Yet another redundant workflow engine
Versatile Data Kit
⭐
389
One framework to develop, deploy and operate data workflows with Python and SQL.
Aws Serverless Data Lake Framework
⭐
379
Enterprise-grade, production-hardened, serverless data lake on AWS
Ethereum Etl Airflow
⭐
378
Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethe
Everything Tech
⭐
372
A collection of online resources to help you on your Tech journey.
Gspread Pandas
⭐
371
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Bitcoin Etl
⭐
350
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Data Engineering With Python
⭐
302
Data Engineering with Python, published by Packt
Recap
⭐
292
Work with your web service, database, and streaming schemas in a single format.
Butterfree
⭐
269
A tool for building feature stores.
Jupysql
⭐
261
Better SQL in Jupyter. 📊
Feathub
⭐
255
FeatHub - A stream-batch unified feature store for real-time machine learning
Grai Core
⭐
254
Elastik Nearest Neighbors
⭐
242
Go to: https://github.com/alexklibisz/elastiknn
Aws Ddk
⭐
233
An open source development framework to help you build data workflows and modern data architecture on AWS.
Snowpark Python Demos
⭐
220
This repository provides various demos/examples of using Snowpark for Python.
Phidata
⭐
220
Build AI Assistants using function calling
Snowpark Python
⭐
215
Snowflake Snowpark Python API
Pipelinex
⭐
212
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Machine Learning
⭐
206
Learn AI together, for free. AI learning and teaching resources for everyone.
Auptimizer
⭐
195
An automatic ML model optimization tool.
Waylonwalker
⭐
190
Learning in public
Interview Process Coding Questions
⭐
184
Interview coding questions and experiences for several companies merged into one repository
Pureml
⭐
174
Developer platform for production ML.
Dbt Trino
⭐
172
The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com)
Dbt Sqlserver
⭐
170
dbt adapter for SQL Server and Azure SQL
Data Science For Everyone
⭐
160
Data Science boot camp aims to make the field of data science accessible and understandable to a wide range of individuals, regardless of their background or expertise.
Lakehouse Engine
⭐
154
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
Titan
⭐
152
Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for the Snowflake data warehouse.
Dbt Sugar
⭐
143
dbt-sugar is a CLI tool that allows users of dbt to have fun and ease performing actions around dbt models
Morph Kgc
⭐
143
Powerful RDF Knowledge Graph Generation with RML Mappings
Airflow Dbt Python
⭐
139
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Public Datasets Pipelines
⭐
131
Cloud-native, data onboarding architecture for Google Cloud Datasets
Datachecks
⭐
117
Open Source Data Quality Monitoring.
Sayn
⭐
117
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Movalytics Data Warehouse
⭐
114
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Meerschaum
⭐
112
Create and manage data pipes with Meerschaum.
Eyes
⭐
110
Public Opinion Mining System of Taiwanese Forums
Pangeo Forge Recipes
⭐
108
Python library for building Pangeo Forge recipes.
De Zoomcamp Ui
⭐
107
🎨 UI for the Free Data Engineering Zoomcamp 2023 Course provided by DataTalksClub
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Special Topic Data Engineering
⭐
98
This course presents recent research and industrial issues pertaining to data engineering, database systems, and technologies. It explores and discusses various topics of interest that directly or indirectly affect, or are influenced by, data engineering, database systems, and technologies.
Streamify
⭐
97
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
Dataflow Ops
⭐
97
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Polygon Etl
⭐
93
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Viewflow
⭐
84
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Airbyte_serverless
⭐
83
Airbyte made simple (no UI, no database, no cluster)
Dataengineeringpilipinas
⭐
80
Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in the Philippines. Data Engineering Pilipinas is a PyData group.
Awesome Dataops
⭐
78
😎 A curated list of awesome DataOps tools
Cauldron
⭐
77
Interactive computing for complex data processing, modeling and analysis in Python 3
Soorgeon
⭐
73
Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
Prism
⭐
70
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Magniv Core
⭐
67
Magniv Core - A Python-decorator based job orchestration platform. Avoid responsibility handoffs by abstracting infra and DevOps.
Awesome Python Backend
⭐
66
Index for online reading materials in order to learn Python and backend development/engineering concepts from scratch and develop a mastery sufficient for Senior/Principal Backend Engineers and Data Engineers
Beneath
⭐
64
Beneath is a serverless real-time data platform ⚡️
Ansible Playbook
⭐
59
Ansible playbook to deploy distributed technologies
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Rony
⭐
56
Data Engineering made simple - An opinionated Data Engineering framework
Cowait
⭐
54
Containerized distributed programming framework for Python
Coursera_ml_da_specialization
⭐
53
Coursera Specialization: Machine Learning and Data Analysis (Yandex & MIPT)
Drivers
⭐
53
Low-code Python library enabling access to APIs, tools, data sources in seconds.
Doltpy
⭐
53
A Python API for Dolt
Towardsdataengineering
⭐
52
This repo contains commands that data engineers use in day to day work.
Mz Hack Day 2022
⭐
51
Official repo for the Materialize + Redpanda + dbt Hack Day 2022, including a sample project to get everyone started!
1-100 of 246 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.