Awesome Open Source
Search results for python data engineering
303 search results found
Superset
⭐
52,203
Apache Superset is a Data Visualization and Data Exploration Platform
Made With Ml
⭐
33,193
Learn how to responsibly develop, deploy and maintain production machine learning applications.
Prefect
⭐
12,055
The easiest way to orchestrate and observe your data pipelines
Airbyte
⭐
10,760
Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.
Great_expectations
⭐
8,409
Always know what to expect from your data.
Dagster
⭐
7,539
An orchestration platform for the development, production, and observation of data assets.
Mage Ai
⭐
4,735
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Feast
⭐
4,339
Feature Store for Machine Learning
Aws Sdk Pandas
⭐
3,466
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Ploomber
⭐
3,078
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
God Level Data Science Ml Full Stack
⭐
2,545
A collection of scientific methods, processes, algorithms, and systems to build stories & models. This roadmap contains 16 chapters, for both newcomers to the field and experienced professionals who want to transition into Data Science & AI.
Data Diff
⭐
2,336
Compare tables within or across databases
Quadratic
⭐
1,995
Quadratic | Data Science Spreadsheet with Python & SQL
Data Science Roadmap
⭐
1,462
Data Science Roadmap from A to Z
Soda Core
⭐
1,348
⚡️ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Quilt
⭐
1,250
Quilt is a data mesh for connecting people with actionable data
Pyjanitor
⭐
1,146
Clean APIs for data cleaning. Python implementation of R package Janitor
Yt Channels Ds Ai Ml Cs
⭐
1,084
A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Pyspark Example Project
⭐
1,034
Example project implementing best practices for PySpark ETL jobs and applications.
Mlrun
⭐
977
Machine Learning automation and tracking
Meltano
⭐
968
Extract & Load with joy — CLI & version control for ELT without limitations. No more black box. Let your creativity flow.
Around Dataengineering
⭐
926
A Data Engineering & Machine Learning Knowledge Hub
Hamilton
⭐
894
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Data Engineering
⭐
889
Getting Started with Data Engineering
Dataengineeringproject
⭐
644
Example end-to-end data engineering project.
Hamilton
⭐
623
A scalable general purpose micro-framework for defining dataflows. You can use it to build dataframes, numpy matrices, python objects, ML models, etc. Embed Hamilton anywhere python runs, e.g. spark, airflow, jupyter, fastapi, python scripts, etc. Comes with lineage out of the box.
Goodreads_etl_pipeline
⭐
593
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Daft
⭐
562
The Python DataFrame for Complex Data
Bytewax
⭐
475
Python Stream Processing
Bigfunctions
⭐
374
Supercharge BigQuery with BigFunctions
Everything Tech
⭐
372
A collection of online resources to help you on your Tech journey.
Ethereum Etl Airflow
⭐
349
Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethe
Aws Serverless Data Lake Framework
⭐
338
Enterprise-grade, production-hardened, serverless data lake on AWS
Versatile Data Kit
⭐
336
Build, run and manage your data pipelines with Python or SQL on any cloud
Gspread Pandas
⭐
332
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Bitcoin Etl
⭐
305
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Data Engineering With Python
⭐
302
Data Engineering with Python, published by Packt
Jupysql
⭐
261
Better SQL in Jupyter. 📊
Taipy
⭐
260
Turns Data and AI algorithms into full web applications in no time.
Butterfree
⭐
249
A tool for building feature stores.
Elastik Nearest Neighbors
⭐
242
Go to: https://github.com/alexklibisz/elastiknn
Recap
⭐
239
Recap tracks and transforms schemas across your whole application.
Pipelinex
⭐
198
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Feathub
⭐
195
FeatHub - A stream-batch unified feature store for real-time machine learning
Auptimizer
⭐
195
An automatic ML model optimization tool.
Waylonwalker
⭐
190
Learning in public
Pureml
⭐
174
Developer platform for production ML.
Snowpark Python Demos
⭐
166
This repository provides various demos/examples of using Snowpark for Python.
Aws Ddk
⭐
157
An open source development framework to help you build data workflows and modern data architecture on AWS.
Snowpark Python
⭐
153
Snowflake Snowpark Python API
Accelerator
⭐
150
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Dbt Sugar
⭐
143
dbt-sugar is a CLI tool that makes performing actions around dbt models easier and more fun for dbt users.
Dbt Sqlserver
⭐
138
dbt adapter for SQL Server and Azure SQL
Dbt Trino
⭐
135
The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com)
Aws Orbit Workbench
⭐
123
A Data Platform built for AWS, powered by Kubernetes.
Machine Learning
⭐
121
Public Datasets Pipelines
⭐
121
Cloud-native, data onboarding architecture for Google Cloud Datasets
Airflow Dbt Python
⭐
114
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Sayn
⭐
113
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Grai Core
⭐
113
Eyes
⭐
110
Public Opinion Mining System of Taiwanese Forums
Morph Kgc
⭐
104
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Pangeo Forge Recipes
⭐
98
Python library for building Pangeo Forge recipes.
Streamify
⭐
97
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
Meerschaum
⭐
85
Create and manage data pipes with Meerschaum.
Dlt
⭐
84
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Viewflow
⭐
84
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Polygon Etl
⭐
81
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Dataflow Ops
⭐
81
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Cauldron
⭐
77
Interactive computing for complex data processing, modeling and analysis in Python 3
Movalytics Data Warehouse
⭐
74
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Magniv Core
⭐
67
Magniv Core - A Python-decorator based job orchestration platform. Avoid responsibility handoffs by abstracting infra and DevOps.
Awesome Python Backend
⭐
66
Index for online reading materials in order to learn Python and backend development/engineering concepts from scratch and develop a mastery sufficient for Senior/Principal Backend Engineers and Data Engineers
Soorgeon
⭐
65
Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
Beneath
⭐
64
Beneath is a serverless real-time data platform ⚡️
Ansible Playbook
⭐
59
Ansible playbook to deploy distributed technologies
Apachespark
⭐
59
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Rony
⭐
56
Data Engineering made simple - An opinionated Data Engineering framework
Awesome Dataops
⭐
56
😎 A curated list of awesome DataOps tools
Cowait
⭐
54
Containerized distributed programming framework for Python
Coursera_ml_da_specialization
⭐
53
Coursera Specialization: Machine Learning and Data Analysis (Yandex & MIPT)
Doltpy
⭐
53
A Python API for Dolt
Towardsdataengineering
⭐
52
This repo contains commands that data engineers use in day-to-day work.
Mz Hack Day 2022
⭐
51
Official repo for the Materialize + Redpanda + dbt Hack Day 2022, including a sample project to get everyone started!
Soda Spark
⭐
49
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Prefect Deployment Patterns
⭐
48
Code examples showing flow deployment to various types of infrastructure
Uptasticsearch
⭐
47
An Elasticsearch client tailored to data science workflows.
Sageworks
⭐
46
SageWorks: An easy to use Python API for creating and deploying SageMaker Models
Drivers
⭐
46
🏎 The python library enabling access to tools and data sources in minutes, with Naas low-code formulas.
Ia Z
⭐
44
Repository for the AI course by the @DefendIntelligence community.
Work At Olist Data
⭐
41
Apply for a job at Olist's Data Team: https://olist.gupy.io/
Ibmdataengineeringcoursera
⭐
40
IBM Data Engineering Courses from Coursera
Prodmodel
⭐
39
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
Ml In Production
⭐
39
The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.
Phidata
⭐
38
Toolkit for building AI Applications
Prefect Dataplatform
⭐
37
Example repository showing how to build a data platform with Prefect, dbt and Snowflake
Uber Expenses Tracking
⭐
35
The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift and Power BI.
Stairs
⭐
35
Framework which helps you to make parallel/distributed calculations using data pipelines
Hive Metastore Client
⭐
32
A client for connecting and running DDLs on hive metastore.
1-100 of 303 search results
Copyright 2018-2023 Awesome Open Source. All rights reserved.