Awesome Open Source
Search results for python data engineering
303 search results found
Apache Superset is a Data Visualization and Data Exploration Platform
Made With Ml
Learn how to responsibly develop, deploy and maintain production machine learning applications.
The easiest way to orchestrate and observe your data pipelines
Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.
Always know what to expect from your data.
An orchestration platform for the development, production, and observation of data assets.
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Feature Store for Machine Learning
Aws Sdk Pandas
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
God Level Data Science Ml Full Stack
A collection of scientific methods, processes, algorithms, and systems to build stories & models. This roadmap contains 16 chapters, whether you are new to the field or an experienced professional who wants to transition into Data Science & AI.
Compare tables within or across databases
Quadratic | Data Science Spreadsheet with Python & SQL
Data Science Roadmap
Data Science Roadmap from A to Z
⚡️ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Quilt is a data mesh for connecting people with actionable data
Clean APIs for data cleaning. Python implementation of R package Janitor
Yt Channels Ds Ai Ml Cs
A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Machine Learning automation and tracking
Extract & Load with joy — CLI & version control for ELT without limitations. No more black box. Let your creativity flow.
A Data Engineering & Machine Learning Knowledge Hub
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Getting Started with Data Engineering
Example end-to-end data engineering project.
A scalable general purpose micro-framework for defining dataflows. You can use it to build dataframes, numpy matrices, python objects, ML models, etc. Embed Hamilton anywhere python runs, e.g. spark, airflow, jupyter, fastapi, python scripts, etc. Comes with lineage out of the box.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
The Python DataFrame for Complex Data
Python Stream Processing
Supercharge BigQuery with BigFunctions
A collection of online resources to help you on your Tech journey.
Ethereum Etl Airflow
Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethe
Aws Serverless Data Lake Framework
Enterprise-grade, production-hardened, serverless data lake on AWS
Versatile Data Kit
Build, run and manage your data pipelines with Python or SQL on any cloud
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Data Engineering With Python
Data Engineering with Python, published by Packt
Better SQL in Jupyter. 📊
Turns Data and AI algorithms into full web applications in no time.
A tool for building feature stores.
Elastik Nearest Neighbors
Go to: https://github.com/alexklibisz/elastiknn
Recap tracks and transforms schemas across your whole application.
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
FeatHub - A stream-batch unified feature store for real-time machine learning
An automatic ML model optimization tool.
Learning in public
Developer platform for production ML.
Snowpark Python Demos
This repository provides various demos/examples of using Snowpark for Python.
An open source development framework to help you build data workflows and modern data architecture on AWS.
Snowflake Snowpark Python API
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
dbt-sugar is a CLI tool that makes performing actions around dbt models easier and more fun for dbt users.
dbt adapter for SQL Server and Azure SQL
The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com)
Aws Orbit Workbench
A Data Platform built for AWS, powered by Kubernetes.
Public Datasets Pipelines
Cloud-native, data onboarding architecture for Google Cloud Datasets
Airflow Dbt Python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Public Opinion Mining System of Taiwanese Forums
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Data pipelines from re-usable components
Pangeo Forge Recipes
Python library for building Pangeo Forge recipes.
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
Create and manage data pipes with Meerschaum.
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Interactive computing for complex data processing, modeling and analysis in Python 3
Movalytics Data Warehouse
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Magniv Core - A Python-decorator based job orchestration platform. Avoid responsibility handoffs by abstracting infra and DevOps.
Awesome Python Backend
Index for online reading materials in order to learn Python and backend development/engineering concepts from scratch and develop a mastery sufficient for Senior/Principal Backend Engineers and Data Engineers
Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
Beneath is a serverless real-time data platform ⚡️
Ansible playbook to deploy distributed technologies
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Data Engineering made simple - An opinionated Data Engineering framework
😎 A curated list of awesome DataOps tools
Containerized distributed programming framework for Python
Coursera Specialization: Machine Learning and Data Analysis (Yandex & MIPT)
A Python API for Dolt
This repo contains commands that data engineers use in their day-to-day work.
Mz Hack Day 2022
Official repo for the Materialize + Redpanda + dbt Hack Day 2022, including a sample project to get everyone started!
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Prefect Deployment Patterns
Code examples showing flow deployment to various types of infrastructure
An Elasticsearch client tailored to data science workflows.
SageWorks: An easy to use Python API for creating and deploying SageMaker Models
🏎 The python library enabling access to tools and data sources in minutes, with Naas low-code formulas.
Repository for the AI course by the @DefendIntelligence community.
Work At Olist Data
Apply for a job at Olist's Data Team: https://olist.gupy.io/
IBM Data Engineering Courses from Coursera
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
Ml In Production
Practical use cases showing how to make your Machine Learning pipelines robust and reliable using Apache Airflow.
Toolkit for building AI Applications
Example repository showing how to build a data platform with Prefect, dbt and Snowflake
Uber Expenses Tracking
The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift and Power BI.
Framework which helps you to make parallel/distributed calculations using data pipelines
Hive Metastore Client
A client for connecting to the Hive Metastore and running DDL statements on it.
Copyright 2018-2023 Awesome Open Source. All rights reserved.