Awesome Open Source

Search results for python data engineering

246 search results found

Superset ⭐ 58,051

Apache Superset is a Data Visualization and Data Exploration Platform

Made With Ml ⭐ 35,496

Learn how to design, develop, deploy and iterate on production-grade ML applications.

Airflow ⭐ 34,299

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Prefect ⭐ 14,339

Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines

Airbyte ⭐ 12,918

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Dagster ⭐ 9,467

An orchestration platform for the development, production, and observation of data assets.

Great_expectations ⭐ 9,179

Always know what to expect from your data.

Mage Ai ⭐ 6,324

🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.

Feast ⭐ 5,053

Feature Store for Machine Learning

Taipy ⭐ 4,311

Turns Data and AI algorithms into production-ready web applications in no time.

Aws Sdk Pandas ⭐ 3,779

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

God Level Data Science Ml Full Stack ⭐ 3,384

A collection of scientific methods, processes, algorithms, and systems to build stories & models. Whether you are a fresher in the field or an experienced professional who wants to transition into Data Science & AI

Ploomber ⭐ 3,318

The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

Data Diff ⭐ 2,707

Compare tables within or across databases

Quadratic ⭐ 2,485

Quadratic | Data Science Spreadsheet with Python & SQL

Data Science Roadmap ⭐ 2,445

Data Science Roadmap from A to Z

Mlops Course ⭐ 2,427

Learn how to design, develop, deploy and iterate on production-grade ML applications.

Soda Core ⭐ 1,644

⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

Meltano ⭐ 1,460

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

Quilt ⭐ 1,299

Quilt is a data mesh for connecting people with actionable data

Hamilton ⭐ 1,272

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.

Mlrun ⭐ 1,177

Machine Learning automation and tracking

Yt Channels Ds Ai Ml Cs ⭐ 1,084

A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

Pyspark Example Project ⭐ 1,034

Example project implementing best practices for PySpark ETL jobs and applications.

Distributed DataFrame for Python designed for the cloud, powered by Rust

Data Engineering ⭐ 1,012

Getting Started with Data Engineering

Bytewax ⭐ 957

Python Stream Processing

Sqlmesh ⭐ 931

SQLMesh is a data transformation framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.

Around Dataengineering ⭐ 926

A Data Engineering & Machine Learning Knowledge Hub

Hamilton ⭐ 877

A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

Dataengineeringproject ⭐ 644

Example end-to-end data engineering project.

Goodreads_etl_pipeline ⭐ 593

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Vectorflow ⭐ 566

VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.

Bigfunctions ⭐ 490

Supercharge BigQuery with BigFunctions

Yet another redundant workflow engine

Versatile Data Kit ⭐ 389

One framework to develop, deploy and operate data workflows with Python and SQL.

Aws Serverless Data Lake Framework ⭐ 379

Enterprise-grade, production-hardened, serverless data lake on AWS

Ethereum Etl Airflow ⭐ 378

Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethe

Everything Tech ⭐ 372

A collection of online resources to help you on your Tech journey.

Gspread Pandas ⭐ 371

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

Bitcoin Etl ⭐ 350

ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ

Data Engineering With Python ⭐ 302

Data Engineering with Python, published by Packt

Work with your web service, database, and streaming schemas in a single format.

Butterfree ⭐ 269

A tool for building feature stores.

Jupysql ⭐ 261

Better SQL in Jupyter. 📊

Feathub ⭐ 255

FeatHub - A stream-batch unified feature store for real-time machine learning

Grai Core ⭐ 254

Elastik Nearest Neighbors ⭐ 242

Go to: https://github.com/alexklibisz/elastiknn

Aws Ddk ⭐ 233

An open source development framework to help you build data workflows and modern data architecture on AWS.

Snowpark Python Demos ⭐ 220

This repository provides various demos/examples of using Snowpark for Python.

Phidata ⭐ 220

Build AI Assistants using function calling

Snowpark Python ⭐ 215

Snowflake Snowpark Python API

Pipelinex ⭐ 212

PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more

Machine Learning ⭐ 206

Learn AI together, for free. AI learning and teaching resources for everyone.

Auptimizer ⭐ 195

An automatic ML model optimization tool.

Waylonwalker ⭐ 190

Learning in public

Interview Process Coding Questions ⭐ 184

Interview coding questions and experiences for several companies merged into one repository

Developer platform for production ML.

Dbt Trino ⭐ 172

The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com)

Dbt Sqlserver ⭐ 170

dbt adapter for SQL Server and Azure SQL

Data Science For Everyone ⭐ 160

Data Science boot camp aims to make the field of data science accessible and understandable to a wide range of individuals, regardless of their background or expertise.

Lakehouse Engine ⭐ 154

The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

Snowflake infrastructure-as-code. Provision environments, automate deploys, CI/CD. Manage RBAC, users, roles, and data access. Declarative Python Resource API. Change Management tool for the Snowflake data warehouse.

Dbt Sugar ⭐ 143

dbt-sugar is a CLI tool that makes performing actions around dbt models easier and more fun.

Morph Kgc ⭐ 143

Powerful RDF Knowledge Graph Generation with RML Mappings

Airflow Dbt Python ⭐ 139

A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.

Public Datasets Pipelines ⭐ 131

Cloud-native, data onboarding architecture for Google Cloud Datasets

Datachecks ⭐ 117

Open Source Data Quality Monitoring.

Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

Movalytics Data Warehouse ⭐ 114

Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow

Meerschaum ⭐ 112

Create and manage data pipes with Meerschaum.

Public Opinion Mining System of Taiwanese Forums

Pangeo Forge Recipes ⭐ 108

Python library for building Pangeo Forge recipes.

De Zoomcamp Ui ⭐ 107

🎨 UI for the Free Data Engineering Zoomcamp 2023 Course provided by DataTalksClub

Patterns Devkit ⭐ 101

Data pipelines from re-usable components

Special Topic Data Engineering ⭐ 98

This course presents recent research and industrial issues pertaining to data engineering, database systems and technologies. It explores and discusses various topics of interest that directly or indirectly affect, or are influenced by, data engineering, database systems and technologies.

Streamify ⭐ 97

A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!

Dataflow Ops ⭐ 97

Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate

Polygon Etl ⭐ 93

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

Viewflow ⭐ 84

Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.

Airbyte_serverless ⭐ 83

Airbyte made simple (no UI, no database, no cluster)

Dataengineeringpilipinas ⭐ 80

Data Engineering Pilipinas is a community for data engineers, data analysts, data scientists, developers, AI / ML engineers, and users of closed and open source data tools and methods / techniques in the Philippines. Data Engineering Pilipinas is a PyData group.

Awesome Dataops ⭐ 78

😎 A curated list of awesome DataOps tools

Cauldron ⭐ 77

Interactive computing for complex data processing, modeling and analysis in Python 3

Soorgeon ⭐ 73

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊

Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.

Magniv Core ⭐ 67

Magniv Core - A Python-decorator based job orchestration platform. Avoid responsibility handoffs by abstracting infra and DevOps.

Awesome Python Backend ⭐ 66

Index for online reading materials in order to learn Python and backend development/engineering concepts from scratch and develop a mastery sufficient for Senior/Principal Backend Engineers and Data Engineers

Beneath is a serverless real-time data platform ⚡️

Ansible Playbook ⭐ 59

Ansible playbook to deploy distributed technologies

Apachespark ⭐ 59

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.

Data Engineering made simple - An opinionated Data Engineering framework

Containerized distributed programming framework for Python

Coursera_ml_da_specialization ⭐ 53

Coursera Specialization: Machine Learning and Data Analysis (Yandex & MIPT)

Low-code Python library enabling access to APIs, tools, data sources in seconds.

A Python API for Dolt

Towardsdataengineering ⭐ 52

This repo contains commands that data engineers use in day-to-day work.

Mz Hack Day 2022 ⭐ 51

Official repo for the Materialize + Redpanda + dbt Hack Day 2022, including a sample project to get everyone started!

1-100 of 246 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.