Awesome Open Source
Search results for python data engineering
247 search results found
Mz Hack Day 2022
⭐
51
Official repo for the Materialize + Redpanda + dbt Hack Day 2022, including a sample project to get everyone started!
Soda Spark
⭐
49
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Uptasticsearch
⭐
48
An Elasticsearch client tailored to data science workflows.
Prefect Deployment Patterns
⭐
48
Code examples showing flow deployment to various types of infrastructure
Ia Z
⭐
47
Repository for the AI course by the @DefendIntelligence community.
Portable Data Stack Dagster
⭐
45
A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB, PostgreSQL and Superset
Work At Olist Data
⭐
41
Apply for a job at Olist's Data Team: https://olist.gupy.io/
Ibmdataengineeringcoursera
⭐
40
IBM Data Engineering Courses from Coursera
Ml In Production
⭐
39
The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.
Prodmodel
⭐
39
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
Prefect Dataplatform
⭐
37
Example repository showing how to build a data platform with Prefect, dbt and Snowflake
Amora Data Build Tool
⭐
37
Amora Data Build Tool enables analysts and engineers to transform data on the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python's "PEP484 - Type Hints" and select statements with SQLAlchemy. Amora is able to transform Python code into SQL data transformation jobs that run inside the warehouse.
Pyjaws
⭐
36
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
Sageworks
⭐
36
SageWorks: An easy to use Python API for creating and deploying SageMaker Models
Uber Expenses Tracking
⭐
35
The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift and Power BI.
Stairs
⭐
35
Framework which helps you to make parallel/distributed calculations using data pipelines
Yaetos
⭐
32
Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified
Airflow Pentaho Plugin
⭐
32
Pentaho plugin for Apache Airflow - Orchestrate Pentaho transformations and jobs from Airflow
Hive Metastore Client
⭐
32
A client for connecting and running DDLs on hive metastore.
Data Machinelearning The Boring Way
⭐
31
Build & Learn Data Engineering and Machine Learning over Kubernetes. No-shortcut approach.
Artificial Intelligence Important Documents Collections
⭐
31
AI technology is significant because it allows software to do human functions—understanding, reasoning, planning, communication, and perception—increasingly effectively, efficiently, and affordably.
Spark Ai
⭐
31
Toolbox for building Generative AI applications on top of Apache Spark.
Verified Sources
⭐
31
Contribute to dlt verified sources 🔥
Debussy_concert
⭐
29
Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and pipelines.
Complete Data Science Roadmap
⭐
29
Complete Roadmap For Data Science
Sparkdataset
⭐
28
Instant search for and access to many datasets in Pyspark.
Frieds.github.io
⭐
26
Tutorials & articles on Python, leetcode problems, pandas, and more.
Uberjob
⭐
26
uberjob is a Python package for building and running call graphs.
Funsies
⭐
25
funsies is a lightweight workflow engine 🔧
Arthur Redshift Etl
⭐
25
ELT Code for your Data Warehouse
Airflow Provider Lakefs
⭐
25
lakeFS airflow operator
Stairlight
⭐
25
A data lineage tool that detects table dependencies from rendered SQL statements.
Data Engineering Project
⭐
24
DIT 638 project - Cyber Physical Systems and Systems of Systems, using C++, Python, and Dockerfile
Audiophile E2e Pipeline
⭐
24
Pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and finalizes data for a Metabase Dashboard.
E4ds Snippets
⭐
24
Code examples and experiments from https://engineeringfordatascience.com
Get_smarties
⭐
23
Dummy variable generation with fit/transform capabilities
Nodestream
⭐
23
A Fast, Declarative, and Extensible ETL Framework for Graph Databases.
Digital_palace
⭐
23
My Digital Palace - A Personal Journal for Reflection - A place to store all my thoughts
Kumparanian
⭐
23
Data engineer and data scientist hiring process at scale
Prefect Earthdata
⭐
22
Prefect integrations with NASA Earthdata.
Aws Glue Docker
⭐
22
🐋 Docker image for AWS Glue Spark/Python
Jobanalytics_and_search
⭐
22
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Orangutan Stem
⭐
22
An open-source project dedicated to constructing robust data pipelines and scalable infrastructure. We leverage industry-standard tools favored by data professionals to enhance efficiency and reliability. Uniquely, these pipelines are field-tested on our farm in Sumatra, Indonesia, ensuring real-world applicability and resilience.
Bytehub
⭐
22
ByteHub: making feature stores simple
Etl_manager
⭐
21
A Python package to create a database on the platform using our moj data warehousing framework
Spark Movies Etl
⭐
21
Spark data pipeline that ingests and transforms movie ratings data.
Neon Workshop
⭐
20
A Pachyderm deep learning tutorial for conference workshops
Lolpop
⭐
20
A software engineering framework to jump start your machine learning projects
Udacity Data Eng Proj3
⭐
20
Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary.
Spotify Api
⭐
19
Pipeline that extracts data from the Spotify API to build a more detailed version of Spotify Wrapped
Airflow Docker
⭐
19
This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose. It will also include some sample DAGs and workflows.
60 Days Of Data Science And Ml
⭐
18
60 Days of Data Science and ML
Serpytor
⭐
18
A distributed, low-code, end-to-end data collection and analysis tool for data folks. Take the pain out of data collection from your pipeline!
Panda_patrol
⭐
18
Kaggle Project List
⭐
18
Summary of my projects on Kaggle
Metadata Guardian
⭐
17
Provide an easy way with Python to protect your data sources by searching its metadata.
Dagster Example Pipeline
⭐
17
Template Dagster repo using poetry and a single Docker container; works well with CICD
Data Engineering Salaries
⭐
16
A Streamlit app to explore data engineering salary data.
Airflowetl
⭐
16
Blog post on ETL pipelines with Airflow
Mpc Dl Controller
⭐
16
Deep Neural Network architecture as a predictive optimal controller for {HVAC+Solar cell + battery} disturbance afflicted system compared to classic Model Predictive Control
Airflow Studyclub
⭐
16
Apache Airflow study group organized by the Data Engineering Latam community
Stepist
⭐
16
Framework for data processing
Airflow Valohai Plugin
⭐
15
🦈 Airflow plugin to scale machine learning tasks with Valohai and get automatic version control
Airflowdatapipeline
⭐
15
Example of an ETL Pipeline using Airflow
Gm Sparql
⭐
15
Graph Mining Using SPARQL
Online_store
⭐
15
End to end data engineering project
Big Data Engineering
⭐
15
Kedro Static Viz
⭐
15
kedro cli plugin for generating a static kedro viz site (html, css, js) that can be deployed on many serverless tools.
Talk Demos
⭐
15
Code & docs for Pipekit's talks
Prefect Saturn
⭐
15
Python client for using Prefect Cloud with Saturn Cloud
Dbt Airflow
⭐
14
A Python package that creates fine-grained dbt tasks on Apache Airflow
Sheetwork
⭐
14
A handy package to load Google Sheets to your database right from the CLI and with easy configuration via YAML files.
Data Engineering Mta Turnstile
⭐
14
Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
Automated Data Preprocessing
⭐
14
A command-line utility program for automating the trivial, frequently occurring data preparation tasks: missing value interpolation, outlier removal, and encoding categorical variables.
Data Pipelines With Airflow
⭐
14
Skooldio: Data Pipelines with Airflow
Contessa
⭐
13
Easy way to define, execute and store quality rules for your data.
Pyspark On Aws Emr
⭐
13
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
The Data Science Learning Hub
⭐
12
This repository contains resources and materials for a data science bootcamp. The bootcamp is designed to teach individuals the fundamentals of data science.
Spotify Etl
⭐
12
Spotify ETL Pipeline
Zksync Era Etl
⭐
12
Best zkSync-era ETL ever 😜
Earthmover
⭐
12
CLI tool for transforming collections of tabular source data into a variety of text-based data formats via YAML configuration and Jinja templates.
Dask Saturn
⭐
12
Python library for interacting with Dask clusters in Saturn
Data Engineering
⭐
12
A project portfolio to accompany my resume
Wafers Fault Detection
⭐
12
End-to-end machine learning project to detect faults in wafers based on sensor data
Marshmallow Pyspark
⭐
12
Marshmallow serializer integration with pyspark
Ojo_daps_mirror
⭐
12
The Open Jobs Observatory public mirror repo
Airflow Rbac Roles Cli
⭐
12
A tool to create Airflow RBAC roles with DAG-level permissions from the CLI.
Data Paths
⭐
11
Airflowjob
⭐
11
Airflow POC demo: 1) env setup 2) Airflow DAG 3) Spark/ML pipeline | #DE
Social Media Analysis
⭐
11
Social Media Analysis: a scalable, flexibly deployable solution that analyzes social media content
Pai Aws
⭐
11
Data Engineering: Chapter 5, the AWS chapter for Pragmatic AI. Creates a "real world" Data Engineering API using Flask, Click, Pandas and Swagger docs
Moroccanhousing Etl
⭐
11
Moroccan housing data pipeline using Scrapy, MongoDB, Zyte and DigitalOcean cloud
Proto Schema Parser
⭐
11
A Pure Python Protobuf Parser
Apache Airflow Providers Transfers
⭐
10
Pydbtools
⭐
10
Python version of dbtools
Airflow Docker Metrics
⭐
10
Prefect Planetary Computer
⭐
10
Prefect integrations with Microsoft Planetary Computer.
Greatex
⭐
10
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
Etl Pipeline Example
⭐
10
An example of an ETL pipeline that lays out our DE processes
Lakefs Hooks
⭐
10
A simple lakeFS webhook for pre-commit and pre-merge validation of data objects
101-200 of 247 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.