Awesome Open Source
Search results for pipeline data science
92 search results found
Prefect
⭐ 14,339
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
Dagster
⭐ 9,467
An orchestration platform for the development, production, and observation of data assets.
Great_expectations
⭐ 9,179
Always know what to expect from your data.
Mage Ai
⭐ 6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Pachyderm
⭐ 6,035
Data-Centric Pipelines and Data Versioning
Taipy
⭐ 4,311
Turns Data and AI algorithms into production-ready web applications in no time.
Orchest
⭐ 3,876
Build data pipelines, the easy way 🛠️
Datascienceresources
⭐ 3,826
Open Source Data Science Resources.
Zenml
⭐ 3,578
ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.
Polyaxon
⭐ 3,438
MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle
Pipelines
⭐ 3,368
Machine Learning Pipelines for Kubeflow
Ploomber
⭐ 3,318
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Marimo
⭐ 3,037
A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
Pyfunctional
⭐ 2,232
Python library for creating data pipelines with chain functional programming
Mlj.jl
⭐ 1,690
A Julia machine learning framework
Mlbox
⭐ 1,403
MLBox is a powerful Automated Machine Learning python library.
Drake
⭐ 1,332
An R-focused pipeline toolkit for reproducibility and high-performance computing
Mlrun
⭐ 1,177
Machine Learning automation and tracking
Sematic
⭐ 913
An open-source ML pipeline development platform
Targets
⭐ 855
Function-oriented Make-like declarative workflows for R
Dataflowjavasdk
⭐ 853
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Lightautoml
⭐ 769
LAMA - automatic model creation framework
Pdpipe
⭐ 710
Easy pipelines for pandas DataFrames.
Evalml
⭐ 679
EvalML is an AutoML library written in python.
Covalent
⭐ 608
Pythonic tool for running machine-learning/high performance/quantum-computing workflows in heterogeneous environments.
Baikal
⭐ 573
A graph-based functional API for building complex scikit-learn pipelines.
Versatile Data Kit
⭐ 389
One framework to develop, deploy and operate data workflows with Python and SQL.
Open Solution Mapping Challenge
⭐ 363
Open solution to the Mapping Challenge 🌎
Bodywork Core
⭐ 358
ML pipeline orchestration and model deployments on Kubernetes, made really easy.
Melusine
⭐ 335
Melusine is a high-level library for email classification and feature extraction, dedicated to French-language emails.
Automlpipeline.jl
⭐ 331
A package that makes it trivial to create and evaluate machine learning pipeline architectures.
Butterfree
⭐ 269
A tool for building feature stores.
Naas
⭐ 266
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
Nimbusml
⭐ 265
Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.
Chain.jl
⭐ 262
A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.
Feature Selection For Machine Learning
⭐ 248
Code repository for the online course Feature Selection for Machine Learning
Pipelinex
⭐ 212
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Codeflare
⭐ 199
Simplifying the definition, execution, scaling, and deployment of pipelines on the cloud.
Toloka Kit
⭐ 195
Toloka-Kit is a Python library for working with Toloka API.
Batchflow
⭐ 195
BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
Buildflow
⭐ 188
BuildFlow is an open-source framework for building large-scale systems using Python. All you need to do is describe where your input comes from and where your output should be written, and BuildFlow handles the rest. No configuration outside of the code is required.
Pureml
⭐ 174
Developer platform for production ML.
Setl
⭐ 173
A simple Spark-powered ETL framework that just works 🍺
Dataplane
⭐ 171
Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Open Solution Toxic Comments
⭐ 151
Open solution to the Toxic Comment Classification Challenge
Atom
⭐ 137
Automated Tool for Optimized Modelling
Steppy
⭐ 136
Lightweight, Python library for fast and reproducible experimentation 🔬
Incubator Liminal
⭐ 131
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment, and inference in production. Liminal provides a domain-specific language to build ML workflows on top of Apache Airflow.
Mlr3pipelines
⭐ 127
Dataflow Programming for Machine Learning in R
Open Solution Salt Identification
⭐ 120
Open solution to the TGS Salt Identification Challenge
Data_algebra
⭐ 112
Codd method-chained SQL generator and Pandas data processing in Python.
Tarchetypes
⭐ 108
Archetypes for targets and pipelines
Patterns Devkit
⭐ 101
Data pipelines from re-usable components
Pipeline
⭐ 101
Pipeline is an open source python SDK for building AI/ML workflows
Tabletransforms.jl
⭐ 98
Transforms and pipelines with tabular data in Julia
Dataflow Ops
⭐ 97
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
K3ai
⭐ 95
K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai is perfect for anything from Edge to laptops.
Targets Tutorial
⭐ 90
Short course on the targets R package
Wildebeest
⭐ 84
File processing pipelines
Ni
⭐ 81
Say "ni" to data of any size
Prism
⭐ 70
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Great_expectations_action
⭐ 68
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
Timeserio
⭐ 61
Better `keras` models for time series and beyond
Pipeliner
⭐ 61
Machine learning pipelines for R.
Oreilly Ai K8s Tutorial
⭐ 58
Materials for the "AI on Kubernetes" tutorial at O'Reilly AI SF 2018
Drake Examples
⭐ 58
Example workflows for the drake R package
Drake
⭐ 57
The user manual for the drake R package
Xp
⭐ 53
A framework (command-line tool + libraries) for creating flexible compute pipelines
Drivers
⭐ 53
Low-code Python library enabling access to APIs, tools, data sources in seconds.
Twde Datalab
⭐ 52
Onboarding to data science by ThoughtWorks
Prefect Deployment Patterns
⭐ 48
Code examples showing flow deployment to various types of infrastructure
Coronavirus Stats
⭐ 47
Automatically scrape data and statistics on Coronavirus to make them easily accessible in CSV format
Open Solution Googleai Object Detection
⭐ 46
Open solution to the Google AI Object Detection Challenge 🍁
Targets Minimal
⭐ 45
A minimal example data analysis project with the targets R package
Skippa
⭐ 43
SciKIt-learn Pipeline in PAndas
Soopervisor
⭐ 42
☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.
Prodmodel
⭐ 39
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
Julia Workshop
⭐ 39
"Integrating Julia in real-world, distributed pipelines" for JuliaCon 2017
Preprocessy
⭐ 36
Python package for Customizable Data Preprocessing Pipelines
Stairs
⭐ 35
Framework that helps you perform parallel/distributed calculations using data pipelines
Verified Sources
⭐ 31
Contribute to dlt verified sources 🔥
Blast
⭐ 31
Blast is a data orchestration tool that can run SQL and Python against Google BigQuery and Snowflake. It supports templating with Jinja, data quality tests, query validation, environment management and more.
Kubeflow Data Science On Steroids
⭐ 30
The blog post about Kubeflow, including all materials
Tdastats
⭐ 28
R pipeline for computing persistent homology in topological data analysis. See https://doi.org/10.21105/joss.00860 for more details.
Jenkins Ci
⭐ 28
Minimal example to set up a Jenkins CI pipeline for data science projects on OpenShift in a couple of minutes.
Odsc_india_2018
⭐ 26
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Madr_pipelines
⭐ 25
Slides and materials for my talk to the Madison R Users Group
Surround
⭐ 22
Surround is a framework for building AI-driven microservices in Python: https://surround.readthedocs.io/en/latest/
K3ai Core
⭐ 21
K3ai-core is the core library for the Go installer. The Go installer will replace the current Bash installer.
Bodywork Ml Pipeline Project
⭐ 21
Deployment template for a continuous training pipeline.
Dxc Industrialized Ai Starter
⭐ 21
Industrialized AI Starter
Neon Workshop
⭐ 20
A Pachyderm deep learning tutorial for conference workshops
Mlops Tdsp Template
⭐ 20
Quickstart template as a fork of TDSP (https://github.com/Azure/Azure-TDSP-ProjectTempla extending the template with a suggested structure for operationalization using Azure. Includes ARM templates as IaC for resource deployment, template build and release pipelines to enable model CI/CD, and template code for working with Azure ML.
Aml Days Tda Tutorial
⭐ 20
Fluidml
⭐ 19
FluidML is a lightweight framework for developing machine learning pipelines.
Har
⭐ 18
Recognize one of six human activities such as standing, sitting, and walking using a Softmax Classifier trained on mobile phone sensor data.
Latent Semantic Analysis
⭐ 18
Pipeline for training LSA models using Scikit-Learn.
Credit
⭐ 18
An example project that predicts risk of credit card default using a Logistic Regression classifier and a 30,000 sample dataset.
Serpytor
⭐ 18
A distributed, low-code, end-to-end data collection and analysis tool for data folks. Take the pain out of data collection from your pipeline!
Aml Run
⭐ 17
GitHub Action that allows you to submit a run to your Azure Machine Learning Workspace.
Copyright 2018-2024 Awesome Open Source. All rights reserved.