Awesome Open Source
Search results for pipeline data science
85 search results found
Prefect
⭐
14,603
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
Dagster
⭐
9,467
An orchestration platform for the development, production, and observation of data assets.
Great_expectations
⭐
9,179
Always know what to expect from your data.
Mage Ai
⭐
6,324
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Pachyderm
⭐
6,035
Data-Centric Pipelines and Data Versioning
Taipy
⭐
4,311
Turns Data and AI algorithms into production-ready web applications in no time.
Orchest
⭐
3,876
Build data pipelines, the easy way 🛠️
Zenml
⭐
3,841
ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.
Datascienceresources
⭐
3,826
Open Source Data Science Resources.
Polyaxon
⭐
3,438
MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle
Pipelines
⭐
3,368
Machine Learning Pipelines for Kubeflow
Ploomber
⭐
3,318
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Pyfunctional
⭐
2,341
Python library for creating data pipelines with chain functional programming
Mlj.jl
⭐
1,690
A Julia machine learning framework
Mlbox
⭐
1,403
MLBox is a powerful Automated Machine Learning python library.
Drake
⭐
1,329
An R-focused pipeline toolkit for reproducibility and high-performance computing
Mlrun
⭐
1,177
Machine Learning automation and tracking
Sematic
⭐
913
An open-source ML pipeline development platform
Targets
⭐
887
Function-oriented Make-like declarative workflows for R
Dataflowjavasdk
⭐
853
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Lightautoml
⭐
769
LAMA - automatic model creation framework
Pdpipe
⭐
710
Easy pipelines for pandas DataFrames.
Evalml
⭐
679
EvalML is an AutoML library written in python.
Covalent
⭐
608
Pythonic tool for running machine-learning/high performance/quantum-computing workflows in heterogeneous environments.
Baikal
⭐
573
A graph-based functional API for building complex scikit-learn pipelines.
Versatile Data Kit
⭐
389
One framework to develop, deploy and operate data workflows with Python and SQL.
Open Solution Mapping Challenge
⭐
363
Open solution to the Mapping Challenge 🌎
Bodywork Core
⭐
358
ML pipeline orchestration and model deployments on Kubernetes, made really easy.
Automlpipeline.jl
⭐
331
A package that makes it trivial to create and evaluate machine learning pipeline architectures.
Butterfree
⭐
269
A tool for building feature stores.
Naas
⭐
266
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
Nimbusml
⭐
265
Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.
Chain.jl
⭐
262
A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.
Feature Selection For Machine Learning
⭐
248
Code repository for the online course Feature Selection for Machine Learning
Pipelinex
⭐
212
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Codeflare
⭐
199
Simplifying the definition and execution, scaling and deployment of pipelines on the cloud.
Batchflow
⭐
198
BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
Toloka Kit
⭐
195
Toloka-Kit is a Python library for working with Toloka API.
Buildflow
⭐
188
BuildFlow is an open source framework for building large scale systems using Python. All you need to do is describe where your input is coming from and where your output should be written, and BuildFlow handles the rest. No configuration outside of the code is required.
Setl
⭐
177
A simple Spark-powered ETL framework that just works 🍺
Pureml
⭐
174
Developer platform for production ML.
Dataplane
⭐
171
Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Open Solution Toxic Comments
⭐
151
Open solution to the Toxic Comment Classification Challenge
Atom
⭐
137
Automated Tool for Optimized Modelling
Steppy
⭐
134
Lightweight Python library for fast and reproducible experimentation 🔬
Incubator Liminal
⭐
131
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Mlr3pipelines
⭐
127
Dataflow Programming for Machine Learning in R
Open Solution Salt Identification
⭐
120
Open solution to the TGS Salt Identification Challenge
Data_algebra
⭐
112
Codd method-chained SQL generator and Pandas data processing in Python.
Tarchetypes
⭐
108
Archetypes for targets and pipelines
Pipeline
⭐
101
Pipeline is an open source python SDK for building AI/ML workflows
Patterns Devkit
⭐
101
Data pipelines from re-usable components
Tabletransforms.jl
⭐
98
Transforms and pipelines with tabular data in Julia
Dataflow Ops
⭐
97
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Targets Tutorial
⭐
90
Short course on the targets R package
Wildebeest
⭐
84
File processing pipelines
Ni
⭐
81
Say "ni" to data of any size
Prism
⭐
70
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Great_expectations_action
⭐
68
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
Pipeliner
⭐
61
Machine learning pipelines for R.
Timeserio
⭐
61
Better `keras` models for time series and beyond
Oreilly Ai K8s Tutorial
⭐
58
Materials for the "AI on Kubernetes" tutorial at O'Reilly AI SF 2018
Drake Examples
⭐
58
Example workflows for the drake R package
Drake
⭐
57
The user manual for the drake R package
Drivers
⭐
53
Low-code Python library enabling access to APIs, tools, data sources in seconds.
Xp
⭐
53
A framework (command line tool + libraries) for creating flexible compute pipelines
Prefect Deployment Patterns
⭐
48
Code examples showing flow deployment to various types of infrastructure
Coronavirus Stats
⭐
47
Automatically scrape data and statistics on Coronavirus to make them easily accessible in CSV format
Open Solution Googleai Object Detection
⭐
46
Open solution to the Google AI Object Detection Challenge 🍁
Targets Minimal
⭐
45
A minimal example data analysis project with the targets R package
Skippa
⭐
43
SciKIt-learn Pipeline in PAndas
Soopervisor
⭐
42
☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.
Julia Workshop
⭐
39
"Integrating Julia in real-world, distributed pipelines" for JuliaCon 2017
Prodmodel
⭐
39
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
Preprocessy
⭐
36
Python package for Customizable Data Preprocessing Pipelines
Stairs
⭐
35
Framework which helps you to make parallel/distributed calculations using data pipelines
Blast
⭐
31
Blast is a data orchestration tool that can run SQL and Python against Google BigQuery and Snowflake. It supports templating with Jinja, data quality tests, query validation, environment management and more.
Verified Sources
⭐
31
Contribute to dlt verified sources 🔥
Kubeflow Data Science On Steroids
⭐
30
The blog post about Kubeflow, including all materials
Jenkins Ci
⭐
28
Minimal example to setup a Jenkins-CI pipeline for data science projects on OpenShift in a couple of minutes.
Tdastats
⭐
28
R pipeline for computing persistent homology in topological data analysis. See https://doi.org/10.21105/joss.00860 for more details.
Odsc_india_2018
⭐
26
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Madr_pipelines
⭐
25
Slides and materials for my talk to the Madison R Users Group
Surround
⭐
22
Surround is a framework for building AI driven microservices in Python, https://surround.readthedocs.io/en/latest/
Bodywork Ml Pipeline Project
⭐
21
Deployment template for a continuous training pipeline.
Dxc Industrialized Ai Starter
⭐
21
Industrialized AI Starter
Mlops Tdsp Template
⭐
20
Quickstart template as a fork on TDSP (https://github.com/Azure/Azure-TDSP-ProjectTempla extending the template with a suggested structure for operationalization using Azure. Includes ARM templates as IaC for resource deployment, template build and release pipelines to enable model CI/CD, template code for working with Azure ML.
Neon Workshop
⭐
20
A Pachyderm deep learning tutorial for conference workshops
Aml Days Tda Tutorial
⭐
20
Fluidml
⭐
19
FluidML is a lightweight framework for developing machine learning pipelines.
Credit
⭐
18
An example project that predicts risk of credit card default using a Logistic Regression classifier and a 30,000 sample dataset.
Serpytor
⭐
18
A distributed, low-code, end-to-end data collection and analysis tool for data folks. Take the pain out of data collection from your pipeline!
Har
⭐
18
Recognize one of six human activities such as standing, sitting, and walking using a Softmax Classifier trained on mobile phone sensor data.
Latent Semantic Analysis
⭐
18
Pipeline for training LSA models using Scikit-Learn.
Aml Run
⭐
17
GitHub Action that allows you to submit a run to your Azure Machine Learning Workspace.
Sparklanes
⭐
16
A lightweight data processing framework for Apache Spark
Composable Logs
⭐
16
Python library to run ML/data pipelines on stateless compute infrastructure (that may be ephemeral or serverless). Please see the documentation site with more details and demo:
Targets
⭐
16
User manual of the targets R package
Astarte_flow
⭐
14
Build data processing pipelines with Astarte Flow.
Viper
⭐
14
Simple, expressive pipeline syntax to transform and manipulate data with ease
Copyright 2018-2024 Awesome Open Source. All rights reserved.