Awesome Open Source
Search results for machine learning distributed training (tags: distributed-training, machine-learning)
22 search results found
Made With Ml ⭐ 35,496
Learn how to design, develop, deploy and iterate on production-grade ML applications.
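Paddle ⭐ 21,527
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)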
Skypilot ⭐ 4,975
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
Fedml ⭐ 3,946
FEDML: the unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, also runs any AI job on any GPU cloud or on-premise cluster. FEDML Nexus AI (https://nexus.fedml.ai), built on this library, is the dedicated cloud service for generative AI.
Adanet ⭐ 3,309
Fast and flexible AutoML with learning guarantees.
Byteps ⭐ 3,254
A high-performance, generic framework for distributed DNN training.
Alpa ⭐ 2,878
Training and serving large-scale neural networks with auto parallelization.
Determined ⭐ 2,715
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
Hivemind ⭐ 1,716
Decentralized deep learning in PyTorch. Built to train models across thousands of volunteers around the world.
Deeprec ⭐ 922
DeepRec is a high-performance deep learning framework for recommendation, based on TensorFlow. It is an incubation project hosted by the LF AI & Data Foundation.
Efficient Dl Systems ⭐ 502
Efficient Deep Learning Systems course materials (HSE, YSDA)
Adaptdl ⭐ 339
Resource-adaptive cluster scheduler for deep learning training.
Handyrl ⭐ 278
HandyRL is a handy, simple framework for distributed reinforcement learning, based on Python and PyTorch, that you can apply to your own environments.
Torchx ⭐ 275
TorchX is a universal job launcher for PyTorch applications. It is designed for fast iteration during training and research, and it supports end-to-end (E2E) production ML pipelines when you're ready.
Sagemaker Xgboost Container ⭐ 109
This is the Docker container based on the open-source framework XGBoost (https://xgboost.readthedocs.io/en/latest/) that allows customers to use their own XGBoost scripts in SageMaker.
Hetu ⭐ 62
A high-performance distributed deep learning system targeting large-scale and automated distributed training.
Dynamic Training With Apache Mxnet On Aws ⭐ 52
Dynamic training with Apache MXNet reduces the cost and time of training deep neural networks by leveraging the elasticity and scale of the AWS cloud. The system dynamically updates the training cluster size during training, with minimal impact on model training accuracy.
Note ⭐ 31
A machine learning library for easily implementing parallel and distributed training.
Shockwave ⭐ 14
Code for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
Jax Models ⭐ 10
Explore implementations of deep learning concepts like Transformers, Attention, Llama, GPT, InstructGPT, RLHF, Gaussian Processes, Bayesian Inference, Newton-Raphson, Distributed Trainers and more!
Pytorch Transformer Distributed ⭐ 5
Distributed training (multi-node) of a Transformer model
Ai_platform ⭐ 5
Django Bootstrap SQLite
Related Searches
Python Machine Learning (14,099)
Jupyter Notebook Machine Learning (12,247)
Machine Learning Neural Network (4,397)
Machine Learning Tensorflow (4,050)
Machine Learning Natural Language Processing (3,891)
Machine Learning Artificial Intelligence (3,877)
Machine Learning Data Science (3,802)
Machine Learning Pytorch (2,910)
Machine Learning Dataset (2,298)
Machine Learning Computer Vision (1,966)
Copyright 2018-2024 Awesome Open Source. All rights reserved.