Awesome Open Source
Search results for python distributed training
Filters: distributed-training, python
50 search results found
Made With Ml (⭐ 36,177): Learn how to design, develop, deploy and iterate on production-grade ML applications.
Paddle (⭐ 21,659): PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training for deep learning and machine learning, plus cross-platform deployment).
Paddlenlp (⭐ 10,908): 👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
Skypilot (⭐ 4,975): SkyPilot: Run LLMs, AI, and batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution, all with a simple interface.
Fedml (⭐ 3,946): FEDML: The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI job on any GPU cloud or on-premise cluster. Built on this library, FEDML Nexus AI (https://nexus.fedml.ai) is the dedicated cloud service for generative AI.
Fengshenbang Lm (⭐ 3,670): Fengshenbang-LM (封神榜大模型) is a series of large models led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute.
Adanet (⭐ 3,309): Fast and flexible AutoML with learning guarantees.
Byteps (⭐ 3,254): A high-performance, generic framework for distributed DNN training.
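Frameworks in this class revolve around one core operation: averaging gradients across workers so that every replica applies the same update. A toy, framework-free sketch of that averaging step (hypothetical function name, not BytePS's actual API):

```python
def allreduce_mean(worker_grads):
    """Toy stand-in for the all-reduce at the heart of data-parallel
    training: after the call, every worker holds the mean of all
    workers' gradients and can apply an identical update."""
    n = len(worker_grads)
    # Average component-wise across workers.
    mean = [sum(col) / n for col in zip(*worker_grads)]
    # A real system delivers the result to each worker in place;
    # here we simply hand the same mean back to every worker.
    return [list(mean) for _ in range(n)]

# Two workers, each holding a gradient for two parameters:
synced = allreduce_mean([[1.0, 2.0], [3.0, 4.0]])
# Both workers now hold [2.0, 3.0].
```

Real implementations (ring all-reduce, parameter servers) differ in how the averaging is communicated, not in what it computes.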
Alpa (⭐ 2,878): Training and serving large-scale neural networks with automatic parallelization.
Hivemind (⭐ 1,716): Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
Deeprec (⭐ 922): DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted in incubation in the LF AI & Data Foundation.
Libai (⭐ 371): LiBai (李白): A toolbox for large-scale distributed parallel training.
Adaptdl (⭐ 339): Resource-adaptive cluster scheduler for deep learning training.
Hypergbm (⭐ 306): A full-pipeline AutoML tool for tabular data.
Handyrl (⭐ 278): HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
Torchx (⭐ 275): TorchX is a universal job launcher for PyTorch applications. TorchX is designed for fast iteration during training/research and supports E2E production ML pipelines when you're ready.
Deeplearning Cfn (⭐ 244): Distributed deep learning on AWS using CloudFormation (CFN), MXNet and TensorFlow.
Easyparallellibrary (⭐ 201): Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Terngrad (⭐ 152): Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow).
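The idea behind TernGrad can be shown in a few lines of plain Python (a simplified sketch with my own naming, not the repo's API): each gradient component is stochastically rounded to one of {-s, 0, +s}, where s is the largest magnitude, so only about two bits per component plus one scale cross the network, while the quantization stays unbiased in expectation.

```python
import random

def ternarize(grad, rng=None):
    """TernGrad-style stochastic ternarization (simplified sketch).

    Maps each component g to s*sign(g) with probability |g|/s, else 0,
    where s = max|g|, so E[ternarize(g)] == g (unbiased)."""
    rng = rng or random.Random(0)
    s = max(abs(g) for g in grad) or 1.0  # guard against an all-zero gradient
    return [s * (1 if g > 0 else -1) if rng.random() < abs(g) / s else 0.0
            for g in grad]

tern = ternarize([0.4, -0.1, 0.9, 0.0])
# Every component is now one of {-0.9, 0.0, 0.9}: cheap to ship
# between workers, with the lost precision absorbed as noise.
```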
Openks (⭐ 143): OpenKS: a domain-generalizable knowledge learning and computation engine.
Unicom (⭐ 142): Universal visual model trained on LAION-400M.
Plsc (⭐ 129): Paddle Large Scale Classification Tools; supports ArcFace, CosFace, PartialFC, and Data Parallel + Model Parallel. Models include ResNet, ViT, Swin, DeiT, CaiT, FaceViT, MoCo, MAE, ConvMAE and CAE.
Sagemaker Xgboost Container (⭐ 109): The Docker container based on the open source framework XGBoost (https://xgboost.readthedocs.io/en/latest/) that allows customers to use their own XGBoost scripts in SageMaker.
Deep Gradient Compression (⭐ 106): [ICLR 2018] Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training.
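The paper's core trick can be sketched in plain Python (an illustrative fragment with made-up names, not this repo's API): transmit only the k largest-magnitude gradient components each step, and fold everything else into a local residual so the information is delayed rather than lost.

```python
def topk_sparsify(grad, residual, k=2):
    """Deep-Gradient-Compression-style top-k sparsification (sketch).

    Adds the carried-over residual to the fresh gradient, transmits
    only the k largest-magnitude components, and keeps the rest as
    the new residual for the next step."""
    full = [g + r for g, r in zip(grad, residual)]
    # Indices of the k largest-magnitude entries get transmitted.
    top = set(sorted(range(len(full)), key=lambda i: abs(full[i]),
                     reverse=True)[:k])
    sent = [full[i] if i in top else 0.0 for i in range(len(full))]
    new_residual = [0.0 if i in top else full[i] for i in range(len(full))]
    return sent, new_residual

sent, res = topk_sparsify([0.5, -0.1, 0.9, 0.2], [0.0, 0.0, 0.0, 0.0])
# sent == [0.5, 0.0, 0.9, 0.0]; res == [0.0, -0.1, 0.0, 0.2]
```

The real method layers momentum correction and gradient clipping on top, but the accumulate-and-send-top-k loop above is the part that cuts bandwidth.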
Saturn (⭐ 86): Saturn accelerates the training of large-scale deep learning models with a novel joint optimization approach.
Hetu (⭐ 62): A high-performance distributed deep learning system targeting large-scale and automated distributed training.
Dynamic Training With Apache Mxnet On Aws (⭐ 52): Dynamic training with Apache MXNet reduces the cost and time of training deep neural networks by leveraging AWS cloud elasticity and scale: the training cluster size is updated dynamically during training, with minimal impact on model accuracy.
Integrated Design Diffusion Model (⭐ 50): IDDM (Industrial, landscape, animate...); supports DDPM, DDIM, a web UI and multi-GPU distributed training. PyTorch implementation of generative diffusion models with distributed training.
Gradientaccumulator (⭐ 47): 🎯 Accumulated gradients for TensorFlow 2.
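Gradient accumulation itself is framework-agnostic: sum the gradients of several micro-batches and apply one averaged update, emulating a larger batch without extra memory. A minimal plain-Python sketch (hypothetical class, not this library's TensorFlow API):

```python
class AccumSGD:
    """Plain-SGD optimizer that steps only every `accum_steps` calls,
    using the mean of the accumulated gradients (a sketch of gradient
    accumulation, not tied to any framework)."""

    def __init__(self, params, lr=0.1, accum_steps=4):
        self.params = list(params)
        self.lr = lr
        self.accum_steps = accum_steps
        self._acc = [0.0] * len(self.params)
        self._seen = 0

    def step(self, grads):
        # Fold in this micro-batch's gradients.
        self._acc = [a + g for a, g in zip(self._acc, grads)]
        self._seen += 1
        if self._seen < self.accum_steps:
            return False  # still accumulating, no parameter update
        # One SGD update with the averaged gradient, then reset.
        self.params = [p - self.lr * a / self.accum_steps
                       for p, a in zip(self.params, self._acc)]
        self._acc = [0.0] * len(self.params)
        self._seen = 0
        return True

opt = AccumSGD([1.0], lr=0.1, accum_steps=4)
for g in ([1.0], [1.0], [1.0], [1.0]):
    opt.step(g)
# After four micro-batches with gradient 1.0, one update fires:
# params go from 1.0 to 1.0 - 0.1 * 1.0 = 0.9.
```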
Pytorch Base Trainer (⭐ 46): A PyTorch distributed training framework.
Bns Gcn (⭐ 43): [MLSys 2022] "BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling" by Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, Yingyan Lin.
Ftpipe (⭐ 37): FTPipe and related pipeline model parallelism research.
Redco (⭐ 35): [MLSys Workshop, NeurIPS 2023] Redco: A Lightweight Tool to Automate Distributed Training and Inference.
Note (⭐ 31): Easily implement parallel training and distributed training. Machine learning library.
Pytorch Model Parallel (⭐ 29): A memory-balanced and communication-efficient FullyConnected layer with CrossEntropyLoss, implemented with model parallelism in PyTorch.
Basecls (⭐ 27): A codebase & model zoo for pretrained backbones based on MegEngine.
Fast Kubeflow (⭐ 25): This repo covers the Kubeflow environment with labs: Kubeflow GUI, Jupyter Notebooks on pods, Kubeflow Pipelines, Experiments, KALE, KATIB (AutoML: hyperparameter tuning), KFServe (model serving), Training Operators (distributed training), projects, etc.
Realtime Semantic Segmentation Pytorch (⭐ 22): PyTorch implementation of over 30 realtime semantic segmentation models, e.g. BiSeNetv1, BiSeNetv2, CGNet, ContextNet, DABNet, DDRNet, EDANet, ENet, ERFNet, ESPNet, ESPNetv2, FastSCNN, ICNet, LEDNet, LinkNet, PP-LiteSeg, SegNet, ShelfNet, STDC and SwiftNet, with support for knowledge distillation, distributed training, etc.
Distributed Pytorch (⭐ 22): Distributed, mixed-precision training with PyTorch.
Yolo3d Yolov4 Pytorch (⭐ 21): YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud (ECCV 2018).
Pytorch Distributed Nlp (⭐ 20): PyTorch distributed training.
Shockwave (⭐ 14): Code for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23].
Jax Models (⭐ 10): Explore implementations of deep learning concepts like Transformers, Attention, Llama, GPT, InstructGPT, RLHF, Gaussian Processes, Bayesian Inference, Newton-Raphson, distributed trainers and more!
Pytorch Multi Gpu Training Tutorial (⭐ 10)
A Pytorch Tutorial To Class Incremental Learning (⭐ 10): A PyTorch tutorial to class-incremental learning | a distributed training template for CIL with core code of less than 100 lines.
Distributed Training In Tensorflow 2 With Ai Platform (⭐ 9): Contains code to demonstrate distributed training in TensorFlow 2 with AI Platform and custom Docker containers.
Deepcell Keras (⭐ 7): Reimplement DeepCell with Keras and Horovod.
Pytorch_yolov3 (⭐ 5): A PyTorch implementation of YOLOv3.
Ai_platform (⭐ 5): Django, Bootstrap, SQLite.
Pytorch Transformer Distributed (⭐ 5): Distributed training (multi-node) of a Transformer model.
Copyright 2018-2024 Awesome Open Source. All rights reserved.