Awesome Open Source
Search results for python distributed training
Filters: distributed-training, python
50 search results found
Made With Ml (⭐ 36,177): Learn how to design, develop, deploy and iterate on production-grade ML applications.
Paddle (⭐ 21,659): PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training for deep learning and machine learning, plus cross-platform deployment).
Paddlenlp (⭐ 10,908): 👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
Skypilot (⭐ 4,975): SkyPilot: Run LLMs, AI, and batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution, all with a simple interface.
Fedml (⭐ 3,946): FEDML: The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI job on any GPU cloud or on-premise cluster. Built on this library, FEDML Nexus AI (https://nexus.fedml.ai) is the dedicated cloud service for generative AI.
Fengshenbang Lm (⭐ 3,670): Fengshenbang-LM (封神榜大模型) is a series of large models led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute.
Adanet (⭐ 3,309): Fast and flexible AutoML with learning guarantees.
Byteps (⭐ 3,254): A high-performance, generic framework for distributed DNN training.
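Frameworks in this class revolve around one core operation: averaging gradients across workers so that every replica applies the same update. A toy, framework-free sketch of that averaging step (hypothetical function name, not BytePS's actual API):

```python
def allreduce_mean(worker_grads):
    """Toy stand-in for the all-reduce at the heart of data-parallel
    training: after the call, every worker holds the mean of all
    workers' gradients and can apply an identical update."""
    n = len(worker_grads)
    # Average component-wise across workers.
    mean = [sum(col) / n for col in zip(*worker_grads)]
    # A real system delivers the result to each worker in place;
    # here we simply hand the same mean back to every worker.
    return [list(mean) for _ in range(n)]

# Two workers, each holding a gradient for two parameters:
synced = allreduce_mean([[1.0, 2.0], [3.0, 4.0]])
# Both workers now hold [2.0, 3.0].
```

Real implementations (ring all-reduce, parameter servers) differ in how the averaging is communicated, not in what it computes.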
Alpa (⭐ 2,878): Training and serving large-scale neural networks with automatic parallelization.
Hivemind (⭐ 1,716): Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
Deeprec (⭐ 922): DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted in incubation in the LF AI & Data Foundation.
Libai (⭐ 371): LiBai (李白): A toolbox for large-scale distributed parallel training.
Adaptdl (⭐ 339): Resource-adaptive cluster scheduler for deep learning training.
Hypergbm (⭐ 306): A full-pipeline AutoML tool for tabular data.
Handyrl (⭐ 278): HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
Torchx (⭐ 275): TorchX is a universal job launcher for PyTorch applications. TorchX is designed for fast iteration during training/research and supports E2E production ML pipelines when you're ready.
Deeplearning Cfn (⭐ 244): Distributed deep learning on AWS using CloudFormation (CFN), MXNet and TensorFlow.
Easyparallellibrary (⭐ 201): Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Terngrad (⭐ 152): Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow).
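The idea behind TernGrad can be shown in a few lines of plain Python (a simplified sketch with my own naming, not the repo's API): each gradient component is stochastically rounded to one of {-s, 0, +s}, where s is the largest magnitude, so only about two bits per component plus one scale cross the network, while the quantization stays unbiased in expectation.

```python
import random

def ternarize(grad, rng=None):
    """TernGrad-style stochastic ternarization (simplified sketch).

    Maps each component g to s*sign(g) with probability |g|/s, else 0,
    where s = max|g|, so E[ternarize(g)] == g (unbiased)."""
    rng = rng or random.Random(0)
    s = max(abs(g) for g in grad) or 1.0  # guard against an all-zero gradient
    return [s * (1 if g > 0 else -1) if rng.random() < abs(g) / s else 0.0
            for g in grad]

tern = ternarize([0.4, -0.1, 0.9, 0.0])
# Every component is now one of {-0.9, 0.0, 0.9}: cheap to ship
# between workers, with the lost precision absorbed as noise.
```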
Openks (⭐ 143): OpenKS: a domain-generalizable knowledge learning and computation engine.
Unicom (⭐ 142): Universal visual model trained on LAION-400M.
Plsc (⭐ 129): Paddle Large Scale Classification Tools; supports ArcFace, CosFace, PartialFC, and Data Parallel + Model Parallel. Models include ResNet, ViT, Swin, DeiT, CaiT, FaceViT, MoCo, MAE, ConvMAE and CAE.
Sagemaker Xgboost Container (⭐ 109): The Docker container based on the open source framework XGBoost (https://xgboost.readthedocs.io/en/latest/) that allows customers to use their own XGBoost scripts in SageMaker.
Deep Gradient Compression (⭐ 106): [ICLR 2018] Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training.
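The paper's core trick can be sketched in plain Python (an illustrative fragment with made-up names, not this repo's API): transmit only the k largest-magnitude gradient components each step, and fold everything else into a local residual so the information is delayed rather than lost.

```python
def topk_sparsify(grad, residual, k=2):
    """Deep-Gradient-Compression-style top-k sparsification (sketch).

    Adds the carried-over residual to the fresh gradient, transmits
    only the k largest-magnitude components, and keeps the rest as
    the new residual for the next step."""
    full = [g + r for g, r in zip(grad, residual)]
    # Indices of the k largest-magnitude entries get transmitted.
    top = set(sorted(range(len(full)), key=lambda i: abs(full[i]),
                     reverse=True)[:k])
    sent = [full[i] if i in top else 0.0 for i in range(len(full))]
    new_residual = [0.0 if i in top else full[i] for i in range(len(full))]
    return sent, new_residual

sent, res = topk_sparsify([0.5, -0.1, 0.9, 0.2], [0.0, 0.0, 0.0, 0.0])
# sent == [0.5, 0.0, 0.9, 0.0]; res == [0.0, -0.1, 0.0, 0.2]
```

The real method layers momentum correction and gradient clipping on top, but the accumulate-and-send-top-k loop above is the part that cuts bandwidth.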
Saturn (⭐ 86): Saturn accelerates the training of large-scale deep learning models with a novel joint optimization approach.
Hetu (⭐ 62): A high-performance distributed deep learning system targeting large-scale and automated distributed training.
Dynamic Training With Apache Mxnet On Aws (⭐ 52): Dynamic training with Apache MXNet reduces the cost and time of training deep neural networks by leveraging AWS cloud elasticity and scale: the training cluster size is updated dynamically during training, with minimal impact on model accuracy.
Integrated Design Diffusion Model (⭐ 50): IDDM (Industrial, landscape, animate...); supports DDPM, DDIM, a web UI and multi-GPU distributed training. PyTorch implementation of generative diffusion models with distributed training.
Gradientaccumulator (⭐ 47): 🎯 Accumulated gradients for TensorFlow 2.
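Gradient accumulation itself is framework-agnostic: sum the gradients of several micro-batches and apply one averaged update, emulating a larger batch without extra memory. A minimal plain-Python sketch (hypothetical class, not this library's TensorFlow API):

```python
class AccumSGD:
    """Plain-SGD optimizer that steps only every `accum_steps` calls,
    using the mean of the accumulated gradients (a sketch of gradient
    accumulation, not tied to any framework)."""

    def __init__(self, params, lr=0.1, accum_steps=4):
        self.params = list(params)
        self.lr = lr
        self.accum_steps = accum_steps
        self._acc = [0.0] * len(self.params)
        self._seen = 0

    def step(self, grads):
        # Fold in this micro-batch's gradients.
        self._acc = [a + g for a, g in zip(self._acc, grads)]
        self._seen += 1
        if self._seen < self.accum_steps:
            return False  # still accumulating, no parameter update
        # One SGD update with the averaged gradient, then reset.
        self.params = [p - self.lr * a / self.accum_steps
                       for p, a in zip(self.params, self._acc)]
        self._acc = [0.0] * len(self.params)
        self._seen = 0
        return True

opt = AccumSGD([1.0], lr=0.1, accum_steps=4)
for g in ([1.0], [1.0], [1.0], [1.0]):
    opt.step(g)
# After four micro-batches with gradient 1.0, one update fires:
# params go from 1.0 to 1.0 - 0.1 * 1.0 = 0.9.
```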
Pytorch Base Trainer (⭐ 46): A PyTorch distributed training framework.
Bns Gcn (⭐ 43): [MLSys 2022] "BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling" by Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, Yingyan Lin.
Ftpipe (⭐ 37): FTPipe and related pipeline model parallelism research.
Redco (⭐ 35): [MLSys Workshop, NeurIPS 2023] Redco: A Lightweight Tool to Automate Distributed Training and Inference.
Note (⭐ 31): Easily implement parallel training and distributed training. Machine learning library.
Pytorch Model Parallel (⭐ 29): A memory-balanced and communication-efficient FullyConnected layer with CrossEntropyLoss, implemented with model parallelism in PyTorch.
Basecls (⭐ 27): A codebase & model zoo for pretrained backbones based on MegEngine.
Fast Kubeflow (⭐ 25): This repo covers the Kubeflow environment with labs: Kubeflow GUI, Jupyter Notebooks on pods, Kubeflow Pipelines, Experiments, KALE, KATIB (AutoML: hyperparameter tuning), KFServe (model serving), Training Operators (distributed training), projects, etc.
Realtime Semantic Segmentation Pytorch (⭐ 22): PyTorch implementation of over 30 realtime semantic segmentation models, e.g. BiSeNetv1, BiSeNetv2, CGNet, ContextNet, DABNet, DDRNet, EDANet, ENet, ERFNet, ESPNet, ESPNetv2, FastSCNN, ICNet, LEDNet, LinkNet, PP-LiteSeg, SegNet, ShelfNet, STDC and SwiftNet, with support for knowledge distillation, distributed training, etc.
Distributed Pytorch (⭐ 22): Distributed, mixed-precision training with PyTorch.
Yolo3d Yolov4 Pytorch (⭐ 21): YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud (ECCV 2018).
Pytorch Distributed Nlp (⭐ 20): PyTorch distributed training.
Shockwave (⭐ 14): Code for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23].
Jax Models (⭐ 10): Explore implementations of deep learning concepts like Transformers, Attention, Llama, GPT, InstructGPT, RLHF, Gaussian Processes, Bayesian Inference, Newton-Raphson, distributed trainers and more!
Pytorch Multi Gpu Training Tutorial (⭐ 10)
A Pytorch Tutorial To Class Incremental Learning (⭐ 10): A PyTorch tutorial to class-incremental learning | a distributed training template for CIL with core code of less than 100 lines.
Distributed Training In Tensorflow 2 With Ai Platform (⭐ 9): Contains code to demonstrate distributed training in TensorFlow 2 with AI Platform and custom Docker containers.
Deepcell Keras (⭐ 7): Reimplement DeepCell with Keras and Horovod.
Pytorch_yolov3 (⭐ 5): A PyTorch implementation of YOLOv3.
Ai_platform (⭐ 5): Django, Bootstrap, SQLite.
Pytorch Transformer Distributed (⭐ 5): Distributed training (multi-node) of a Transformer model.
Copyright 2018-2024 Awesome Open Source. All rights reserved.