Awesome Open Source
Search results for rlhf
59 search results found
Open Assistant
⭐
36,197
OpenAssistant is a chat-based assistant that understands tasks, interacts with third-party systems, and retrieves information dynamically to do so.
Llama Factory
⭐
10,715
Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
Llmsurvey
⭐
7,255
The official GitHub page for the survey paper "A Survey of Large Language Models".
Chinese Llama Alpaca 2
⭐
5,810
Phase two of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models
Internlm
⭐
4,412
Official release of InternLM2 7B and 20B base and chat models. 200K context support
Chatglm Efficient Tuning
⭐
3,130
Efficient fine-tuning of ChatGLM-6B with PEFT
Argilla
⭐
3,097
Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
Alignment Handbook
⭐
3,024
Robust recipes to align language models with human and AI preferences
Docta
⭐
2,472
A Doctor for your data
Awesome Rlhf
⭐
2,376
A curated list of reinforcement learning with human feedback resources (continually updated)
Webglm
⭐
1,198
WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)
Safe Rlhf
⭐
1,040
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Alpaca_eval
⭐
899
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Openrlhf
⭐
704
A Ray-based High-performance RLHF framework (Support 70B+ full tuning & LoRA & Mixtral)
Imagereward
⭐
674
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Xtreme1
⭐
567
Xtreme1 - The Next GEN Platform for Multimodal Training Data. #3D annotation, 3D segmentation, lidar-camera fusion annotation, image annotation and RLHF tools are supported!
Textrl
⭐
519
Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
Distilabel
⭐
458
⚗️ AI Feedback framework for scalable LLM alignment
Alignllmhumansurvey
⭐
368
Aligning Large Language Models with Human: A Survey
Halos
⭐
339
A library with extensible implementations of DPO, KTO, PPO, and other human-centered loss functions (HALOs).
Pykoi
⭐
332
pykoi: Active learning in one unified interface
Medqa Chatglm
⭐
235
🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and other methods; our vision extends beyond medical question answering
Llm Rlhf Tuning
⭐
225
LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
Step_into_llm
⭐
211
MindSpore online courses: Step into LLM
Cornucopia Llama Fin Chinese
⭐
178
Cornucopia (聚宝盆): LLaMA models fine-tuned on Chinese financial knowledge; covers SFT, RLHF, GPU training and deployment, and more
Chain Of Hindsight
⭐
171
Chain-of-Hindsight, a simpler and more effective alternative to RLHF
Awesome Llm Human Preference Datasets
⭐
116
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
Preferencetransformer
⭐
107
Preference Transformer: Modeling Human Preferences using Transformers for RL (accepted at ICLR 2023)
Instructgoose
⭐
100
Implementation of Reinforcement Learning from Human Feedback (RLHF)
Pretraining With Human Feedback
⭐
97
Code accompanying the paper Pretraining Language Models with Human Preferences
Chatglm Rlhf
⭐
68
Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF
Open Chatgpt
⭐
66
The open source implementation of ChatGPT, Alpaca, Vicuna and RLHF Pipeline. Implement a ChatGPT from scratch.
Remax
⭐
61
Cogment Verse
⭐
60
Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)
Log10
⭐
53
Python client library for managing your LLM data in one place
Opening Up Chatgpt.github.io
⭐
52
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
Alpaca Rlhf
⭐
42
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
Llama Trl
⭐
38
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
Okapi
⭐
36
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Awesome Rlaif
⭐
35
A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)
My Alpaca
⭐
30
Reproduce Alpaca
Chatglm_rlhf
⭐
28
chatglm_rlhf_finetuning
Llm_rlhf
⭐
24
Reinforcement learning training for GPT-2, LLaMA, BLOOM, and other LLMs
Rewardedsoups
⭐
23
Rewarded soups official implementation
Chatglm Lora Rlhf Pytorch
⭐
21
A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM
Vicuna Lora Rlhf Pytorch
⭐
17
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
Awesome Llm
⭐
15
Curated list of open source and openly accessible large language models
Lm Research Hub
⭐
14
Language Modeling Research Hub, a comprehensive compendium for enthusiasts and scholars delving into the fascinating realm of language models (LMs), with a particular focus on large language models (LLMs)
Zero Shot Reward Models
⭐
14
ZYN: Zero-Shot Reward Models with Yes-No Questions
T2i Humanfeedback
⭐
14
Implementations of Baseline Methods for Aligning Text2Img Diffusion Models with Human Feedback
Prompt Oirl
⭐
14
Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"
Beavertails
⭐
12
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Jax Models
⭐
10
Explore implementations of deep learning concepts like Transformers, Attention, Llama, GPT, InstructGPT, RLHF, Gaussian Processes, Bayesian Inference, Newton-Raphson, Distributed Trainers and more!
Alpaca Lora Rlhf Pytorch
⭐
10
A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
Create Your Own Chatgpt
⭐
7
Create your own ChatGPT with Python
Dppo
⭐
7
Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023)
Pathologist In The Loop
⭐
6
[ NeurIPS 2023 ] Official Codebase for "Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback"
Constitutional Ai Awesome Papers
⭐
6
Paper lists about 'Constitutional AI System' or 'AI under Ethical Guidelines'
Awesome Rlaif
⭐
5
A curated and updated list of relevant articles and repositories on Reinforcement Learning from AI Feedback (RLAIF)
Related Searches
Python Rlhf (34)
Llm Rlhf (28)
Llama Rlhf (18)
Large Language Models Rlhf (18)
Language Model Rlhf (14)
Reinforcement Learning Rlhf (13)
Natural Language Processing Rlhf (12)
Llms Rlhf (9)
Artificial Intelligence Rlhf (7)
Fine Tuning Rlhf (7)
Copyright 2018-2024 Awesome Open Source. All rights reserved.