Awesome Open Source
Search results for dataset llm
24 search results found
Deeplake
⭐
7,731
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Awesome Pretrained Chinese Nlp Models
⭐
3,738
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models.
Awesome Domain Llm
⭐
1,502
Collects and organizes open-source models, datasets, and evaluation benchmarks for vertical (domain-specific) fields.
Llmdatahub
⭐
1,048
A quick guide to trending datasets, especially instruction fine-tuning datasets.
Safe Rlhf
⭐
1,040
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Data Juicer
⭐
994
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Db Gpt Hub
⭐
759
A repository of models, datasets, and fine-tuning techniques for DB-GPT, aimed at improving model performance on Text-to-SQL tasks.
Prompt4reasoningpapers
⭐
717
Repository for the ACL 2023 paper "Reasoning with Language Model Prompting: A Survey".
Awesome Code Llm
⭐
420
A curated list of language modeling research for code, and related datasets.
Textbook_quality
⭐
274
Generate textbook-quality LLM pretraining data
Awesome Llm Eval
⭐
183
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluating LLMs.
Starwhale
⭐
178
An MLOps/LLMOps platform
Trustllm
⭐
164
TrustLLM: Trustworthiness in Large Language Models
Csghub
⭐
157
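CSGHub is an open-source platform for large-model assets, similar to an on-premise Hugging Face, that helps manage datasets, model files, code, and more. CSGHub is an open-source, trustworthy platform for managing large-model assets that helps users govern the lifecycle of LLMs and LLM applications … in the same way that Glance manages virtual machine images, Harbor manages container images, and Sonatype Nexus manages artifacts, it provides management of LLM assets. Feedback and stars ⭐️ are welcome.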
Uhgeval
⭐
140
Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
Awesome Llm Human Preference Datasets
⭐
116
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
Awesome_multimodel_llm
⭐
89
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundation models, and more. Stay updated with the latest advancements.
Fastlorachat
⭐
83
Instruct-tune LLaMA on consumer hardware with ShareGPT data
Monitors4codegen
⭐
60
Code and data artifact for the NeurIPS 2023 paper "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context". `multispy` is an LSP client library in Python intended for building applications around language servers.
Scigraphqa
⭐
58
SciGraphQA
Awesome Chinese Llm
⭐
45
Awesome Chinese LLM: a curated list of Chinese large language models, compiling Chinese LLM datasets and model resources.
Awesome Instruction Datasets
⭐
41
A collection of prompt and instruction datasets (awesome-prompt-datasets, awesome-instruction-dataset) for training ChatLLM models such as ChatGPT. It gathers a wide variety of instruction datasets for training ChatLLM models.
Arb
⭐
35
Advanced Reasoning Benchmark Dataset for LLMs
Grimoire
⭐
35
Grimoire is All You Need for Enhancing Large Language Models
M3dbench
⭐
23
M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. Furthermore, M3DBench provides a new benchmark to assess large models across 3D vision-centric tasks.
Arcadia
⭐
19
A diverse, simple, and secure one-stop LLMOps platform
Huggingface Datasets Text Quality Analysis
⭐
19
Retrieves Parquet files from Hugging Face and uses pandas to identify and quantify junk data, duplication, contamination, and biased content in datasets.
Csghub Server
⭐
18
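CSGHub Server is the backend server for CSGHub, which helps users manage datasets, model files, code, and more. CSGHub Server is the open-source server-side component of CSGHub, an open-source platform for managing large-model assets, and provides REST API-based management of models, datasets, and other large-model assets. Feedback and stars ⭐️ are welcome.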
Homoscriptor
⭐
17
Fuel innovation and advance language models with HomoScriptor: A vibrant, community-driven dataset for fine-tuning large language models.
Manifesto
⭐
17
Pre-configuration page for the OpenLLM-France community
Lm Datasets
⭐
16
A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.
Tree Of Knowledge
⭐
15
ToK, aka Tree of Knowledge for Large Language Models (LLMs): a novel dataset that encourages symbolic correlation of knowledge in simple input and output prompts.
Vllm Safety Benchmark
⭐
15
Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
C Vqa
⭐
14
Counterfactual Reasoning VQA Dataset
Mms_benchmark
⭐
14
The most extensive open, massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 datasets manually selected, based on strict quality criteria, from over 350 datasets reported in the scientific literature, and covers 27 languages.
Llmdatadistill
⭐
12
Distill large-scale web page text
Beavertails
⭐
12
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Llm Theory Of Mind
⭐
11
Testing Theory of Mind (ToM) in language models with epistemic logic
Open Llm Datasets
⭐
9
Repository for organizing datasets and papers used in Open LLM.
Language Model Recommendation
⭐
8
Resources accompanying the "Zero-Shot Recommendation as Language Modeling" paper (ECIR2022)
Olah
⭐
6
Self-hosted Hugging Face mirror service.
Battle Of The Wordsmiths
⭐
5
Official GitHub repository for "Battle of the Wordsmiths: Comparing ChatGPT, GPT-4, Claude, and Bard" (dataset)
Related Searches
Python Dataset (15,103)
Jupyter Notebook Dataset (6,824)
Deep Learning Dataset (2,477)
Machine Learning Dataset (2,279)
Dataset Pytorch (1,847)
Dataset Tensorflow (1,583)
Dataset Classification (1,500)
Python Llm (1,377)
Dataset Convolutional Neural Networks (1,264)
Dataset Paper (1,252)
Copyright 2018-2024 Awesome Open Source. All rights reserved.