Awesome Open Source
Search results for "llm evaluation" (12 results)
Promptfoo (⭐ 1,785): Test your prompts, models, and RAGs. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality. LLM evals for OpenAI/Azure GPT, Anthropic Claude, VertexAI Gemini, Ollama, and local/private models like Mistral/Mixtral/Llama, with CI/CD.
Deepeval (⭐ 1,070): The Evaluation Framework for LLMs.
Agenta (⭐ 651): The all-in-one LLMOps platform: prompt management, evaluation, human feedback, and deployment in one place.
Continuous Eval (⭐ 78): Evaluation for LLM/RAG pipelines, ready for CI/CD.
Commongen Eval (⭐ 74): Evaluating LLMs with CommonGen-Lite.
Awesome Llm In Social Science (⭐ 63): Awesome papers involving LLMs in social science.
Athina Evals (⭐ 45): Python SDK for running evaluations on LLM-generated responses.
Just Eval (⭐ 36): A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Conner (⭐ 24): The implementation of the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators".
Dcr Consistency (⭐ 16): DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models.
Leaf Playground (⭐ 14): A framework for building scenario-simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent-action level.
Parea Sdk Py (⭐ 13): Python SDK for experimenting with, testing, evaluating, and monitoring LLM-powered applications, from Parea AI (YC S23).
Related searches: Python Llm Evaluation (4), Large Language Models Llm Evaluation (3)
Copyright 2018-2024 Awesome Open Source. All rights reserved.