Awesome Open Source
Search results for "llm evaluation" (12 results)
Promptfoo (⭐ 1,785): Test your prompts, models, and RAGs. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality. LLM evals for OpenAI/Azure GPT, Anthropic Claude, VertexAI Gemini, Ollama, and local/private models like Mistral/Mixtral/Llama, with CI/CD.
Deepeval (⭐ 1,070): The Evaluation Framework for LLMs.
Agenta (⭐ 651): The all-in-one LLMOps platform: prompt management, evaluation, human feedback, and deployment in one place.
Continuous Eval (⭐ 78): Evaluation for LLM/RAG pipelines, ready for CI/CD.
Commongen Eval (⭐ 74): Evaluating LLMs with CommonGen-Lite.
Awesome Llm In Social Science (⭐ 63): Awesome papers involving LLMs in social science.
Athina Evals (⭐ 45): Python SDK for running evaluations on LLM-generated responses.
Just Eval (⭐ 36): A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Conner (⭐ 24): The implementation of the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators".
Dcr Consistency (⭐ 16): DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models.
Leaf Playground (⭐ 14): A framework for building scenario-simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent-action level.
Parea Sdk Py (⭐ 13): Python SDK for experimenting with, testing, evaluating, and monitoring LLM-powered applications, from Parea AI (YC S23).
Related searches: Python Llm Evaluation (4), Large Language Models Llm Evaluation (3)
Copyright 2018-2024 Awesome Open Source. All rights reserved.