Awesome Open Source
Search results for evaluation framework
50 search results found
Lm Evaluation Harness
⭐ 3,768
A framework for few-shot evaluation of language models.
Promptfoo
⭐ 1,785
Test your prompts, models, and RAGs. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality. LLM evals for OpenAI/Azure GPT, Anthropic Claude, VertexAI Gemini, Ollama, and local/private models like Mistral/Mixtral/Llama, with CI/CD support.
Deepeval
⭐ 1,070
The Evaluation Framework for LLMs
Recsys2019_deeplearning_evaluation
⭐ 871
This is the repository of our article published in RecSys 2019 "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and of several follow-up studies.
Pydgn
⭐ 211
A research library for automating experiments on Deep Graph Networks
Zeno
⭐ 202
AI Data Management & Evaluation Platform
Expressive
⭐ 146
Expressive is a cross-platform expression parsing and evaluation framework. The cross-platform nature is achieved through compiling for .NET Standard so it will run on practically any platform.
Tonic_validate
⭐ 128
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Pysodevaltoolkit
⭐ 112
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Crowdflow
⭐ 92
Optical Flow Dataset and Benchmark for Visual Crowd Analysis
Continuous Eval
⭐ 78
Evaluation for LLM / RAG pipelines, ready for CI/CD
Lm Evaluation
⭐ 69
Evaluation suite for large-scale language models.
Sordi Ai Evaluation Gui
⭐ 68
This repository allows you to evaluate a trained computer vision model and get general information and evaluation metrics with little configuration.
Rankeval
⭐ 65
Official repository of RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions.
Birl
⭐ 63
BIRL: Benchmark on Image Registration methods with Landmark validations
Dialogentailment
⭐ 62
The implementation of the paper "Evaluating Coherence in Dialogue Systems using Entailment"
Vectory
⭐ 56
Vectory provides a collection of tools to track and compare embedding versions.
Athina Evals
⭐ 45
Python SDK for running evaluations on LLM-generated responses
Od Test
⭐ 44
OD-test: A Less Biased Evaluation of Out-of-Distribution (Outlier) Detectors (PyTorch)
Evalify
⭐ 41
Evaluate your biometric verification models literally in seconds.
Sim4rec
⭐ 37
Simulator for training and evaluation of Recommender Systems
Kolena
⭐ 37
Python client for Kolena's machine learning testing platform
Pactus
⭐ 37
Framework to evaluate Trajectory Classification Algorithms
Codefuse Evaluation
⭐ 37
Industrial-level evaluation benchmarks for coding LLMs across the full life-cycle of AI-native software development. An enterprise-grade code LLM evaluation suite, continuously being expanded.
Efda
⭐ 28
Evaluation Framework for Dependency Analysis (EFDA)
Irspack
⭐ 27
Train, evaluate, and optimize implicit feedback-based recommender systems.
Repsys
⭐ 26
Framework for Interactive Evaluation of Recommender Systems
Gval
⭐ 18
A high-level Python framework to evaluate the skill of geospatial datasets by comparing candidate maps to benchmark maps, producing agreement maps and metrics.
Fast_prototype
⭐ 15
This is a machine learning framework that enables developers to iterate fast over different ML architecture designs.
Corl
⭐ 15
The Core Reinforcement Learning (CoRL) library is intended to enable scalable deep reinforcement learning experimentation in a manner extensible to new simulations and new ways for learning agents to interact with them, making RL research easier by removing lock-in to particular simulations. Released under APRS approval: initial release of CoRL, Part #1, approved 2022-05-2024 12:08:51, PA Approval # [AFRL-2022-2455]. Documentation: https://act3-ace.g
Quica
⭐ 15
quica is a tool for running inter-coder agreement pipelines in an easy and effective way. Multiple measures are run, and the results are collected in a single table that can be easily exported to LaTeX.
Moonshot
⭐ 14
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
Elevant
⭐ 14
Entity linking evaluation and analysis tool
Tieval
⭐ 14
An Evaluation Framework for Temporal Information Extraction Systems
Unified_tracking_benchmark
⭐ 11
An easy-to-use tool for evaluating tracking algorithms on many different benchmarks like OTB and Temple-Color
Etude Engine
⭐ 11
ETUDE (Evaluation Tool for Unstructured Data and Extractions) is a Python-based tool that provides consistent evaluation options across a range of annotation schemata and corpus formats
Taint Evaluator
⭐ 11
A suite of experiments for evaluating open-source binary taint trackers.
Galileo
⭐ 10
🪐 A framework for distributed load testing experiments
Lapixdl
⭐ 10
Python package with Deep Learning utilities for Computer Vision
Thresh
⭐ 9
🌾 Universal, customizable and deployable fine-grained evaluation for text generation.
Yeast In Microstructures Dataset
⭐ 8
Official and maintained implementation of the dataset paper "An Instance Segmentation Dataset of Yeast Cells in Microstructures" [EMBC 2023].
Gan Evaluator
⭐ 8
A pip-installable evaluator for GANs (IS and FID). Accepts either dataloaders or individual batches. Supports on-the-fly evaluation during training. A working DCGAN SVHN demo script is provided.
Ggme
⭐ 8
Official repository for the ICLR 2022 paper "Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions" https://openreview.net/forum?id=tBtoZYKd9n
Tool
⭐ 8
Delphi-BFT automates large-scale simulations of unmodified BFT protocol implementations through the Phantom simulator, given a simple experimental description. For the first time, experiments with existing BFT protocol implementations can be effortlessly set up, configured, and fed into a simulation engine.
Gval
⭐ 7
A Python framework to evaluate geospatial datasets by comparing candidate and benchmark maps to compute agreement maps and statistics.
Orbis_eval
⭐ 7
An Extendable Evaluation Pipeline for Named Entity Drill-Down Analysis
Redeval
⭐ 6
Auditing with LLM evals for LLM applications.
Evalytics
⭐ 6
HR tool to orchestrate the Performance Review Cycle of the employees of a company.
Xlingeval
⭐ 5
Code and Resources for the paper, "Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries"
Evaluation Framework
⭐ 5
An evaluation framework for evaluating and comparing graph embedding techniques
Copyright 2018-2024 Awesome Open Source. All rights reserved.