Awesome Open Source
Search results for evaluation framework
50 search results found
Lm Evaluation Harness
⭐ 3,768
A framework for few-shot evaluation of language models.
Promptfoo
⭐ 1,785
Test your prompts, models, and RAGs. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality. LLM evals for OpenAI/Azure GPT, Anthropic Claude, VertexAI Gemini, Ollama, and local/private models like Mistral/Mixtral/Llama, with CI/CD support.
Deepeval
⭐ 1,070
The Evaluation Framework for LLMs
Recsys2019_deeplearning_evaluation
⭐ 871
This is the repository of our article published in RecSys 2019 "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and of several follow-up studies.
Pydgn
⭐ 211
A research library for automating experiments on Deep Graph Networks
Zeno
⭐ 202
AI Data Management & Evaluation Platform
Expressive
⭐ 146
Expressive is a cross-platform expression parsing and evaluation framework. The cross-platform nature is achieved through compiling for .NET Standard so it will run on practically any platform.
Tonic_validate
⭐ 128
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Pysodevaltoolkit
⭐ 112
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Crowdflow
⭐ 92
Optical Flow Dataset and Benchmark for Visual Crowd Analysis
Continuous Eval
⭐ 78
Evaluation for LLM / RAG pipelines, ready for CI/CD
Lm Evaluation
⭐ 69
Evaluation suite for large-scale language models.
Sordi Ai Evaluation Gui
⭐ 68
This repository allows you to evaluate a trained computer vision model and get general information and evaluation metrics with little configuration.
Rankeval
⭐ 65
Official repository of RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions.
Birl
⭐ 63
BIRL: Benchmark on Image Registration methods with Landmark validations
Dialogentailment
⭐ 62
The implementation of the paper "Evaluating Coherence in Dialogue Systems using Entailment"
Vectory
⭐ 56
Vectory provides a collection of tools to track and compare embedding versions.
Athina Evals
⭐ 45
Python SDK for running evaluations on LLM-generated responses
Od Test
⭐ 44
OD-test: A Less Biased Evaluation of Out-of-Distribution (Outlier) Detectors (PyTorch)
Evalify
⭐ 41
Evaluate your biometric verification models literally in seconds.
Sim4rec
⭐ 37
Simulator for training and evaluation of Recommender Systems
Kolena
⭐ 37
Python client for Kolena's machine learning testing platform
Pactus
⭐ 37
Framework to evaluate Trajectory Classification Algorithms
Codefuse Evaluation
⭐ 37
Industrial-level evaluation benchmarks for coding LLMs across the full life-cycle of AI-native software development. An enterprise-grade code LLM evaluation suite, continuously being expanded.
Efda
⭐ 28
Evaluation Framework for Dependency Analysis (EFDA)
Irspack
⭐ 27
Train, evaluate, and optimize implicit feedback-based recommender systems.
Repsys
⭐ 26
Framework for Interactive Evaluation of Recommender Systems
Gval
⭐ 18
A high-level Python framework to evaluate the skill of geospatial datasets by comparing candidate maps to benchmark maps, producing agreement maps and metrics.
Fast_prototype
⭐ 15
This is a machine learning framework that enables developers to iterate fast over different ML architecture designs.
Corl
⭐ 15
The Core Reinforcement Learning (CoRL) library is intended to enable scalable deep reinforcement learning experimentation in a manner extensible to new simulations and new ways for learning agents to interact with them, making RL research easier by removing lock-in to particular simulations. Released under APRS approval: initial release of CoRL, Part #1, approved 2022-05-2024 12:08:51, PA Approval # [AFRL-2022-2455]. Documentation: https://act3-ace.g
Quica
⭐ 15
quica is a tool for running inter-coder agreement pipelines in an easy and effective way. Multiple measures are run, and the results are collected in a single table that can be easily exported to LaTeX.
Moonshot
⭐ 14
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
Elevant
⭐ 14
Entity linking evaluation and analysis tool
Tieval
⭐ 14
An Evaluation Framework for Temporal Information Extraction Systems
Unified_tracking_benchmark
⭐ 11
An easy-to-use tool for evaluating tracking algorithms on many different benchmarks like OTB and Temple-Color
Etude Engine
⭐ 11
ETUDE (Evaluation Tool for Unstructured Data and Extractions) is a Python-based tool that provides consistent evaluation options across a range of annotation schemata and corpus formats
Taint Evaluator
⭐ 11
A suite of experiments for evaluating open-source binary taint trackers.
Galileo
⭐ 10
🪐 A framework for distributed load testing experiments
Lapixdl
⭐ 10
Python package with Deep Learning utilities for Computer Vision
Thresh
⭐ 9
🌾 Universal, customizable and deployable fine-grained evaluation for text generation.
Yeast In Microstructures Dataset
⭐ 8
Official and maintained implementation of the dataset paper "An Instance Segmentation Dataset of Yeast Cells in Microstructures" [EMBC 2023].
Gan Evaluator
⭐ 8
A pip-installable evaluator for GANs (IS and FID). Accepts either dataloaders or individual batches. Supports on-the-fly evaluation during training. A working DCGAN SVHN demo script is provided.
Ggme
⭐ 8
Official repository for the ICLR 2022 paper "Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions" https://openreview.net/forum?id=tBtoZYKd9n
Tool
⭐ 8
Delphi-BFT automates large-scale simulations of unmodified BFT protocol implementations through the Phantom simulator, given a simple experimental description. For the first time, experiments with existing BFT protocol implementations can be effortlessly set up, configured, and fed into a simulation engine.
Gval
⭐ 7
A Python framework to evaluate geospatial datasets by comparing candidate and benchmark maps to compute agreement maps and statistics.
Orbis_eval
⭐ 7
An Extendable Evaluation Pipeline for Named Entity Drill-Down Analysis
Redeval
⭐ 6
Auditing with LLM evals for LLM applications.
Evalytics
⭐ 6
HR tool to orchestrate the Performance Review Cycle of the employees of a company.
Xlingeval
⭐ 5
Code and Resources for the paper, "Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries"
Evaluation Framework
⭐ 5
An evaluation framework for evaluating and comparing graph embedding techniques
Copyright 2018-2024 Awesome Open Source. All rights reserved.