Awesome Open Source

Search results for python evaluation metrics

68 search results found

Ab3dmot ⭐ 1,511

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

Deepeval ⭐ 1,070

The Evaluation Framework for LLMs

OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)

Image Similarity Measures ⭐ 482

📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
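For readers unfamiliar with the first two metrics in that list, here is a minimal sketch of RMSE and PSNR using only the standard library. It does not use the listed package's API; the function names, the flat-sequence image representation, and the default `max_value` of 255 (8-bit pixels) are assumptions made for illustration.

```python
import math
from typing import Sequence


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean squared error between two equally sized pixel sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(squared_error / len(a))


def psnr(a: Sequence[float], b: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels.

    max_value is the largest possible pixel value (assumed 255 for 8-bit images).
    Identical inputs have zero error, so PSNR is reported as infinity.
    """
    err = rmse(a, b)
    if err == 0:
        return math.inf
    return 20 * math.log10(max_value / err)


# Hypothetical flattened 2x2 grayscale images.
original = [10, 20, 30, 40]
distorted = [12, 18, 33, 40]
print(rmse(original, distorted), psnr(original, distorted))
```

Lower RMSE and higher PSNR both indicate that the two images are closer; the remaining metrics in the list (SSIM, FSIM, and so on) need structural information that a flat pixel comparison like this does not capture.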

PyNLPl, pronounced 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP spec

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
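As a rough illustration of what word error rate measures, the sketch below computes WER as the word-level Levenshtein distance (substitutions, deletions, insertions) divided by the number of reference words. It is a standard-library sketch, not the listed project's implementation; the function name and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()  # naive whitespace tokenization (an assumption)
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.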

A Neural Framework for MT Evaluation

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

Agentops ⭐ 215

Python SDK for agent evals and observability

CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks

Lrv Instruction ⭐ 160

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Pyrouge ⭐ 155

A Python wrapper for the ROUGE summarization evaluation package

Ner Evaluation ⭐ 153

An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9, computed not at the tag/token level but over all the tokens that are part of the named entity

Pythonrouge ⭐ 142

Python wrapper for evaluating summarization quality by ROUGE package

Tonic_validate ⭐ 128

Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.

Nervaluate ⭐ 125

Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13

Pysodevaltoolkit ⭐ 112

PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection

Person Reid Evaluation ⭐ 104

GOM: A New Metric for Re-identification. 👉 GOM explicitly balances the effects of retrieval and verification in a single unified metric.

Fast_bss_eval ⭐ 82

A fast implementation of bss_eval metrics for blind source separation

Erroranalysis_prompt ⭐ 79

🎁[ChatGPT4MTevaluation] ErrorAnalysis Prompt for MT Evaluation in ChatGPT

Continuous Eval ⭐ 78

Evaluation for LLM / RAG pipelines, ready for CI/CD

Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Semantic Object Accuracy For Generative Text To Image Synthesis ⭐ 74

Code for "Semantic Object Accuracy for Generative Text-to-Image Synthesis" (TPAMI 2020)

Cvpr18 Caption Eval ⭐ 66

Learning to Evaluate Image Captioning. CVPR 2018

Rankeval ⭐ 65

Official repository of RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions.

Summarization Eval ⭐ 55

📝 Reference-Free automatic summarization evaluation with potential hallucination detection

Athina Evals ⭐ 45

Python SDK for running evaluations on LLM generated responses

Evaluate your biometric verification models literally in seconds.

Precision Recall Distributions ⭐ 38

Assessing Generative Models via Precision and Recall (official repository)

Python client for Kolena's machine learning testing platform

Permetrics ⭐ 33

Artificial intelligence (AI, ML, DL) performance metrics implemented in Python

Computing Korean Stt Error Rates ⭐ 32

A Python function package that computes the character error rate (CER) and word error rate (WER) of output scripts from Korean STT sentence recognizers

Complexly represent contents, build recommender systems, evaluate them. All in one place!

Summary Reward No Reference ⭐ 28

A reference-free metric for measuring summary quality, learned from human ratings.

Nlp Tools ⭐ 28

Useful python NLP tools (evaluation, GUI interface, tokenization)

Summary Workbench ⭐ 28

Framework for unified summarisation and evaluation of English documents using state-of-the-art models and measures.

Faster_coco_eval ⭐ 26

Continuation of an abandoned project fast-coco-eval

Framework for Interactive Evaluation of Recommender Systems

Rougescore ⭐ 24

Python implementation of ROUGE

Evaluation script for named entity recognition (NER) systems based on entity-level F1 score.
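To make "entity-level F1" concrete, the following sketch scores predictions by exact match of whole entities rather than individual tokens. It is not the listed script; the `(type, start, end)` tuple representation and the function name are assumptions chosen for illustration, and only strict exact matching is shown.

```python
from typing import Iterable, Tuple

Entity = Tuple[str, int, int]  # (entity type, start token index, end token index)


def entity_f1(gold: Iterable[Entity], predicted: Iterable[Entity]) -> float:
    """F1 over whole entities: a prediction counts only if type and span both match."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: the LOC prediction has the wrong start boundary,
# so it is counted as both a false positive and a false negative.
gold = [("PER", 0, 2), ("LOC", 5, 6)]
pred = [("PER", 0, 2), ("LOC", 4, 6)]
print(entity_f1(gold, pred))
```

Under token-level scoring the mis-bounded LOC span would earn partial credit; entity-level scoring gives it none, which is the distinction these NER evaluation tools emphasize.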

PyTorch code for FLS, FID, KID, Precision, Recall, etc. using DINOv2, InceptionV3, CLIP, etc.

Evaluation metrics for machine-composed symbolic music. Paper: "The Jazz Transformer on the Front Line: Exploring the Shortcomings of AI-Composed Music through Quantitative Measures", ISMIR 2020

Openmeva ⭐ 20

Benchmark for evaluating open-ended generation

F1 Communities ⭐ 16

A novel approach to evaluate community detection algorithms on ground truth

Open-source evaluation metric for linking Machine Learning model outputs with Business outcomes

quica is a tool to run inter-coder agreement pipelines in an easy and effective way. Multiple measures are run and the results are collected in a single table that can be easily exported to LaTeX

GRUEN for Evaluating Linguistic Quality of Generated Text (EMNLP 2020 Findings)

Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval.

Time-series Aware Precision and Recall for Evaluating Anomaly Detection Methods

Transcendental Idealism of Planner: Evaluating Perception from Planning Perspective for Autonomous Driving (ICML 2023)

Pytorch Metrics ⭐ 13

Implementation of Evaluation Metrics for Pytorch

An image-oriented evaluation tool for image captioning systems (EMNLP-IJCNLP 2019)

Finegrainedfact ⭐ 12

Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization

Codebleu ⭐ 11

Pip compatible CodeBLEU metric implementation available for linux/macos/win

Clubmark ⭐ 10

Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling of Clustering (Community Detection) Algorithms Considering Overlaps (Covers)

A python implementation of Speech intelligibility in bits (SIIB)

Embeddingbased ⭐ 10

Embedding-based evaluation metrics for dialogue generation.

Socnavbench ⭐ 10

A Grounded Simulation Testing Framework for Evaluating Social Navigation: https://arxiv.org/abs/2103.00047

Pytolemaic ⭐ 10

Toolbox for analysis of model's quality and model's description. For further details see

Topic Model Diversity ⭐ 9

A collection of topic diversity measures for topic modeling

Gan Evaluator ⭐ 8

A pip-installable evaluator for GANs (IS and FID). Accepts either dataloaders or individual batches. Supports on-the-fly evaluation during training. A working DCGAN SVHN demo script is provided.

Official repository for the ICLR 2022 paper "Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions" https://openreview.net/forum?id=tBtoZYKd9n

Yeast In Microstructures Dataset ⭐ 8

Official and maintained implementation of the dataset paper "An Instance Segmentation Dataset of Yeast Cells in Microstructures" [EMBC 2023].

PyTorch code for ACL 2022 paper: RoMe: A Robust Metric for Evaluating Natural Language Generation https://aclanthology.org/2022.acl-long.387/

Inpainting Evaluation Metrics ⭐ 8

The goal of this repo is to provide a common evaluation script for image inpainting tasks. It contains some commonly used image quality metrics for inpainting (e.g., L1, L2, SSIM, PSNR and LPIPS).

Codes for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022)

Bfscore_python ⭐ 7

Boundary F1 Score - Python Implementation

Fightin Words ⭐ 7

A scikit-learn compliant implementation of Monroe et al.'s Fightin' Words analysis method.

Time Series Forecasting Using Machine Learning Algorithm ⭐ 7

Sensor data from a renowned power plant was provided by a reliable source in order to forecast a feature. The work was initially done with KNIME software; the goal now is to do the prediction/forecasting with machine learning. The idea is to compare forecast results on univariate and multivariate time series data, using regression and statistical methods.

LeBLEU: Levenshtein/Letter-edit BLEU, N-gram-based Translation Evaluation Score for Morphologically Complex Languages

Calculating ROUGE score for Korean (Wrapper for ROUGE-1.5.5.pl script)

Classeval ⭐ 5

Evaluation of supervised predictions for two-class and multi-class classifiers

Primesrl Eval ⭐ 5

A Practical Quality Metric for Semantic Role Labeling Systems Evaluation

Chatgpt_as_nlg_evaluator ⭐ 5

Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Copyright 2018-2024 Awesome Open Source. All rights reserved.