Awesome Open Source
Search results for python evaluation metrics
68 search results found
Ab3dmot
⭐
1,511
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Deepeval
⭐
1,070
The Evaluation Framework for LLMs
Octis
⭐
647
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
Image Similarity Measures
⭐
482
📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
Pynlpl
⭐
466
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP spec
Jiwer
⭐
440
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Comet
⭐
346
A Neural Framework for MT Evaluation
Ranx
⭐
228
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
Agentops
⭐
215
Python SDK for agent evals and observability
Cleval
⭐
172
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks
Lrv Instruction
⭐
160
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Pyrouge
⭐
155
A Python wrapper for the ROUGE summarization evaluation package
Ner Evaluation
⭐
153
An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9, computed not at the tag/token level but over all the tokens that are part of the named entity
Pythonrouge
⭐
142
Python wrapper for evaluating summarization quality by ROUGE package
Tonic_validate
⭐
128
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Nervaluate
⭐
125
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
Pysodevaltoolkit
⭐
112
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Person Reid Evaluation
⭐
104
GOM: New Metric for Re-identification. 👉 GOM explicitly balances the effects of retrieval and verification in a single unified metric.
Fast_bss_eval
⭐
82
A fast implementation of bss_eval metrics for blind source separation
Erroranalysis_prompt
⭐
79
🎁[ChatGPT4MTevaluation] ErrorAnalysis Prompt for MT Evaluation in ChatGPT
Continuous Eval
⭐
78
Evaluation for LLM / RAG pipelines, ready for CI/CD
Factcc
⭐
76
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper
Semantic Object Accuracy For Generative Text To Image Synthesis
⭐
74
Code for "Semantic Object Accuracy for Generative Text-to-Image Synthesis" (TPAMI 2020)
Cvpr18 Caption Eval
⭐
66
Learning to Evaluate Image Captioning. CVPR 2018
Rankeval
⭐
65
Official repository of RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions.
Summarization Eval
⭐
55
📝 Reference-Free automatic summarization evaluation with potential hallucination detection
Athina Evals
⭐
45
Python SDK for running evaluations on LLM generated responses
Evalify
⭐
41
Evaluate your biometric verification models literally in seconds.
Precision Recall Distributions
⭐
38
Assessing Generative Models via Precision and Recall (official repository)
Kolena
⭐
37
Python client for Kolena's machine learning testing platform
Permetrics
⭐
33
Artificial intelligence (AI, ML, DL) performance metrics implemented in Python
Computing Korean Stt Error Rates
⭐
32
A Python function package that computes the character error rate (CER) and word error rate (WER) of output transcripts from Korean sentence STT recognizers
Clayrs
⭐
31
Complexly represent contents, build recommender systems, evaluate them. All in one place!
Summary Reward No Reference
⭐
28
A reference-free metric for measuring summary quality, learned from human ratings.
Nlp Tools
⭐
28
Useful python NLP tools (evaluation, GUI interface, tokenization)
Summary Workbench
⭐
28
Framework for unified summarisation and evaluation of English documents using state-of-the-art models and measures.
Faster_coco_eval
⭐
26
Continuation of the abandoned fast-coco-eval project
Repsys
⭐
26
Framework for Interactive Evaluation of Recommender Systems
Rougescore
⭐
24
Python implementation of ROUGE
Nereval
⭐
23
Evaluation script for named entity recognition (NER) systems based on entity-level F1 score.
Fld
⭐
22
PyTorch code for FLS, FID, KID, Precision, Recall, etc. using DINOv2, InceptionV3, CLIP, etc.
Musdr
⭐
21
Evaluation metrics for machine-composed symbolic music. Paper: "The Jazz Transformer on the Front Line: Exploring the Shortcomings of AI-Composed Music through Quantitative Measures", ISMIR 2020
Openmeva
⭐
20
Benchmark for evaluating open-ended generation
F1 Communities
⭐
16
A novel approach to evaluate community detection algorithms on ground truth
Guap
⭐
16
Open-source evaluation metric for linking Machine Learning model outputs with Business outcomes
Quica
⭐
15
quica is a tool to run inter-coder agreement pipelines in an easy and effective way. Multiple measures are run and the results are collected in a single table that can be easily exported to LaTeX
Gruen
⭐
14
GRUEN for Evaluating Linguistic Quality of Generated Text (EMNLP 2020 Findings)
Fense
⭐
13
Fluency ENhanced Sentence-bert Evaluation (FENSE), a metric for audio caption evaluation, plus the benchmark datasets AudioCaps-Eval and Clotho-Eval.
Tapr
⭐
13
Time-series Aware Precision and Recall for Evaluating Anomaly Detection Methods
Tip
⭐
13
Transcendental Idealism of Planner: Evaluating Perception from Planning Perspective for Autonomous Driving (ICML 2023)
Pytorch Metrics
⭐
13
Implementation of Evaluation Metrics for Pytorch
Capeval
⭐
12
An image-oriented evaluation tool for image captioning systems (EMNLP-IJCNLP 2019)
Finegrainedfact
⭐
12
Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization
Codebleu
⭐
11
Pip compatible CodeBLEU metric implementation available for linux/macos/win
Clubmark
⭐
10
Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling of Clustering (Community Detection) Algorithms Considering Overlaps (Covers)
Pysiib
⭐
10
A python implementation of Speech intelligibility in bits (SIIB)
Embeddingbased
⭐
10
Embedding-based evaluation metrics for dialogue generation.
Socnavbench
⭐
10
A Grounded Simulation Testing Framework for Evaluating Social Navigation: https://arxiv.org/abs/2103.00047
Pytolemaic
⭐
10
Toolbox for analysis of model's quality and model's description. For further details see
Topic Model Diversity
⭐
9
A collection of topic diversity measures for topic modeling
Gan Evaluator
⭐
8
A pip-installable evaluator for GANs (IS and FID). Accepts either dataloaders or individual batches. Supports on-the-fly evaluation during training. A working DCGAN SVHN demo script provided.
Ggme
⭐
8
Official repository for the ICLR 2022 paper "Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions" https://openreview.net/forum?id=tBtoZYKd9n
Yeast In Microstructures Dataset
⭐
8
Official and maintained implementation of the dataset paper "An Instance Segmentation Dataset of Yeast Cells in Microstructures" [EMBC 2023].
Rome
⭐
8
PyTorch code for ACL 2022 paper: RoMe: A Robust Metric for Evaluating Natural Language Generation https://aclanthology.org/2022.acl-long.387/
Inpainting Evaluation Metrics
⭐
8
The goal of this repo is to provide a common evaluation script for image inpainting tasks. It contains some commonly used image quality metrics for inpainting (e.g., L1, L2, SSIM, PSNR and LPIPS).
Ctrleval
⭐
7
Code for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022)
Bfscore_python
⭐
7
Boundary F1 Score - Python Implementation
Fightin Words
⭐
7
A scikit-learn compliant implementation of Monroe et al.'s Fightin' Words analysis method.
Time Series Forecasting Using Machine Learning Algorithm
⭐
7
Sensor data from a renowned power plant was provided by a reliable source to forecast a feature. Initially the work was done with KNIME software. Now the goal is to do the prediction/forecasting with machine learning. The idea is to check the forecast results with univariate and multivariate time series data. Regression method, statistical method.
Lebleu
⭐
7
LeBLEU: Levenshtein/Letter-edit BLEU, N-gram-based Translation Evaluation Score for Morphologically Complex Languages
Korouge
⭐
7
Calculating ROUGE score for Korean (Wrapper for ROUGE-1.5.5.pl script)
Classeval
⭐
5
Evaluation of supervised predictions for two-class and multi-class classifiers
Primesrl Eval
⭐
5
A Practical Quality Metric for Semantic Role Labeling Systems Evaluation
Chatgpt_as_nlg_evaluator
⭐
5
Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study
Copyright 2018-2024 Awesome Open Source. All rights reserved.