Awesome Open Source
Search results for evaluation metrics
106 search results found
Ab3dmot
⭐ 1,511
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
Deepeval
⭐ 1,070
The Evaluation Framework for LLMs
Octis
⭐ 647
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
Rliable
⭐ 588
[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
Multi Human Parsing
⭐ 564
Official Repository for Multi-Human-Parsing (MHP)
Image Similarity Measures
⭐ 482
Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
Pynlpl
⭐ 466
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP spec
Jiwer
⭐ 440
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Comet
⭐ 346
A Neural Framework for MT Evaluation
Specvqgan
⭐ 262
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
Ranx
⭐ 228
A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion
Agentops
⭐ 215
Python SDK for agent evals and observability
Cleval
⭐ 172
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks
Lrv Instruction
⭐ 160
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Pyrouge
⭐ 155
A Python wrapper for the ROUGE summarization evaluation package
Ner Evaluation
⭐ 153
An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9: evaluation is not at the tag/token level but considers all the tokens that are part of the named entity
Pythonrouge
⭐ 142
Python wrapper for evaluating summarization quality by ROUGE package
Easse
⭐ 141
Easier Automatic Sentence Simplification Evaluation
Tonic_validate
⭐ 128
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Nervaluate
⭐ 125
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval'13
Generative Evaluation Prdc
⭐ 121
Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.
Pysodevaltoolkit
⭐ 112
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Person Reid Evaluation
⭐ 104
GOM: New Metric for Re-identification. GOM explicitly balances the effects of retrieval and verification in a single unified metric.
Fast_bss_eval
⭐ 82
A fast implementation of bss_eval metrics for blind source separation
Erroranalysis_prompt
⭐ 79
[ChatGPT4MTevaluation] ErrorAnalysis Prompt for MT Evaluation in ChatGPT
Continuous Eval
⭐ 78
Evaluation for LLM / RAG pipelines, ready for CI/CD
Factcc
⭐ 76
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper
Semantic Object Accuracy For Generative Text To Image Synthesis
⭐ 74
Code for "Semantic Object Accuracy for Generative Text-to-Image Synthesis" (TPAMI 2020)
Cvpr18 Caption Eval
⭐ 66
Learning to Evaluate Image Captioning. CVPR 2018
Rankeval
⭐ 65
Official repository of RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions.
Summarization Eval
⭐ 55
Reference-free automatic summarization evaluation with potential hallucination detection
Athina Evals
⭐ 45
Python SDK for running evaluations on LLM generated responses
Evalify
⭐ 41
Evaluate your biometric verification models literally in seconds.
Precision Recall Distributions
⭐ 38
Assessing Generative Models via Precision and Recall (official repository)
Kolena
⭐ 37
Python client for Kolena's machine learning testing platform
Rouge
⭐ 34
A Javascript implementation of the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) evaluation metric for summaries.
Permetrics
⭐ 33
Artificial intelligence (AI, ML, DL) performance metrics implemented in Python
Streamingrec
⭐ 33
A news recommendation evaluation framework
Computing Korean Stt Error Rates
⭐ 32
A Python package of functions that compute the character error rate (CER) and word error rate (WER) of Korean-sentence output transcripts from STT recognizers
Clayrs
⭐ 31
Complexly represent contents, build recommender systems, evaluate them. All in one place!
Data Discovery Toolkit
⭐ 31
A data discovery and manipulation toolset for unstructured data
Summary Workbench
⭐ 28
Framework for unified summarisation and evaluation of English documents using state-of-the-art models and measures.
Nlp Tools
⭐ 28
Useful python NLP tools (evaluation, GUI interface, tokenization)
Summary Reward No Reference
⭐ 28
A reference-free metric for measuring summary quality, learned from human ratings.
Repsys
⭐ 26
Framework for Interactive Evaluation of Recommender Systems
Faster_coco_eval
⭐ 26
Continuation of the abandoned fast-coco-eval project
Students Performance Analytics
⭐ 24
Student performance evaluation using feature engineering, feature extraction, data manipulation, data analysis and data visualization, and finally applying machine learning classification algorithms to separate students with different grades
Rougescore
⭐ 24
Python implementation of ROUGE
Nereval
⭐ 23
Evaluation script for named entity recognition (NER) systems based on entity-level F1 score.
Fld
⭐ 22
PyTorch code for FLS, FID, KID, Precision, Recall, etc. using DINOv2, InceptionV3, CLIP, etc.
Musdr
⭐ 21
Evaluation metrics for machine-composed symbolic music. Paper: "The Jazz Transformer on the Front Line: Exploring the Shortcomings of AI-Composed Music through Quantitative Measures", ISMIR 2020
Openmeva
⭐ 20
Benchmark for evaluating open-ended generation
Zoneeval
⭐ 20
Zone Evaluation: Revealing Spatial Bias in Object Detection
Iron
⭐ 16
R Package for Imbalanced Regression
F1 Communities
⭐ 16
A novel approach to evaluate community detection algorithms on ground truth
Guap
⭐ 16
Open-source evaluation metric for linking Machine Learning model outputs with Business outcomes
Quica
⭐ 15
quica is a tool to run inter-coder agreement pipelines in an easy and effective way. Multiple measures are run and the results are collected in a single table that can be easily exported to LaTeX
Online Retail Transactions Of Uk
⭐ 14
Analyzing online transactions in the UK and the countries that purchase from them, and analyzing their reviews using NLP and machine learning
World Food Production
⭐ 14
Comparing the top food and feed producers around the globe and seeking interesting answers, solutions, patterns, hints and warnings through data analysis and data visualization using machine learning.
Gruen
⭐ 14
GRUEN for Evaluating Linguistic Quality of Generated Text (EMNLP 2020 Findings)
Tapr
⭐ 13
Time-series Aware Precision and Recall for Evaluating Anomaly Detection Methods
Fense
⭐ 13
Fluency ENhanced Sentence-BERT Evaluation (FENSE), a metric for audio caption evaluation, together with the benchmark datasets AudioCaps-Eval and Clotho-Eval.
Pytorch Metrics
⭐ 13
Implementation of Evaluation Metrics for Pytorch
Tip
⭐ 13
Transcendental Idealism of Planner: Evaluating Perception from Planning Perspective for Autonomous Driving (ICML 2023)
Twitter Sentiment Analysis
⭐ 12
A natural language processing problem in which sentiment analysis is performed by separating positive tweets from negative ones with machine learning classification models, using text mining, text analysis, data analysis and data visualization
Capeval
⭐ 12
An image-oriented evaluation tool for image captioning systems (EMNLP-IJCNLP 2019)
Finegrainedfact
⭐ 12
Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization
Codebleu
⭐ 11
Pip compatible CodeBLEU metric implementation available for linux/macos/win
Restaurant Reviews Analysis
⭐ 10
Using natural language processing and bag-of-words feature extraction for sentiment analysis of customers who visited the restaurant, and finally using a classification algorithm to separate positive and negative sentiments.
Pytolemaic
⭐ 10
Toolbox for analysis of model's quality and model's description. For further details see
Clubmark
⭐ 10
Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling of Clustering (Community Detection) Algorithms Considering Overlaps (Covers)
Ml_evaluation_metrics
⭐ 10
Landscape of ML/DL performance evaluation metrics
Insurance Claim Prediction
⭐ 10
Predicts the insurance claim of each user in this dataset using machine learning algorithms for regression analysis, with data visualization performed to support the analysis.
Socnavbench
⭐ 10
A Grounded Simulation Testing Framework for Evaluating Social Navigation: https://arxiv.org/abs/2103.00047
Pysiib
⭐ 10
A python implementation of Speech intelligibility in bits (SIIB)
Embeddingbased
⭐ 10
Embedding-based evaluation metrics for dialogue generation.
Topic Model Diversity
⭐ 9
A collection of topic diversity measures for topic modeling
Gan Evaluator
⭐ 8
A pip-installable evaluator for GANs (IS and FID). Accepts either dataloaders or individual batches. Supports on-the-fly evaluation during training. A working DCGAN SVHN demo script provided.
Yeast In Microstructures Dataset
⭐ 8
Official and maintained implementation of the dataset paper "An Instance Segmentation Dataset of Yeast Cells in Microstructures" [EMBC 2023].
Graduate Admissions Analysis
⭐ 8
Analyzing the factors that determine whether graduates get admission abroad, and visualizing some of the most intriguing patterns in them using data analysis and data visualization with machine learning.
Inpainting Evaluation Metrics
⭐ 8
The goal of this repo is to provide a common evaluation script for image inpainting tasks. It contains some commonly used image quality metrics for inpainting (e.g., L1, L2, SSIM, PSNR and LPIPS).
Rome
⭐ 8
PyTorch code for ACL 2022 paper: RoMe: A Robust Metric for Evaluating Natural Language Generation https://aclanthology.org/2022.acl-long.387/
Ggme
⭐ 8
Official repository for the ICLR 2022 paper "Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions" https://openreview.net/forum?id=tBtoZYKd9n
Time Series Forecasting Using Machine Learning Algorithm
⭐ 7
Sensor data from a renowned power plant has been provided by a reliable source to forecast certain features. The work was initially done with KNIME software; the goal now is to do the prediction/forecasting with machine learning. The idea is to compare the forecast results on univariate and multivariate time series data, using regression and statistical methods.
Clevr
⭐ 7
Clustering and Link Prediction Evaluation in R
Korouge
⭐ 7
Calculating ROUGE score for Korean (Wrapper for ROUGE-1.5.5.pl script)
Fightin Words
⭐ 7
A scikit-learn compliant implementation of Monroe et al.'s Fightin' Words analysis method.
Ctrleval
⭐ 7
Code for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022)
Bfscore_python
⭐ 7
Boundary F1 Score - Python Implementation
Lebleu
⭐ 7
LeBLEU: Levenshtein/Letter-edit BLEU, N-gram-based Translation Evaluation Score for Morphologically Complex Languages
Big Mart Sales Prediction
⭐ 7
Using machine learning algorithms for regression analysis to predict sales patterns, supported by data analysis and data visualizations.
Employee Reviews
⭐ 7
A project containing data visualization, EDA and machine learning modelling for checking sentiments.
Lips
⭐ 7
The benchmark platform for power networks
Cr Nmt
⭐ 6
Code and data release for "Revisiting Commonsense Reasoning in Machine Translation: Training, Evaluation and Challenge"
Amazon Alexa Reviews
⭐ 6
Using Natural Language Processing, Data Visualizations and Classification Algorithms of Machine Learning
Fairsumm
⭐ 6
Code and Data for paper: Fair Abstractive Summarization of Diverse Perspectives
Learning
⭐ 6
Evaluation metrics and essential machine learning for Haskell
Far
⭐ 6
Facet-Aware Evaluation for Extractive Summarization, ACL 2020
Boston House Price Predictions
⭐ 6
The most basic data set available to practice the concepts of regression analysis and explore the most basic concepts of machine learning
Eval Metrics
⭐ 5
Evaluation metrics for machine learning
1-100 of 106 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.