Awesome Open Source

Search results for evaluation metrics

106 search results found

Ab3dmot ⭐ 1,511

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

Deepeval ⭐ 1,070

The Evaluation Framework for LLMs

OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)

Rliable ⭐ 588

[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

Multi Human Parsing ⭐ 564

🔥🔥Official Repository for Multi-Human-Parsing (MHP)🔥🔥

Image Similarity Measures ⭐ 482

📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP spec

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

A Neural Framework for MT Evaluation

Specvqgan ⭐ 262

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

Agentops ⭐ 215

Python SDK for agent evals and observability

CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks

Lrv Instruction ⭐ 160

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Pyrouge ⭐ 155

A Python wrapper for the ROUGE summarization evaluation package

Ner Evaluation ⭐ 153

An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9, computed not at the tag/token level but over all the tokens that are part of the named entity

Pythonrouge ⭐ 142

Python wrapper for evaluating summarization quality by ROUGE package

Easier Automatic Sentence Simplification Evaluation

Tonic_validate ⭐ 128

Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.

Nervaluate ⭐ 125

Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13

Generative Evaluation Prdc ⭐ 121

Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.

Pysodevaltoolkit ⭐ 112

PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection

Person Reid Evaluation ⭐ 104

GOM: New Metric for Re-identification. 👉 GOM explicitly balances the effect of performing retrieval and verification into a single unified metric.

Fast_bss_eval ⭐ 82

A fast implementation of bss_eval metrics for blind source separation

Erroranalysis_prompt ⭐ 79

🎁[ChatGPT4MTevaluation] ErrorAnalysis Prompt for MT Evaluation in ChatGPT

Continuous Eval ⭐ 78

Evaluation for LLM / RAG pipelines, ready for CI/CD

Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Semantic Object Accuracy For Generative Text To Image Synthesis ⭐ 74

Code for "Semantic Object Accuracy for Generative Text-to-Image Synthesis" (TPAMI 2020)

Cvpr18 Caption Eval ⭐ 66

Learning to Evaluate Image Captioning. CVPR 2018

Rankeval ⭐ 65

Official repository of RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions.

Summarization Eval ⭐ 55

📝 Reference-Free automatic summarization evaluation with potential hallucination detection

Athina Evals ⭐ 45

Python SDK for running evaluations on LLM generated responses

Evaluate your biometric verification models literally in seconds.

Precision Recall Distributions ⭐ 38

Assessing Generative Models via Precision and Recall (official repository)

Python client for Kolena's machine learning testing platform

A Javascript implementation of the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) evaluation metric for summaries.

Permetrics ⭐ 33

Artificial intelligence (AI, ML, DL) performance metrics implemented in Python

Streamingrec ⭐ 33

A news recommendation evaluation framework

Computing Korean Stt Error Rates ⭐ 32

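A Python function package for computing the character error rate (CER) and word error rate (WER) of transcripts output by Korean speech-to-text (STT) sentence recognizers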

Complexly represent contents, build recommender systems, evaluate them. All in one place!

Data Discovery Toolkit ⭐ 31

A data discovery and manipulation toolset for unstructured data

Summary Workbench ⭐ 28

Framework for unified summarisation and evaluation of English documents using state-of-the-art models and measures.

Nlp Tools ⭐ 28

Useful python NLP tools (evaluation, GUI interface, tokenization)

Summary Reward No Reference ⭐ 28

A reference-free metric for measuring summary quality, learned from human ratings.

Framework for Interactive Evaluation of Recommender Systems

Faster_coco_eval ⭐ 26

Continuation of an abandoned project fast-coco-eval

Students Performance Analytics ⭐ 24

Student performance evaluation using feature engineering, feature extraction, data manipulation, data analysis, and data visualization, and finally applying machine learning classification algorithms to separate students with different grades

Rougescore ⭐ 24

Python implementation of ROUGE

Evaluation script for named entity recognition (NER) systems based on entity-level F1 score.

PyTorch code for FLS, FID, KID, Precision, Recall, etc. using DINOv2, InceptionV3, CLIP, etc.

Evaluation metrics for machine-composed symbolic music. Paper: "The Jazz Transformer on the Front Line: Exploring the Shortcomings of AI-Composed Music through Quantitative Measures", ISMIR 2020

Openmeva ⭐ 20

Benchmark for evaluating open-ended generation

Zoneeval ⭐ 20

Zone Evaluation: Revealing Spatial Bias in Object Detection

R Package for Imbalanced Regression

F1 Communities ⭐ 16

A novel approach to evaluate community detection algorithms on ground truth

Open-source evaluation metric for linking Machine Learning model outputs with Business outcomes

quica is a tool to run inter-coder agreement pipelines in an easy and effective way. Multiple measures are run and results are collected in a single table that can be easily exported to LaTeX

Online Retail Transactions Of Uk ⭐ 14

Analyzing online transactions in the UK and the countries that purchase from it, and analyzing their reviews using NLP and machine learning

World Food Production ⭐ 14

Comparing Top food and feed Producers around the globe and also seeking some interesting answers, solutions, patterns, hints and warnings through the power of Data Analysis and Data Visualization using Machine Learning.

GRUEN for Evaluating Linguistic Quality of Generated Text (EMNLP 2020 Findings)

Time-series Aware Precision and Recall for Evaluating Anomaly Detection Methods

Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval.

Pytorch Metrics ⭐ 13

Implementation of Evaluation Metrics for Pytorch

Transcendental Idealism of Planner: Evaluating Perception from Planning Perspective for Autonomous Driving (ICML 2023)

Twitter Sentiment Analysis ⭐ 12

A natural language processing problem in which sentiment analysis is performed by classifying positive and negative tweets with machine learning models, using text mining, text analysis, data analysis, and data visualization

An image-oriented evaluation tool for image captioning systems (EMNLP-IJCNLP 2019)

Finegrainedfact ⭐ 12

Official implementation of the ACL Findings 2023 paper: Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization

Codebleu ⭐ 11

Pip compatible CodeBLEU metric implementation available for linux/macos/win

Restaurant Reviews Analysis ⭐ 10

Using natural language processing and bag-of-words feature extraction for sentiment analysis of reviews from customers who visited the restaurant, then using a classification algorithm to separate positive and negative sentiments.

Pytolemaic ⭐ 10

Toolbox for analysis of model's quality and model's description. For further details see

Clubmark ⭐ 10

Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling of Clustering (Community Detection) Algorithms Considering Overlaps (Covers)

Ml_evaluation_metrics ⭐ 10

Landscape of ML/DL performance evaluation metrics

Insurance Claim Prediction ⭐ 10

Predicting the insurance claim for each user in this data set. Machine learning algorithms for regression analysis are used, and data visualization is also performed to support the analysis.

Socnavbench ⭐ 10

A Grounded Simulation Testing Framework for Evaluating Social Navigation: https://arxiv.org/abs/2103.00047

A python implementation of Speech intelligibility in bits (SIIB)

Embeddingbased ⭐ 10

Embedding-based evaluation metrics for dialogue generation.

Topic Model Diversity ⭐ 9

A collection of topic diversity measures for topic modeling

Gan Evaluator ⭐ 8

A pip-installable evaluator for GANs (IS and FID). Accepts either dataloaders or individual batches. Supports on-the-fly evaluation during training. A working DCGAN SVHN demo script is provided.

Yeast In Microstructures Dataset ⭐ 8

Official and maintained implementation of the dataset paper "An Instance Segmentation Dataset of Yeast Cells in Microstructures" [EMBC 2023].

Graduate Admissions Analysis ⭐ 8

Analyzing the factors that determine whether graduates gain admission abroad, and visualizing some of the most intriguing patterns using data analysis, data visualization, and machine learning.

Inpainting Evaluation Metrics ⭐ 8

The goal of this repo is to provide a common evaluation script for image inpainting tasks. It contains some commonly used image quality metrics for inpainting (e.g., L1, L2, SSIM, PSNR and LPIPS).

PyTorch code for ACL 2022 paper: RoMe: A Robust Metric for Evaluating Natural Language Generation https://aclanthology.org/2022.acl-long.387/

Official repository for the ICLR 2022 paper "Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions" https://openreview.net/forum?id=tBtoZYKd9n

Time Series Forecasting Using Machine Learning Algorithm ⭐ 7

Sensor data from a renowned power plant has been provided by a reliable source to forecast some feature. Initially the work was done with KNIME software. Now the goal is to do the prediction/forecasting with machine learning. The idea is to check the forecast results with univariate and multivariate time series data. Regression method, statistical method.

Clustering and Link Prediction Evaluation in R

Calculating ROUGE score for Korean (Wrapper for ROUGE-1.5.5.pl script)

Fightin Words ⭐ 7

A scikit-learn compliant implementation of Monroe et al.'s Fightin' Words analysis method.

Codes for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022)

Bfscore_python ⭐ 7

Boundary F1 Score - Python Implementation

LeBLEU: Levenshtein/Letter-edit BLEU, N-gram-based Translation Evaluation Score for Morphologically Complex Languages

Big Mart Sales Prediction ⭐ 7

Using Machine Learning Algorithms for Regression Analysis to predict the sales pattern and Using Data Analysis and Data Visualizations to Support it.

Employee Reviews ⭐ 7

This project contains data visualization, EDA, and machine learning modelling for checking sentiments.

The benchmark platform for power networks

Code and data release for "Revisiting Commonsense Reasoning in Machine Translation: Training, Evaluation and Challenge"

Amazon Alexa Reviews ⭐ 6

Using Natural Language Processing, Data Visualizations and Classification Algorithms of Machine Learning

Code and Data for paper: Fair Abstractive Summarization of Diverse Perspectives

Evaluation metrics and essential machine learning for Haskell

Facet-Aware Evaluation for Extractive Summarization, ACL 2020

Boston House Price Predictions ⭐ 6

The most basic data set available to practice the concepts of regression analysis and explore the most basic concepts of machine learning

Eval Metrics ⭐ 5

Evaluation metrics for machine learning

1-100 of 106 search results

Copyright 2018-2024 Awesome Open Source. All rights reserved.