COS960: A Chinese Word Similarity Dataset of 960 Word Pairs
COS960

COS960 is a Chinese word similarity dataset of 960 word pairs. Each pair is annotated by 15 native speakers with a similarity score intended to reflect true similarity. The 960 pairs are divided into 3 groups by part-of-speech tag: 480 pairs of nouns, 240 pairs of verbs, and 240 pairs of adjectives.

Usage

To evaluate your word embeddings on COS960, run the following command, replacing {VECTOR_FILE} with the path to your vector file:

python correlation_calcu.py {VECTOR_FILE}
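
The internals of correlation_calcu.py are not shown here. As a rough illustration only, word similarity benchmarks are commonly scored by computing the cosine similarity of each word pair's vectors and then taking the Spearman rank correlation against the human scores. The sketch below assumes that approach; the function names (`cosine`, `spearman`, `evaluate`) and the in-memory `vectors` dictionary are hypothetical and may differ from what the script actually does.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else float("nan")


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(
    pairs: Sequence[Tuple[str, str, float]],
    vectors: Dict[str, Sequence[float]],
) -> float:
    """Correlate human scores with cosine similarity, skipping out-of-vocabulary pairs."""
    human, model = [], []
    for word1, word2, score in pairs:
        if word1 in vectors and word2 in vectors:
            human.append(score)
            model.append(cosine(vectors[word1], vectors[word2]))
    return spearman(human, model)
```

Rank correlation is used in this sketch because human scores and cosine values live on different scales; only the ordering of the pairs is compared.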

Dataset

Each line in the data files is formatted as follows:

[Word1] [Word2] [Average] [Annotator1] ... [Annotator15]

小心谨慎  谨慎小心     4.0         4      ...       4 
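
A minimal sketch of reading one such line, assuming the fields are separated by whitespace and that the average is followed by the individual annotator scores. The names `PairRecord` and `parse_line` are hypothetical, and the sample line in the usage note uses placeholder words and made-up scores rather than real dataset entries.

```python
from typing import List, NamedTuple


class PairRecord(NamedTuple):
    word1: str
    word2: str
    average: float
    scores: List[float]  # one score per annotator


def parse_line(line: str) -> PairRecord:
    """Split a whitespace-separated line into words, average, and annotator scores."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected two words, an average and scores, got: {line!r}")
    word1, word2, average, *scores = fields
    return PairRecord(word1, word2, float(average), [float(s) for s in scores])
```

For example, `parse_line("wordA wordB 2.5 2 3")` yields a record whose `average` is 2.5 and whose `scores` are `[2.0, 3.0]`. For a real COS960 line, `scores` should contain 15 values.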

Cite

If you use the dataset, please cite this:

@article{huang2019COS960,
  author  = {Junjie Huang and Fanchao Qi and Chenghao Yang and Zhiyuan Liu and Maosong Sun},
  title   = {{COS960: A Chinese Word Similarity Dataset of 960 Word Pairs}},
  journal = {arXiv preprint arXiv:1906.00247},
  year    = {2019}
}