Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Text2vec | 678 | 1 | 3 months ago | 21 | May 17, 2022 | 2 | apache-2.0 | Python | ||
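text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box. | ||||||||||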
Openhownet | 422 | a year ago | 13 | December 16, 2021 | mit | Python | ||||
Core Data of HowNet and OpenHowNet Python API | ||||||||||
Chinesests | 286 | 5 years ago | 5 | |||||||
Building a Chinese Semantic Text Similarity (STS) corpus | ||||||||||
Macropodus | 256 | 1 | 2 years ago | 7 | December 25, 2020 | 1 | mit | Python | ||
Macropodus, a natural language processing toolkit built on the Albert+BiLSTM+CRF deep learning architecture. It provides CWS (Chinese word segmentation), POS (part-of-speech tagging), NER (named entity recognition), Find (new word discovery), Keyword (keyword extraction), Summarize (text summarization), Sim (text similarity), Calculate (scientific calculator), Chi2num (conversion between Chinese numerals and Arabic or Roman numerals), Traditional/Simplified Chinese conversion, and pinyin conversion. | ||||||||||
Textanalyzer | 149 | 5 years ago | 4 | Java | ||||||
A text analyzer based on machine learning, statistics, and dictionaries. So far, it supports hot word extraction, text classification, part-of-speech tagging, named entity recognition, Chinese word segmentation, address extraction, synonyms, text clustering, word2vec models, edit distance, sentence similarity, word sentiment tendency, name recognition, idiom recognition, place name recognition, organization recognition, Traditional Chinese recognition, and pinyin conversion. | ||||||||||
Chinese Sentence Similarity Task | 129 | 2 years ago | ||||||||
A roundup of Chinese question sentence similarity competitions and their solutions | ||||||||||
Jwe | 86 | 4 years ago | mit | C | ||||||
Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components | ||||||||||
Wordmultisensedisambiguation | 66 | 4 years ago | 2 | Python | ||||||
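WordMultiSenseDisambiguation: Chinese word sense disambiguation based on an online baike (encyclopedia) knowledge base and semantic embedding similarity. It retrieves the multiple senses of a Chinese word from the encyclopedia knowledge base and disambiguates the word within a given sentence. | ||||||||||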
Cn Words | 55 | 4 years ago | Jupyter Notebook | |||||||
Get Similar Chinese Words and Sentences | ||||||||||
Fuzzychinese | 52 | 22 days ago | 3 | April 29, 2019 | 1 | bsd-3-clause | Python | |||
A small package for fuzzy matching of Chinese words |
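STS: Building a Chinese Semantic Text Similarity Corpus
Semantic Text Similarity (STS) is a fundamental problem in natural language processing.
Similarity score: a value in [0, 5], where 5 is the highest similarity (same meaning) and 0 is the lowest (opposite or unrelated meaning).
Applications: QA, automated customer service, search engines, semantic understanding, automated grading, and so on.
Significance: English STS training data is currently fairly abundant, whereas Chinese STS (Chinese Semantic Text Similarity) corpora are scarce, and a corpus is the basic starting point for applying deep learning to text.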
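The [0, 5] scoring scale described above can be illustrated with a minimal Python sketch. The `StsPair` record, its field names, and the `normalized()` helper are hypothetical and introduced only for illustration; they do not reflect the actual file format of the ChineseSTS corpus, which is not specified here.

```python
from dataclasses import dataclass

# Bounds of the similarity scale stated in the project description.
MIN_SCORE = 0.0
MAX_SCORE = 5.0


@dataclass(frozen=True)
class StsPair:
    """Hypothetical record: two sentences plus a similarity score in [0, 5]."""

    sentence_a: str
    sentence_b: str
    score: float

    def __post_init__(self) -> None:
        # Reject scores outside the documented [0, 5] range.
        if not MIN_SCORE <= self.score <= MAX_SCORE:
            raise ValueError(f"score {self.score} is outside [0, 5]")

    def normalized(self) -> float:
        """Rescale the score linearly from [0, 5] to [0, 1]."""
        return (self.score - MIN_SCORE) / (MAX_SCORE - MIN_SCORE)
```

Rescaling to [0, 1] is one possible design choice, convenient when labels are compared against a bounded model output; whether it suits a given model is an assumption to check against that model's loss function.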
Project start date: 2016-06-06 06:06:06
If you cite or use this training set, please credit the authors: 唐善成, 白云悦, 马付玉. 中文语义相似度训练集. 西安科技大学.2016. IAdmireu/ChineseSTS
Tang Shancheng, Bai Yunyue, Ma Fuyu. Chinese Semantic Text Similarity Trainning Dataset. Xi'an University of Science and Technology.2016. IAdmireu/ChineseSTS