Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
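Chinese Xinhua | 10,425 | | | | 4 months ago | | | 30 | mit | Python |
:orange_book: Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters. | ||||||||||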
Nlp_chinese_corpus | 8,344 | | | | a year ago | | | 20 | mit | |
Large Scale Chinese Corpus for NLP | ||||||||||
Linly | 2,964 | | | | 3 months ago | | | 107 | | Python |
Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets | ||||||||||
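Mnbvc | 2,533 | | | | 3 months ago | | | 18 | mit | |
MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even Martian script (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more. | ||||||||||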
Information Extraction Chinese | 2,086 | | | | a year ago | | | 118 | | Python |
Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT | ||||||||||
Thulac Python | 1,341 | | 16 | 3 | 4 years ago | 11 | November 07, 2022 | 70 | mit | Python |
An Efficient Lexical Analyzer for Chinese | ||||||||||
Chinesenlp | 1,329 | | | | 3 years ago | | | 3 | | HTML |
Datasets and SOTA results for every field of Chinese NLP | ||||||||||
Zhparser | 627 | | | | 3 months ago | | | 12 | other | C |
zhparser is a PostgreSQL extension for full-text search of Chinese language | ||||||||||
Thulac | 611 | | | | 3 years ago | | | 27 | mit | C++ |
An Efficient Lexical Analyzer for Chinese | ||||||||||
Chinese_models_for_spacy | 498 | | | | 4 years ago | | | 8 | mit | Jupyter Notebook |
Models for SpaCy that support Chinese | ||||||||||