Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Nlp_chinese_corpus | 8,344 | | | | a year ago | | | 20 | mit | |
Large-scale Chinese corpus for NLP | | | | | | | | | | |
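Awesome Pretrained Chinese Nlp Models | 3,738 | | | | 5 months ago | | | 1 | mit | Python |
Awesome Pretrained Chinese NLP Models: a curated collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models | | | | | | | | | | |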
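Chinese Names Corpus | 3,719 | | | | 6 months ago | | | 7 | apache-2.0 | |
Chinese personal-name corpus and name generator covering Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names; usable for Chinese word segmentation and person-name entity recognition | | | | | | | | | | |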
Clue | 3,345 | | | | a year ago | | | 73 | | Python |
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pretrained models, corpus, and leaderboard | | | | | | | | | | |
Textrecognitiondatagenerator | 2,901 | | | | 6 months ago | 12 | August 02, 2022 | 134 | mit | Python |
A synthetic data generator for text recognition | | | | | | | | | | |
Cluedatasetsearch | 2,778 | | | | 2 years ago | | | 6 | | Python |
Search across all Chinese NLP datasets, plus commonly used English NLP datasets | | | | | | | | | | |
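Awesome_chinese_medical_nlp | 1,847 | | | | 5 months ago | | | 1 | | |
A curated list of open Chinese medical NLP resources: terminologies, corpora, word embeddings, pretrained models, knowledge graphs, named entity recognition, QA, information extraction, models, papers, etc. | | | | | | | | | | |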
Cluener2020 | 1,416 | | | | 2 years ago | | | 48 | | Python |
CLUENER2020: Chinese fine-grained named entity recognition | | | | | | | | | | |
Chinesenlp | 1,329 | | | | 3 years ago | | | 3 | | HTML |
Datasets and SOTA results for every field of Chinese NLP | | | | | | | | | | |
Data Juicer | 994 | | | | 5 months ago | 3 | September 28, 2023 | 16 | apache-2.0 | Python |
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷 | | | | | | | | | | |