Awesome Open Source
Search results for dataset chinese
116 search results found
Nlp_chinese_corpus
⭐
8,344
Large Scale Chinese Corpus for NLP
Awesome Pretrained Chinese Nlp Models
⭐
3,738
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
Chinese Names Corpus
⭐
3,719
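Chinese personal names corpus. Name generator. Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names. Can be used for Chinese word segmentation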
Clue
⭐
3,345
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Textrecognitiondatagenerator
⭐
2,901
A synthetic data generator for text recognition
Cluedatasetsearch
⭐
2,778
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Awesome_chinese_medical_nlp
⭐
1,847
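A compilation of public resources for Chinese medical NLP: terminology sets / corpora / word vectors / pretrained models / knowledge graphs / named entity recognition / QA / information extraction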
Cluener2020
⭐
1,384
CLUENER2020: Fine-Grained Named Entity Recognition for Chinese
Chinesenlp
⭐
1,329
Datasets and SOTA results for every field of Chinese NLP
Data Juicer
⭐
994
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Cdial Gpt
⭐
944
A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models
Synthtext_chinese_version
⭐
682
Modified from https://github.com/ankush-me/SynthText.git to generate Chinese characters
Knowledge Graph Learning
⭐
662
A curated list of awesome knowledge graph tutorials, projects and communities.
Ngram2vec
⭐
638
Four word embedding models implemented in Python. Supporting arbitrary context features
Cluepretrainedmodels
⭐
536
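A collection of high-quality Chinese pretrained models: state-of-the-art large models, fastest small models, and specialized similarity models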
Fastbert
⭐
527
The source code of FastBERT (ACL 2020)
Cluecorpus2020
⭐
517
Large-scale Pre-training Corpus for Chinese (100G)
Cblue
⭐
515
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Mmsa
⭐
494
MMSA is a unified framework for Multimodal Sentiment Analysis.
Attention Ocr Chinese Version
⭐
394
Attention OCR Based On Tensorflow
Chinese Nlp Corpus
⭐
378
Collections of Chinese NLP corpus
Chinese_rumor_dataset
⭐
358
Chinese rumor data
Cmrc2018
⭐
313
A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)
Multi Criteria Cws
⭐
260
Simple Solution for Multi-Criteria Chinese Word Segmentation
Chinese Literature Ner Re Dataset
⭐
229
A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text
Chinese_conversation_sentiment
⭐
190
A Chinese sentiment dataset that may be useful for sentiment analysis.
Articlepairmatching
⭐
175
The code of ACL 2019 paper: Matching Article Pairs with Graphical Decomposition and Convolutions
Nlp Public Dataset
⭐
172
Chinese and English NER datasets, English-Chinese machine translation datasets, and Chinese word segmentation datasets.
At4chinesener
⭐
160
Adversarial Transfer Learning for Chinese Named Entity Recognition with Self-Attention Mechanism
Tumcc
⭐
158
Telegram Underground Market Chinese Corpus: a corpus for identifying Chinese dark jargon in Telegram underground markets. Paper: Identification of Chinese Dark Jargons in Telegram Underground Markets Using Context-Oriented and Linguistic Features (IP&M, 2022).
Chinese_nre
⭐
150
Source code for ACL 2019 paper "Chinese Relation Extraction with Multi-Grained Information and External Linguistic Knowledge"
Chinese Speech To Text
⭐
144
Chinese Speech To Text Using Wavenet
Pre Modern_chinese_corpus_dataset
⭐
132
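Pre-modern Chinese corpus dataset. Natural language processing, corpora, ancient Chinese, Old Chinese, Classical Chinese, digital humanities, computational linguistics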
Glyph
⭐
127
Which Encoding is the Best for Text Classification in Chinese, English, Japanese and Korean?
C3
⭐
124
Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension
Chinese Rc Datasets
⭐
112
Collections of Chinese reading comprehension datasets
Covid Dialogue
⭐
110
Datasets
⭐
78
Poetry-related datasets developed by THUAIPoet (Jiuge) group.
Cihai
⭐
77
Python library for CJK (Chinese, Japanese, and Korean) language dictionary
Scut Ept_dataset_release
⭐
76
The SCUT-EPT Dataset for the research of offline handwritten Chinese text recognition (HCTR) in educational documents has been released.
Auto_cliwc
⭐
74
Code for Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention (AAAI18)
Chisp
⭐
72
scripts and baselines for CSpider: Chinese semantic parsing and text-to-SQL challenge
Traditional Chinese Handwriting Dataset
⭐
71
Open source Traditional Chinese handwriting dataset.
Audiocaption
⭐
64
Dataset and baseline for the first Audiocaption task
Awesome Nlp Chinese Corpus
⭐
59
A curated list of Chinese corpus resources for NLP (Natural Language Processing)
Agis Net
⭐
58
[SIGGRAPH Asia 2019] Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning
Automatic Corpus Generation
⭐
53
This repository is for the paper "A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check"
Synth_chinese_ocr_dataset
⭐
52
Synthesize image dataset for Chinese text recognition
Cmedqa2
⭐
51
This is the updated version of the dataset for Chinese community medical question answering.
Chinesetrafficpolicepose
⭐
50
Detects Chinese traffic police commanding poses
Scut Hccdoc_dataset_release
⭐
50
Clts Dataset
⭐
49
A Chinese Long Text Summarization Dataset
Chineseqa With Bert
⭐
47
Final Project for EECS496-7
Awesome Chinese Llm
⭐
45
Awesome Chinese LLM: a curated list of Chinese large language model datasets and model resources
Cluemotionanalysis2020
⭐
42
CLUE Emotion Analysis Dataset: a fine-grained sentiment analysis dataset
Cnli
⭐
42
Baseline for the CNLI corpus
Odsqa
⭐
41
ODSQA: Open-Domain Spoken Question Answering Dataset
Bert Ner
⭐
39
Using pre-trained BERT models for Chinese and English NER with 🤗Transformers
Pycasia
⭐
38
A python library to work with the CASIA Chinese handwriting database.
Yueye
⭐
38
Code for my blog:
Ckbqa
⭐
37
A Chinese KBQA dataset with SPARQL annotations.
Saner
⭐
37
Cmedqa
⭐
36
This is the dataset for Chinese community medical question answering.
Douban Dushu Dataset
⭐
36
A dataset containing 37 million Douban Dushu comments
Character Level Convolutional Network For Text Classification Applied To Chinese Corpus
⭐
34
Thesis of UCL student Weijie Huang
Chinese_ocr
⭐
33
yolo3 + densenet ocr
Opennre_for_chinese
⭐
33
OpenNRE for Chinese open relation extraction task in pytorch
Synthtext_chinese
⭐
32
Modified from https://github.com/JarveeLee/SynthText_Chinese_ver to work with Python 3 and OpenCV 3.
Clpr.pytorch
⭐
32
End to End Chinese License Plate Recognition
Iching
⭐
30
Iching is a deep meta reinforcement learning quantitative trading platform.
Lrebench
⭐
30
Code for the EMNLP2022 paper "Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study."
Llm Dataset Chinese Poetry
⭐
30
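Goal: compile a high-quality classical Chinese poetry dataset for large models, covering the pre-Qin period through modern times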
Cnn Question Classification Keras
⭐
29
Chinese Question Classifier (Keras Implementation) on BQuLD
Naturallanguageprocessing
⭐
28
Natural Language Processing
Asap
⭐
25
ASAP: A Chinese Review Dataset Towards Aspect Category Sentiment Analysis and Rating Prediction
Chinese Speech Emotion Datasets
⭐
23
Datasets of A Deep Convolutional Neural Network Based Virtual Elderly Companion Agent.
Analects
⭐
23
Public datasets on the Chinese language, accessible from Ruby
Al Ner
⭐
23
LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition
Nlp_pemdc
⭐
23
NLP Pretrained Embeddings, Models and Datasets Collections (NLP_PEMDC). The collection will be kept up to date.
Handwriting Chinese Characters Recognition
⭐
22
Applies the Traditional-Chinese-Handwriting-Dataset to handwriting recognition with a CNN model.
Cos960
⭐
21
COS960: A Chinese Word Similarity Dataset of 960 Word Pairs
Hrcenternet
⭐
19
HRCenterNet: An Anchorless Approach to Chinese Character Segmentation in Historical Documents
Chinese Landscape Painting Dataset
⭐
19
Dataset used for WACV 2021 paper: "End-to-End Chinese Landscape Painting Creation Using Generative Adversarial Networks"
Weiboscope Data
⭐
18
Download, extract and index Weiboscope data
Cmid
⭐
18
Chinese Medical Intent Dataset
Corpus_dataset_for_chinese_nlp
⭐
18
Chinese NLP corpus datasets
Icytranslate_offline
⭐
17
The offline part of icytranslate (an English-Chinese translation platform); the output of this project should be a translation model
Chinese_im2text.pytorch
⭐
17
PyTorch implementation of Chinese image captioning on AI_challenger dataset
Stable Diffusion Chinese Extend
⭐
16
A fine-tuned version of the Stable Diffusion model, trained on a self-translated 10k diffusiondb Chinese corpus to "extend" it
Chinese Psychological Qa Dataset
⭐
16
Chinese psychological question answering dataset
Berserker
⭐
16
Berserker - BERt chineSE woRd toKenizER
Cemrclass
⭐
16
Traditional Chinese Medicine Clinical Records Classification. In BIBM 2016
Chinese Qasystem
⭐
16
Chinese question answering system based on BLSTM and CRF.
Chatbot Pytorch
⭐
14
A seq2seq-based chatbot built with PyTorch (Chinese chatbot)
Pclue
⭐
14
pCLUE: a 1,200,000+ multi-task prompt learning dataset
Chinese Simile Recognition Dataset
⭐
14
A Chinese simile recognition dataset of "Xiang".
Expression_of_emotions_in_20th_century_chinese_books
⭐
14
Sentiment Analysis on Google's Chinese 1gram dataset
Traffic Signals Detect Chinese
⭐
14
Train a model to detect Chinese traffic signs and signals with tensorflow object detection API
A2m_chinesenmt
⭐
13
Dataset for TALLIP2019 paper "Ancient-Modern Chinese Translation with a New Large Training Dataset"
Evalution
⭐
13
Dataset containing Semantic Relations and Metadata, for Training and Evaluating Distributional Semantic Models in English and Mandarin Chinese
1-100 of 116 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.