Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Nltk | 12,699 | | 10,496 | 2,261 | 4 months ago | 59 | July 20, 2023 | 268 | apache-2.0 | Python |
NLTK Source | ||||||||||
Vespa | 5,115 | | 5 | 58 | 4 months ago | 741 | November 30, 2023 | 175 | apache-2.0 | Java |
AI + Data, online. https://vespa.ai | ||||||||||
Insuranceqa Corpus Zh | 996 | | | | 6 months ago | 11 | November 15, 2023 | 9 | other | Python |
:helicopter: Insurance industry corpus, chatbot | ||||||||||
Magpie | 574 | | | | 4 years ago | | | 47 | mit | Python |
Deep neural network framework for multi-label text classification | ||||||||||
Comet | 346 | 4 | 4 months ago | 32 | October 23, 2023 | 19 | apache-2.0 | Python | ||
A Neural Framework for MT Evaluation | ||||||||||
Nlvr | 249 | | | | 2 years ago | | | | | HTML |
Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence. | ||||||||||
Fakenewscorpus | 184 | | | | 4 years ago | | | 2 | apache-2.0 | |
A dataset of millions of news articles scraped from a curated list of data sources. | ||||||||||
Pubmed Rct | 166 | | | | 6 months ago | | | | | |
PubMed 200k RCT dataset: a large dataset for sequential sentence classification. | ||||||||||
Wp2txt | 160 | 1 | a year ago | 29 | May 13, 2023 | 1 | mit | Ruby | ||
A command-line toolkit to extract text content and category data from Wikipedia dump files | ||||||||||
Pre Modern_chinese_corpus_dataset | 132 | | | | 9 months ago | | | | | HTML |
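Pre-modern Chinese corpus dataset: natural language processing, corpus, ancient Chinese, Old Chinese, Classical Chinese, digital humanities, computational linguistics |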