Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Entity Recognition Datasets | 1,386 | | | | a year ago | | | 7 | mit | Python |
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types. | ||||||||||
Ud_russian Syntagrus | 77 | | | | a year ago | | | 16 | other | Perl |
Russian data from the SynTagRus corpus. | ||||||||||
Gum | 76 | | | | a year ago | | | 6 | other | Python |
Repository for the Georgetown University Multilayer Corpus (GUM) | ||||||||||
Kwdlc | 71 | | | | 10 months ago | | | 12 | | Python |
Kyoto University Web Document Leads Corpus | ||||||||||
Annis | 67 | | 4 | 4 | 9 months ago | 45 | February 03, 2023 | 44 | apache-2.0 | Java |
ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. | ||||||||||
Quasar | 64 | | | | 7 years ago | | | 1 | bsd-2-clause | Python |
Datasets for Question Answering by Search and Reading | ||||||||||
Folia | 60 | | 2 | 2 | a year ago | 93 | October 08, 2021 | 21 | gpl-3.0 | Python |
FoLiA: Format for Linguistic Annotation. FoLiA is a rich XML-based annotation format for representing language resources (including corpora) with linguistic annotations. It supports a wide variety of linguistic annotations, which makes it a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions. | ||||||||||
Nested_named_entities | 60 | | | | a year ago | | | | | Python |
Craft | 58 | | | | 2 years ago | | | 1 | other | Clojure |
Deft_corpus | 57 | | | | 5 years ago | | | 6 | other | Python |
The Definition Extraction From Text corpus and relevant formatting scripts | ||||||||||