Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Entity Recognition Datasets | 1,386 | | | | 7 months ago | | | 7 | mit | Python |
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types. | ||||||||||
Propbank Release | 112 | | | | 2 years ago | | | 11 | cc-by-sa-4.0 | |
The official released annotations, both in .prop pointer format and as CoNLL files. Does not contain the source texts. | ||||||||||
Tutorialbank | 85 | | | | a year ago | | | | | HTML |
Ud_russian Syntagrus | 77 | | | | 6 months ago | | | 16 | other | Perl |
Russian data from the SynTagRus corpus. | ||||||||||
Gum | 76 | | | | 5 months ago | | | 6 | other | Python |
Repository for the Georgetown University Multilayer Corpus (GUM) | ||||||||||
Kwdlc | 71 | | | | 4 months ago | | | 12 | | Python |
Kyoto University Web Document Leads Corpus | ||||||||||
Annis | 67 | | 4 | 4 | 3 months ago | 45 | February 03, 2023 | 44 | apache-2.0 | Java |
ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. | ||||||||||
Quasar | 64 | | | | 6 years ago | | | 1 | bsd-2-clause | Python |
Datasets for Question Answering by Search and Reading | ||||||||||
Nested_named_entities | 60 | | | | 8 months ago | | | | | Python |
Folia | 60 | | 2 | 2 | 9 months ago | 93 | October 08, 2021 | 21 | gpl-3.0 | Python |
FoLiA: Format for Linguistic Annotation. FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions. | ||||||||||