Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Sejong Corpus | 103 | | | | 5 years ago | | | 1 | other | Shell |
Korean Sejong corpus download and simple analysis | ||||||||||
Gum | 76 | | | | 6 months ago | | | 6 | other | Python |
Repository for the Georgetown University Multilayer Corpus (GUM) | ||||||||||
Folia | 60 | | 2 | 2 | 9 months ago | 93 | October 08, 2021 | 21 | gpl-3.0 | Python |
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions. | ||||||||||
Allofplos | 53 | | | | 8 months ago | 21 | December 06, 2022 | 35 | mit | Python |
Repository for the allofplos project. | ||||||||||
Opencorpora Tools | 42 | | 6 | | 4 years ago | 9 | October 11, 2020 | 2 | mit | Python |
Python interface to http://opencorpora.org/ | ||||||||||
Corpora | 26 | | | | 7 months ago | | | 10 | | CSS |
Public repository for Coptic SCRIPTORIUM Corpora Releases | ||||||||||
Ticcl | 19 | | | | 10 months ago | | | 2 | gpl-3.0 | Python |
Text-Induced Corpus Clean-up | ||||||||||
Wikicorpusextractor | 19 | | | | 10 years ago | | | | | Python |
Extracts text from WikiMedia XML Dump files | ||||||||||
Textbox | 18 | | | | 2 years ago | | | 4 | | |
Text collections made available by the CLiGS group. | ||||||||||
Tota | 16 | | | | 7 years ago | | | 1 | other | |
Texts of Trade Agreements Corpus |