Openconvert

Text conversion tool (from e.g. Word, HTML, or txt) to the corpus formats TEI or FoLiA
Alternatives To Openconvert
Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language
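Readtext: 112 stars; repos/packages using this: 54; most recent commit 3 months ago; 10 total releases; latest release June 03, 2023; 30 open issues; language R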
an R package for reading text files
Gum: 76 stars; most recent commit 6 months ago; 6 open issues; license other; language Python
Repository for the Georgetown University Multilayer Corpus (GUM)
Eventstoryline: 70 stars; most recent commit 7 months ago; 3 open issues; license other; language DM
Event StoryLine Corpus - annotated data, baselines and evaluation scripts, evaluation data.
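Folia: 60 stars; repos/packages using this: 22; most recent commit 9 months ago; 93 total releases; latest release October 08, 2021; 21 open issues; license gpl-3.0; language Python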
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
Craft: 58 stars; most recent commit 2 years ago; 1 open issue; license other; language Clojure
Deft_corpus: 57 stars; most recent commit 4 years ago; 6 open issues; license other; language Python
The Definition Extraction From Text corpus and relevant formatting scripts
Ronec: 54 stars; most recent commit a year ago; license mit; language Python
Romanian Named Entity Corpus (RONEC) version 2.0
Broad_twitter_corpus: 52 stars; most recent commit 2 years ago; 9 open issues; license other; language Jupyter Notebook
The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors
Morphorueval 2017: 41 stars; most recent commit 6 years ago; 13 open issues; license other; language Python
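Discoursegraphs: 34 stars; repos/packages using this: 11; most recent commit 3 years ago; 18 total releases; latest release March 14, 2021; 46 open issues; license bsd-3-clause; language Python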
linguistic converter / merging tool for multi-level annotated corpora. graph-based (using Python and NetworkX).
Categories: Java, Format, Archive, Corpus, Txt