Awesome Open Source
Search results for corpus corpora
34 search results found
Entity Recognition Datasets
⭐ 1,386
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Gensim Data
⭐ 492
Data repository for pretrained NLP models and NLP corpora.
Indicnlp_catalog
⭐ 487
A collaborative catalog of NLP resources for Indic languages
Fuzzdata
⭐ 486
Fuzzing resources for feeding various fuzzers with input. 🔧
Corus
⭐ 254
Links to Russian corpora, plus Python functions for loading and parsing them
Open Korean Corpora
⭐ 117
Open Korean NLP dataset curation for users all around the globe
Self_dialogue_corpus
⭐ 86
The Self-dialogue Corpus - a collection of self-dialogues across music, movies and sports
Corporaexplorer
⭐ 62
An R package for dynamic exploration of text collections
Kontext
⭐ 58
An advanced, extensible web front-end for the Manatee-open corpus search engine
Huner
⭐ 45
Named Entity Recognition for biomedical entities
Arabic News Article Classification
⭐ 43
Automatic document categorization, i.e. assigning a category to a text based on the information it contains, using several supervised machine learning approaches.
Parallel Corpora Tools
⭐ 39
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
Corpusloaders.jl
⭐ 31
Loaders for a variety of NLP corpora.
Awesome Cantonese Nlp
⭐ 29
A curated list of resources dedicated to Natural Language Processing (NLP) of Cantonese | 粵語 NLP
Crossner
⭐ 29
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)
Corpora
⭐ 16
Repository for Tibetan corpora
Textstelle
⭐ 14
Textstelle is a collection of corpora for the creation of bots and other things that generate text 🤖
Opus Api
⭐ 14
OPUS (opus.nlpl.eu) Python3 API
Lyrics Corpora
⭐ 13
An unofficial Python API that lets users build a corpus of lyrics from their favorite artists and Billboard charts
Demeuk
⭐ 13
Demeuk is a simple tool to clean up corpora (like dictionaries) or any dataset containing plain text strings.
Opiec
⭐ 12
Reads the data from OPIEC, an Open Information Extraction corpus
Biomedical_corpora
⭐ 12
A table listing biomedical corpora available for named entity recognition (some are also suitable for association detection). It was published as part of the paper: Dieter Galea, Ivan Laponogov, Kirill Veselkov, "Exploiting and assessing multi-source data for supervised biomedical named entity recognition", Bioinformatics, bty152, https://doi.org/10.1093/bioinformatics/bty152. If you would like to add other (or your own) corpora, please submit a pull request and I'll hap
Text Mining
⭐ 12
Generic corpus-cleaning script built with the tm package
Ccrawl
⭐ 11
Simple CORPORA list crawler
Wiki Dump Reader
⭐ 10
Extract corpora from Wikipedia dumps
Potts
⭐ 9
The Potsdam Twitter Sentiment Corpus
Corpus_similarity
⭐ 9
Measure the similarity of text corpora for 74 languages
Habeas Corpus
⭐ 8
Command-line corpus tools
Hawaiian Corpus
⭐ 7
Data from a corpus of written Hawaiian
Awesome Swedish Nlp
⭐ 6
A curated list of resources for natural language processing (NLP) in Swedish
Common_crawl_corpus
⭐ 6
Scripts for building a geo-located web corpus using Common Crawl data
Brat Peek
⭐ 5
Framework for working with brat-annotated .ann files
Lingcorpora.py
⭐ 5
API for corpora
Frequency Count Benchmark
⭐ 5
Benchmarking various tools for counting word and phrase frequencies in corpora (for Windows)
Copyright 2018-2024 Awesome Open Source. All rights reserved.