Awesome Open Source
Search results for corpus text processing
14 search results found
Ekphrasis
⭐ 583
Ekphrasis is a text-processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets).
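Hashtag splitting of this kind is commonly done by choosing the split that maximizes the product of word probabilities estimated from a corpus. The sketch below illustrates that general idea with a dynamic-programming (Viterbi-style) search. It is not Ekphrasis's own code or API: the toy counts in COUNTS and the functions word_logprob and segment are hypothetical stand-ins for real corpus statistics.

```python
import math

# Hypothetical toy unigram counts; a real segmenter would use counts
# gathered from a large corpus (e.g. Wikipedia or tweets).
COUNTS = {"i": 50, "love": 30, "new": 40, "york": 20, "newyork": 1, "lo": 2, "ve": 1}
TOTAL = sum(COUNTS.values())

def word_logprob(word):
    """Log-probability of a word; unseen strings get a length-based penalty."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Penalize unknown strings more heavily the longer they are.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))

def segment(text, max_len=20):
    """Split text into its most probable word sequence via dynamic programming."""
    n = len(text)
    # best[i] = (best log-score for text[:i], start index of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start][0] + word_logprob(text[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk back through the stored split points to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return list(reversed(words))

print(segment("ilovenewyork"))  # ['i', 'love', 'new', 'york']
```

Working in log space turns the product of probabilities into a sum and avoids numeric underflow on long hashtags.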
Pykospacing
⭐ 348
Automatic Korean word spacing with Python
Colibri Core
⭐ 122
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, of either fixed or dynamic size) in a fast and memory-efficient way. At its core is the tool ``colibri-patternmodeller``, which lets you build, view, manipulate, and query pattern models.
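To make the n-gram/skipgram distinction concrete, here is a minimal pure-Python sketch, assuming whitespace-tokenized input and fixed-size gaps only. The functions ngrams and skipgrams and the "*" gap marker are hypothetical illustrations, not part of the Colibri Core API, and a Counter over tuples is far less memory-efficient than Colibri Core's own pattern encoding.

```python
from collections import Counter
from itertools import combinations

def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skipgrams(tokens, n, gap="*"):
    """Return n-token patterns with one or more interior tokens replaced by a gap.

    The first and last tokens are always kept, so each pattern is anchored
    by real words on both sides (fixed-size gaps only).
    """
    patterns = []
    for gram in ngrams(tokens, n):
        interior = range(1, n - 1)
        # Blank out every non-empty subset of the interior positions.
        for k in range(1, n - 1):
            for positions in combinations(interior, k):
                patterns.append(
                    tuple(gap if i in positions else tok for i, tok in enumerate(gram))
                )
    return patterns

tokens = "to be or not to be".split()
print(Counter(ngrams(tokens, 2)).most_common(1))  # [(('to', 'be'), 2)]
print(skipgrams(tokens, 3))
```

Every skipgram here keeps real tokens at both ends and replaces one or more interior positions with the gap marker; dynamic-size gaps, which Colibri Core also supports, are not covered by this sketch.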
Awesome Legal Data
⭐ 45
Collection of Datasets for Legal Text Processing
Tif
⭐ 35
Text Interchange Formats
Gomtch
⭐ 26
Find text even if it doesn't want to be found
Mytwitterbot
⭐ 20
A Twitter bot powered by a Recurrent Neural Network (RNN)
Textstelle
⭐ 14
Textstelle is a collection of corpora for the creation of bots and other things that generate text 🤖
Russian_subtitles_dataset
⭐ 9
Preprocessing of a dataset of 347 TV-series subtitles (thanks to the Taiga Corpus) for building a word2vec model or a JamSpell model, training neural networks or chatbots, or any other NLP task.
Opiec Pipeline
⭐ 7
Text Proc
⭐ 5
Scripts for Text Processing
Iestac
⭐ 5
A corpus that can be used to train English-to-Italian End-to-End Speech-to-Text Machine Translation models
Information Retrieval
⭐ 5
Textual Information Retrieval (IR) and Information Extraction (IE) engine
Yeezy Taught Me
⭐ 5
Yeezy Taught Me text generation: trains an RNN (LSTM) model for next-character prediction on a user-supplied text corpus.
Related Searches
Python Corpus (2,447)
Natural Language Processing Corpus (510)
Dataset Corpus (342)
Java Corpus (308)
Language Corpus (261)
Copyright 2018-2024 Awesome Open Source. All rights reserved.