Awesome Open Source
Search results for segmentation tokenization
10 search results found
Ekphrasis
⭐
583
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and a Twitter corpus of 330 million English tweets).
Vibrato
⭐
275
🎤 vibrato: Viterbi-based accelerated tokenizer
Sudachi.rs
⭐
253
Sudachi in Rust 🦀 and the new generation of SudachiPy
Razdel
⭐
226
Rule-based token and sentence segmentation for the Russian language
Vaporetto
⭐
206
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
Tkseem
⭐
49
Arabic Tokenization Library. It provides many tokenization algorithms.
Wongnai Corpus
⭐
47
Collection of Wongnai's datasets
Uax29
⭐
35
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Splits text into words, sentences, and graphemes.
Python Vaporetto
⭐
17
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.
Tokenization Scorer
⭐
10
Simple-to-use scoring function for arbitrarily tokenized texts.
Copyright 2018-2024 Awesome Open Source. All rights reserved.