Project Name | Description | Stars | Downloads / Repos Using This / Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|
Text | Making text a first-class citizen in TensorFlow. | 1,172 | 2; 134 | 3 months ago | 67 | November 15, 2023 | 160 | Apache-2.0 | C++ |
Jflex | The fast scanner generator for Java™ with full Unicode support | 523 | 194; 64 | a year ago | 14 | March 11, 2023 | 25 | other | Java |
Tokenizer | Fast and customizable text tokenization library with BPE and SentencePiece support | 224 | 15; 5 | 9 months ago | 68 | January 11, 2023 | 2 | MIT | C++ |
Uax29 | A tokenizer based on Unicode text segmentation (UAX #29), for Go. Splits words, sentences and graphemes. | 35 | 6 | 6 months ago | 40 | May 26, 2023 | 1 | MIT | Go |
Nlpt | Natural Language Processing Toolkit written in Go (DEPRECATED; see individual packages prefixed nlpt-) | 35 | | 8 years ago | 1 | March 05, 2016 | | other | Go |
Sqlite3 Unicodesn | SQLite Unicode full-text search tokenizer with Snowball stemming | 29 | | 5 years ago | | | 5 | | C |
Unicode Tokenizer | Unicode tokenizer following the Unicode Line Breaking Algorithm | 20 | 1; 4 | 11 years ago | 5 | September 15, 2012 | | | JavaScript |
Pyidaungsu | Python library for the Myanmar language | 19 | | a year ago | | | | MIT | Python |
Greeb | Greeb is a simple Unicode-aware regexp-based tokenizer. | 16 | 2 | 5 years ago | 26 | January 14, 2015 | | MIT | Ruby |
Tokenizer | Natural Language Tokenizer | 11 | | 5 years ago | 1 | November 28, 2018 | | Apache-2.0 | Go |