Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Sentencepiece | 8,851 | 120 | 787 | 2 months ago | 34 | May 02, 2023 | 32 | apache-2.0 | C++ | |
Unsupervised text tokenizer for neural network-based text generation. | ||||||||||
Youtokentome | 940 | 6 | 17 | 3 months ago | 14 | February 12, 2020 | 39 | mit | C++ | |
Unsupervised text tokenizer focused on computational efficiency | ||||||||||
Pythainlp | 902 | 24 | 51 | 2 months ago | 101 | November 26, 2023 | 35 | apache-2.0 | Python | |
Thai Natural Language Processing in Python. | ||||||||||
Jieba Rs | 585 | 5 | 15 | 8 months ago | 40 | July 16, 2023 | 9 | mit | Rust | |
The Jieba Chinese Word Segmentation Implemented in Rust | ||||||||||
Ekphrasis | 583 | 7 | a year ago | 54 | May 17, 2022 | 18 | mit | Python | ||
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and a Twitter corpus of 330 million English tweets). | ||||||||||
M3tl | 544 | a year ago | 25 | apache-2.0 | Jupyter Notebook | |||||
BERT for Multitask Learning | ||||||||||
Vncorenlp | 472 | a year ago | other | Java | ||||||
A Vietnamese natural language processing toolkit (NAACL 2018) | ||||||||||
Ckip Transformers | 439 | a year ago | 1 | gpl-3.0 | Python | |||||
CKIP Transformers | ||||||||||
Nagisa | 365 | 1 | 7 | 2 months ago | 22 | July 30, 2023 | 4 | mit | Python | |
A Japanese tokenizer based on recurrent neural networks | ||||||||||
Jumanpp | 334 | a year ago | 30 | apache-2.0 | C++ | |||||
Juman++ (a Morphological Analyzer Toolkit) |