Tokenizer

Fast and customizable text tokenization library with BPE and SentencePiece support
Alternatives To Tokenizer
Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language
Sense2vec | 1,486 | 67 | a year ago | 24 | April 19, 2021 | 20 | mit | Python
🦆 Contextually-keyed word vectors
Stringi | 292 | 4 months ago | 42 | other | C++
Fast and portable character string processing in R (with the Unicode ICU)
Tokenizer | 224 | 155 | 9 months ago | 68 | January 11, 2023 | 2 | mit | C++
Fast and customizable text tokenization library with BPE and SentencePiece support
Rouge 2.0 | 145 | 4 years ago | 1 | apache-2.0 | Java
ROUGE automatic summarization evaluation toolkit. Supports ROUGE-[N, L, S, SU], stemming and stopwords in different languages, Unicode text evaluation, and CSV output.
Guide To Swift Strings Sample Code | 124 | 5 years ago | 1 | Swift
Xcode Playground Sample Code for the Flight School Guide to Swift Strings
Unihandecode | 71 | 2 years ago | 17 | July 23, 2020 | 1 | gpl-3.0 | Python
unihandecode is a transliteration library that converts Unicode characters and words into the ASCII alphabet, aware of language preference priorities
Bengali Alphabet | 51 | 5 months ago | 1 | mit | JavaScript
✍️ Bengali alphabet (বাংলা বর্ণমালা)
Uax29 | 35 | 6 months ago | 40 | May 26, 2023 | 1 | mit | Go
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split words, sentences and graphemes.
Stringx | 25 | 5 months ago | 9 | other | HTML
Drop-in replacements for base R string functions powered by stringi
Urdu Characters | 18 | 3 years ago | mit | Python
📄 Complete collection of Urdu language characters & Unicode code points.