Vaporetto

🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
Alternatives To Vaporetto
Sentencepiece (C++, apache-2.0): 8,851 stars; most recent commit 3 months ago; 34 releases, latest May 02, 2023; 32 open issues
Unsupervised text tokenizer for Neural Network-based text generation.

Tokenizers (Rust, apache-2.0): 8,056 stars; most recent commit 3 months ago; 85 releases, latest November 14, 2023; 233 open issues
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Gpt2 Chinese (Python, mit): 7,249 stars; most recent commit 3 months ago; 105 open issues
Chinese version of GPT2 training code, using the BERT tokenizer.

Hazm (Python, mit): 1,096 stars; 20 releases, latest October 01, 2023; 12 open issues
Persian NLP Toolkit

Natasha (Python, mit): 1,085 stars; most recent commit 7 months ago; 19 releases, latest July 24, 2023; 24 open issues
Solves basic Russian NLP tasks; API for lower-level Natasha projects

Kobert (Jupyter Notebook, apache-2.0): 1,035 stars; most recent commit a year ago; 5 open issues
Korean BERT pre-trained cased (KoBERT)

Nlp With Ruby (Ruby, cc0-1.0): 1,002 stars; most recent commit 10 months ago; 5 open issues
Curated List: Practical Natural Language Processing done in Ruby

Soynlp (Python, other license): 801 stars; most recent commit 2 years ago; 33 releases, latest August 25, 2019; 54 open issues
A Python library for Korean natural language processing. It provides word extraction, tokenization, part-of-speech tagging, and preprocessing.

Ekphrasis (Python, mit): 583 stars; most recent commit 2 years ago; 54 releases, latest May 17, 2022; 18 open issues
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets).

Open Korean Text (Scala, apache-2.0): 552 stars; most recent commit a year ago; 14 releases, latest August 07, 2018; 13 open issues
Open Korean Text Processor - An Open-source Korean Text Processor

Topics: Rust, Natural Language Processing, Segmentation, Japanese, Tokenizer, Morphological Analysis