Awesome Open Source
Search
Programming Languages
Languages
All Categories
Categories
About
Search results for unicode tokenizer
tokenizer
x
unicode
x
19 search results found
Text
⭐
1,104
Making text a first-class citizen in TensorFlow.
Jflex
⭐
523
The fast scanner generator for Java™ with full Unicode support
Tokenizer
⭐
207
Fast and customizable text tokenization library with BPE and SentencePiece support
Nlpt
⭐
35
Natural Language Processing Toolkit written in Go (DEPRECATED see individual packages prefixed nlpt-)
Uax29
⭐
34
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split words, sentences and graphemes.
Sqlite3 Unicodesn
⭐
29
SQLite unicode full-text-search tokenizer with Snowball stemming
Unicode Tokenizer
⭐
20
Unicode Tokenizer following the Unicode Line Breaking algorithm
Pyidaungsu
⭐
19
Python library for Myanmar language
Greeb
⭐
16
Greeb is a simple Unicode-aware regexp-based tokenizer.
Tokenizer
⭐
11
Natural Language Tokenizer
Mascara
⭐
6
A natural language tokenizer
Cjk Tokenizer
⭐
5
Cjk Mmseg
⭐
3
An implementation of Chinese / CJK MMSEG tokenizer algorithm with Sogou Dictionary. MMSEG 中文分词工具实现。
Overview Js Tokenizer
⭐
3
Splits text into tokens in as many languages as possible
Nlp Project Hindi Language Tokenizer
⭐
2
This package tends to implement a Tokenizer and a stemmer for Hindi language.
Cjk Tokenzier
⭐
2
A unigram CJK tokenizer
Nlpt Tkz
⭐
1
natural language tokenization using various types of tokenizer, including a state function lexer, unicode matcher, and basic white space splitter
Elasticsearch Analysis Customwordboundarystandard
⭐
1
An extension to Lucene's Standard Tokenizer with custom word boundaries
Unipy
⭐
1
Unicode operators preprocessing for Python.
Related Searches
Character Unicode (849)
Python Unicode (836)
Javascript Unicode (740)
Python Tokenizer (341)
C Plus Plus Unicode (337)
1-19 of 19 search results
Privacy
|
About
|
Terms
|
Follow Us On Twitter
Copyright 2018-2023 Awesome Open Source. All rights reserved.