Alternatives To Word_segmentation

Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language	Description
Sentencepiece	8,851	120	787	3 months ago	34	May 02, 2023	32	Apache-2.0	C++	Unsupervised text tokenizer for neural network-based text generation.
Pkuseg Python	6,001	4	8	a year ago	22	June 19, 2020	119	MIT	Python	The pkuseg toolkit for multi-domain Chinese word segmentation.
Subword Nmt	1,937	18	18	2 years ago	8	December 08, 2021	2	MIT	Python	Unsupervised word segmentation for neural machine translation and text generation.
Pythainlp	902	24	51	3 months ago	101	November 26, 2023	35	Apache-2.0	Python	Thai natural language processing in Python.
Jieba Rs	585	5	15	9 months ago	40	July 16, 2023	9	MIT	Rust	The Jieba Chinese word segmentation library, implemented in Rust.
Ekphrasis	583	7		2 years ago	54	May 17, 2022	18	MIT	Python	A text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets).
Vncorenlp	472			a year ago				Other	Java	A Vietnamese natural language processing toolkit (NAACL 2018).
Nagisa	365	1	7	3 months ago	22	July 30, 2023	4	MIT	Python	A Japanese tokenizer based on recurrent neural networks.
Pycantonese	290			a year ago	24	December 28, 2021	5	MIT	Python	Cantonese linguistics and NLP.
Python Wordsegment	268			4 years ago			8	Other	Python	English word segmentation, written in pure Python and based on a trillion-word corpus.

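Several projects in the table segment unspaced text using word statistics from a large corpus (Python Wordsegment, for example, and Ekphrasis for splitting hashtags). The sketch below illustrates that general idea only and is not either project's implementation or API. It assumes a hypothetical toy unigram count table, scores each candidate split by its summed log probability, and picks the best split with dynamic programming.

```python
import math

# Hypothetical toy unigram counts; a real segmenter would load counts
# gathered from a very large corpus.
COUNTS = {"i": 50, "love": 20, "new": 30, "york": 10, "newyork": 1, "lo": 2, "ve": 1}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = 10  # longest candidate word considered at each position


def word_logprob(word: str) -> float:
    """Log probability of a word; unseen strings get a length-based penalty."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Unseen strings are penalized more heavily the longer they are.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text: str) -> list[str]:
    """Split text into the word sequence with the highest total log probability."""
    n = len(text)
    # best[i] holds (score, start index of the last word) for the prefix text[:i].
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score = best[start][0] + word_logprob(text[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk back through the stored split points to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("ilovenewyork"))  # ['i', 'love', 'new', 'york']
```

With these counts, "new" + "york" outscores the rarer single entry "newyork", which is how corpus frequencies decide between competing splits.
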
Popular Segmentation Projects

Jieba ⭐ 31,881

Jieba Chinese word segmentation

dependent packages: 419; total releases: 32; latest release: January 20, 2020; most recent commit: 3 months ago

Deep Learning For Image Processing ⭐ 18,759

Deep learning for image processing, including classification, object detection, and more.

most recent commit: 5 months ago

Imgaug ⭐ 13,682

Image augmentation for machine learning experiments.

dependent packages: 141; total releases: 11; latest release: February 05, 2020; most recent commit: 9 months ago

Albumentations ⭐ 13,417

Fast image augmentation library and an easy-to-use wrapper around other libraries.
Documentation: https://albumentations.ai/docs/
Paper about the library: https://www.mdpi.com/2078-2489/11/2/125

dependent packages: 273; total releases: 53; latest release: June 10, 2023; most recent commit: a day ago

Paddledetection ⭐ 11,653

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.

dependent packages: 1; total releases: 9; latest release: September 19, 2022; most recent commit: 4 months ago

Popular Word Segmentation Projects

Lac ⭐ 3,644

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, and word importance

dependent packages: 12; total releases: 15; latest release: May 25, 2021; most recent commit: 3 years ago

Symspell ⭐ 3,022

SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm

dependent packages: 3; total releases: 8; latest release: February 11, 2022; most recent commit: a month ago

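The SymSpell entry above names the Symmetric Delete algorithm. Below is a minimal from-scratch sketch of its core idea, not SymSpell's or symspellpy's API: precompute every string reachable from each dictionary word by deleting up to `max_distance` characters, generate the same deletes for the query at lookup time, and verify the shared candidates with a true edit distance. The tiny dictionary and all function names are hypothetical, plain Levenshtein distance stands in for the library's distance metric, and the word-frequency ranking and other optimizations of the real library are omitted.

```python
from itertools import combinations


def deletes(word: str, max_distance: int) -> set[str]:
    """All strings obtainable by deleting up to max_distance characters from word."""
    results = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for idx in combinations(range(len(word)), k):
            results.add("".join(ch for i, ch in enumerate(word) if i not in idx))
    return results


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(words: list[str], max_distance: int) -> dict[str, set[str]]:
    """Map every delete variant to the dictionary words that produce it."""
    index: dict[str, set[str]] = {}
    for word in words:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def lookup(term: str, index: dict[str, set[str]], max_distance: int) -> list[tuple[int, str]]:
    """Return (distance, word) pairs within max_distance, closest first."""
    candidates: set[str] = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    scored = [(levenshtein(term, word), word) for word in candidates]
    return sorted(pair for pair in scored if pair[0] <= max_distance)


index = build_index(["hello", "help", "world", "word"], max_distance=2)
print(lookup("helo", index, max_distance=2))  # [(1, 'hello'), (1, 'help')]
```

Because every insertion, deletion, or substitution costs at most one delete on each side, matching delete variants finds all dictionary words within the distance bound without enumerating insertions or substitutions at query time.
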
Youtokentome ⭐ 943

Unsupervised text tokenizer focused on computational efficiency

dependent packages: 17; total releases: 14; latest release: February 12, 2020; most recent commit: a month ago

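Subword Nmt in the table above learns byte-pair encoding (BPE) merges, and YouTokenToMe also implements BPE. The sketch below shows BPE merge learning from scratch under stated assumptions: a hypothetical toy word-frequency dictionary and an end-of-word marker written as `</w>`. It does not use either project's API and omits their performance optimizations.

```python
from collections import Counter


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to num_merges BPE merge operations from word frequencies."""
    # Represent each word as a tuple of symbols, ending with an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # on ties, the first pair counted wins
        merges.append(best)
        # Replace every occurrence of the best pair with a single merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges


corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
print(learn_bpe(corpus, 5))
# [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
```

To segment new text, the learned merges are replayed in the same order on each word's character sequence, so frequent fragments such as "est</w>" become single subword units.
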
Fasthan ⭐ 730

fastHan is a Chinese natural language processing tool built on fastNLP and PyTorch, and is called much like spaCy.

total releases: 9; latest release: November 11, 2023; most recent commit: 5 months ago

Symspellpy ⭐ 693

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm

dependent packages: 25; total releases: 22; latest release: October 24, 2022; most recent commit: 10 months ago
