Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Hazm | 1,112 | 17 | 13 | 6 days ago | 20 | October 01, 2023 | 12 | mit | Python | |
Persian NLP Toolkit | ||||||||||
Ekphrasis | 583 | 7 | 2 years ago | 54 | May 17, 2022 | 18 | mit | Python | ||
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and a Twitter corpus of 330 million English tweets). | ||||||||||
Open Korean Text | 552 | 6 | 6 | a year ago | 14 | August 07, 2018 | 13 | apache-2.0 | Scala | |
Open Korean Text Processor - An Open-source Korean Text Processor | ||||||||||
Konoha | 200 | 1 | 3 months ago | 10 | August 03, 2022 | mit | Python | |||
🌿 An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with small code changes. | ||||||||||
Prenlp | 105 | 4 years ago | 15 | August 15, 2020 | apache-2.0 | Python | ||||
Preprocessing Library for Natural Language Processing | ||||||||||
Textcluster | 60 | 4 years ago | bsd-3-clause | Python | ||||||
Preprocessing module for short-text clustering | ||||||||||
Tif | 35 | 5 months ago | 5 | R | ||||||
Text Interchange Formats | ||||||||||
Text Classification Lstms Pytorch | 31 | 2 years ago | 2 | Python | ||||||
The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To make the model easier to understand, it uses a Tweets dataset provided by Kaggle. | ||||||||||
Python Ucto | 29 | 2 | 1 | 6 months ago | 22 | October 31, 2023 | 5 | Cython | ||
This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears to be. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto). | ||||||||||
Python Mecab | 27 | 3 years ago | 6 | December 31, 2019 | 10 | bsd-3-clause | C++ | |||
A repository that binds MeCab for Python 3.5+, using neither SWIG nor pybind. (No longer maintained.) |