Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Sentencepiece | 8,851 | 120 | 787 | 3 months ago | 34 | May 02, 2023 | 32 | apache-2.0 | C++ | |
Unsupervised text tokenizer for Neural Network-based text generation. | ||||||||||
Tokenizers | 8,056 | 362 | 3 months ago | 85 | November 14, 2023 | 233 | apache-2.0 | Rust | ||
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production | ||||||||||
Gpt2 Chinese | 7,249 | | | | 4 months ago | | | 105 | mit | Python |
Chinese version of GPT2 training code, using BERT tokenizer. | ||||||||||
Hazm | 1,102 | 17 | 13 | 8 days ago | 20 | October 01, 2023 | 12 | mit | Python | |
Persian NLP Toolkit | ||||||||||
Natasha | 1,085 | 3 | 9 | 7 months ago | 19 | July 24, 2023 | 24 | mit | Python | |
Solves basic Russian NLP tasks, API for lower level Natasha projects | ||||||||||
Kobert | 1,035 | | | | a year ago | | | 5 | apache-2.0 | Jupyter Notebook |
Korean BERT pre-trained cased (KoBERT) | ||||||||||
Nlp With Ruby | 1,002 | | | | 10 months ago | | | 5 | cc0-1.0 | Ruby |
Curated List: Practical Natural Language Processing done in Ruby | ||||||||||
Soynlp | 801 | 4 | 9 | 2 years ago | 33 | August 25, 2019 | 54 | other | Python | |
A Python library for Korean natural language processing. It provides word extraction, tokenization, part-of-speech tagging, and preprocessing. | ||||||||||
Ekphrasis | 583 | 7 | 2 years ago | 54 | May 17, 2022 | 18 | mit | Python | ||
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets from Twitter). | ||||||||||
Open Korean Text | 552 | 6 | 6 | a year ago | 14 | August 07, 2018 | 13 | apache-2.0 | Scala | |
Open Korean Text Processor - An Open-source Korean Text Processor |