Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Fastnlp | 2,850 | 1 | 2 | 25 days ago | 10 | February 04, 2019 | 59 | apache-2.0 | Python | |
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation. | ||||||||||
Text_classification | 1,621 | 5 months ago | 1 | mit | Python | |||||
Text Classification Algorithms: A Survey | ||||||||||
Lingua Go | 862 | 2 | 17 days ago | 8 | December 28, 2021 | 5 | apache-2.0 | Go | ||
The most accurate natural language detection library for Go, suitable for long and short text alike | ||||||||||
Ekphrasis | 583 | 7 | 6 months ago | 54 | May 17, 2022 | 18 | mit | Python | ||
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (english Wikipedia, twitter - 330mil english tweets). | ||||||||||
Whatlanggo | 580 | 4 | 21 | 5 days ago | 2 | March 06, 2019 | 12 | mit | Go | |
Natural language detection library for Go | ||||||||||
Open Korean Text | 552 | 6 | 6 | 12 days ago | 14 | August 07, 2018 | 13 | apache-2.0 | Scala | |
Open Korean Text Processor - An Open-source Korean Text Processor | ||||||||||
Pynlpl | 406 | 16 | 3 | 4 years ago | 102 | March 13, 2019 | 2 | gpl-3.0 | Python | |
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation). | ||||||||||
Pykospacing | 305 | 2 months ago | 2 | gpl-3.0 | Python | |||||
Automatic Korean word spacing with Python | ||||||||||
Textpipe | 290 | 1 | 2 years ago | 39 | January 25, 2021 | 24 | mit | Python | ||
Textpipe: clean and extract metadata from text | ||||||||||
Stringi | 263 | 2 months ago | 46 | other | C++ | |||||
Fast and portable character string processing in R (with the Unicode ICU) |
Natural language detection for Go.
Installation:
go get -u github.com/abadojack/whatlanggo
Simple usage example:
package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	info := whatlanggo.Detect("Foje funkcias kaj foje ne funkcias")
	fmt.Println("Language:", info.Lang.String(), " Script:", whatlanggo.Scripts[info.Script], " Confidence: ", info.Confidence)
}
Blacklisting and whitelisting:
package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	// Blacklist: languages listed here are excluded from detection.
	options := whatlanggo.Options{
		Blacklist: map[whatlanggo.Lang]bool{
			whatlanggo.Ydd: true,
		},
	}
	info := whatlanggo.DetectWithOptions("האקדמיה ללשון העברית", options)
	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script])

	// Whitelist: only the languages listed here are considered.
	options1 := whatlanggo.Options{
		Whitelist: map[whatlanggo.Lang]bool{
			whatlanggo.Epo: true,
			whatlanggo.Ukr: true,
		},
	}
	info = whatlanggo.DetectWithOptions("Mi ne scias", options1)
	fmt.Println("Language:", info.Lang.String(), " Script:", whatlanggo.Scripts[info.Script])
}
For more details, please check the documentation.
Requirements: Go 1.8 or higher.
The algorithm is based on trigram language models, which are a particular case of n-grams. To understand the idea, please check the original paper by Cavnar and Trenkle (1994), "N-Gram-Based Text Categorization".
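To make the trigram idea concrete, here is a minimal sketch of rank-based profile comparison in the spirit of the Cavnar and Trenkle paper. It is not whatlanggo's implementation: the helper names (`trigramCounts`, `rankedProfile`, `outOfPlace`), the toy training sentences, and the penalty of 300 for unseen trigrams are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// trigramCounts returns how often each three-rune window occurs in text.
// The text is lower-cased and padded with spaces so word edges count too.
func trigramCounts(text string) map[string]int {
	runes := []rune(" " + strings.ToLower(text) + " ")
	counts := make(map[string]int)
	for i := 0; i+3 <= len(runes); i++ {
		counts[string(runes[i:i+3])]++
	}
	return counts
}

// rankedProfile orders trigrams from most to least frequent
// (ties broken alphabetically) and maps each trigram to its rank.
func rankedProfile(counts map[string]int) map[string]int {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if counts[keys[i]] != counts[keys[j]] {
			return counts[keys[i]] > counts[keys[j]]
		}
		return keys[i] < keys[j]
	})
	ranks := make(map[string]int, len(keys))
	for rank, k := range keys {
		ranks[k] = rank
	}
	return ranks
}

// outOfPlace sums the rank differences between a text profile and a
// language profile. Trigrams missing from the language profile cost
// maxPenalty (a hypothetical constant). Lower means more similar.
func outOfPlace(text, lang map[string]int, maxPenalty int) int {
	dist := 0
	for tri, rank := range text {
		langRank, ok := lang[tri]
		if !ok {
			dist += maxPenalty
			continue
		}
		d := rank - langRank
		if d < 0 {
			d = -d
		}
		dist += d
	}
	return dist
}

func main() {
	// Toy "training" sentences; real profiles are built from large corpora.
	english := rankedProfile(trigramCounts("the quick brown fox jumps over the lazy dog and the cat"))
	esperanto := rankedProfile(trigramCounts("la rapida bruna vulpo saltas super la maldiligenta hundo kaj la kato"))

	sample := rankedProfile(trigramCounts("the dog and the fox"))
	fmt.Println("distance to English:  ", outOfPlace(sample, english, 300))
	fmt.Println("distance to Esperanto:", outOfPlace(sample, esperanto, 300))
}
```

The language whose profile gives the smallest distance is the best guess. With profiles this small the numbers are only indicative, but the English distance comes out lower for the English sample.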
Whether a detection counts as reliable is based on several factors, one of which is a metric called rate in the code base. The decision can therefore be presented as a 2D space with a threshold function that splits it into "Reliable" and "Not reliable" areas. This threshold function is a hyperbola.
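A hyperbolic threshold of this kind can be sketched as follows. Everything here is an assumption made for illustration rather than the library's code: the sketch takes the two axes to be the number of unique trigrams in the text and the rate metric, and the constant `hyperbolaK` and the names `threshold` and `isReliable` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// hyperbolaK is a hypothetical constant that sets how demanding the
// threshold is; it is not taken from the whatlanggo code base.
const hyperbolaK = 5.0

// threshold returns the minimum rate required for a text with the given
// number of unique trigrams. The curve rate = hyperbolaK / uniqueTrigrams
// is a hyperbola: short texts need a large rate, long texts a small one.
func threshold(uniqueTrigrams int) float64 {
	if uniqueTrigrams <= 0 {
		return math.Inf(1) // no evidence at all: never reliable
	}
	return hyperbolaK / float64(uniqueTrigrams)
}

// isReliable reports whether the point (uniqueTrigrams, rate) lies in the
// "Reliable" area, that is, strictly above the threshold curve.
func isReliable(uniqueTrigrams int, rate float64) bool {
	return rate > threshold(uniqueTrigrams)
}

func main() {
	fmt.Println(isReliable(5, 0.5))  // short text, moderate rate
	fmt.Println(isReliable(50, 0.5)) // longer text, same rate
}
```

The design point is that neither axis decides alone: the same rate can fall on either side of the curve depending on how much text was available.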
For more details, please check the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".
whatlanggo is a derivative of Franc (JavaScript, MIT) by Titus Wormer.
Thanks to greyblake (Potapov Sergey) for creating whatlang-rs, from which I got the idea and algorithms.