Useful resources for text processing in Ruby
This curated list comprises awesome resources, libraries, and information sources about the computational processing of texts in human languages with the Ruby programming language. The field is often referred to as NLP, Computational Linguistics, or HLT (Human Language Technology), and is closely related to Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction and other disciplines.
✨ Every contribution is welcome! Add links through pull requests or create an issue to start a discussion.
Follow us on Twitter and please spread the word using the #RubyNLP hashtag!
Please help us to fill out this section! 😃
An NLP pipeline starts with plain text.
Language Identification is one of the first crucial steps in every NLP Pipeline.
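To make the idea concrete, here is a minimal sketch of language identification by counting stopwords. It is only an illustration: the `STOPWORDS` table and the `guess_language` helper are hypothetical names invented for this example, and real language identifiers rely on much larger statistical models.

```ruby
# Tiny stopword profiles (hypothetical, for illustration only).
STOPWORDS = {
  en: %w[the and of to in is that it],
  de: %w[der die das und ist nicht ein zu],
  fr: %w[le la les et est un une des]
}.freeze

# Returns the language code whose stopwords occur most often in the text,
# or nil when no stopword matches at all.
def guess_language(text)
  words = text.downcase.scan(/\p{L}+/)
  scores = STOPWORDS.transform_values do |list|
    words.count { |word| list.include?(word) }
  end
  best, score = scores.max_by { |_, count| count }
  score.zero? ? nil : best
end

puts guess_language("The cat is in the garden").inspect
puts guess_language("Der Hund ist nicht in dem Haus").inspect
```

Short or mixed-language inputs will easily fool a counter like this, which is one reason to prefer a dedicated library in a real pipeline.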
Tools for Tokenization, Word and Sentence Boundary Detection and Disambiguation.
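As a rough illustration of what these tools do, the following sketch splits text into sentences and tokens with regular expressions only. The helper names `split_sentences` and `tokenize` are assumptions made for this example and do not belong to any library listed here.

```ruby
# Splits text into sentences at ., ! or ? followed by whitespace.
# Abbreviations such as "Dr." would be split incorrectly; that is the
# disambiguation problem dedicated tools try to solve.
def split_sentences(text)
  text.strip.split(/(?<=[.!?])\s+/)
end

# Splits a sentence into word, number and punctuation tokens.
def tokenize(sentence)
  sentence.scan(/\p{L}+(?:['-]\p{L}+)*|\d+(?:[.,]\d+)*|[^\s\p{L}\d]/)
end

split_sentences("Ruby is fun. Is it fast? Yes!").each do |sentence|
  puts tokenize(sentence).inspect
end
```

The patterns keep contractions and decimal numbers together as single tokens, a design choice that will not suit every language or task.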
Stemming is the term used in information retrieval to describe the process of
reducing word forms to some base representation. Stemming should be distinguished
from Lemmatization, since
stems do not necessarily have to be valid words.
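A deliberately naive suffix stripper shows why the output of a stemmer is a base representation rather than a dictionary word. This is not the Porter algorithm or any listed library; `SUFFIXES` and `naive_stem` are hypothetical names used only for this sketch.

```ruby
# Suffixes are tried in this order, longer variants first;
# the first one that matches is removed.
SUFFIXES = %w[ations ation ings ing ies ed es s].freeze

# Strips one suffix, provided at least three characters remain.
def naive_stem(word)
  w = word.downcase
  suffix = SUFFIXES.find { |s| w.end_with?(s) && w.length - s.length >= 3 }
  suffix ? w.delete_suffix(suffix) : w
end

%w[running ponies organization cats].each do |word|
  puts "#{word} -> #{naive_stem(word)}"
end
```

Under these rules "running" becomes "runn" and "ponies" becomes "pon": both are usable index keys, but neither is an English word.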
Lemmatization is the process of finding the base (dictionary) form of a word. Lemmas are often collected in dictionaries.
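In its simplest form, dictionary-based lemmatization is a lookup. The sketch below assumes a tiny hand-written table; `LEMMAS` and `lemmatize` are hypothetical names, and real lemmatizers use large lexicons and usually part-of-speech information as well.

```ruby
# A toy lemma dictionary (hypothetical entries for illustration).
LEMMAS = {
  "went"    => "go",
  "mice"    => "mouse",
  "running" => "run",
  "was"     => "be"
}.freeze

# Returns the dictionary form if known, otherwise the lowercased word.
def lemmatize(word)
  key = word.downcase
  LEMMAS.fetch(key, key)
end

%w[Went mice running table].each do |word|
  puts "#{word} -> #{lemmatize(word)}"
end
```

Unlike suffix stripping, the lookup handles irregular forms such as "went" and always returns a real word, at the cost of needing a dictionary.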
Machine Learning Algorithms in pure Ruby or written in other programming languages with appropriate bindings for Ruby.
For a more up-to-date list, please look at the Awesome ML with Ruby list.
Libraries for language-aware string manipulation, such as search, pattern matching, case conversion, transcoding, and regular expressions that need information about the underlying language.
The ActiveSupport gem has various string extensions that can handle case.
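Case conversion is a good example of why the language matters. The sketch below uses only Ruby's core `String` methods (Ruby 2.4 or later), not ActiveSupport; the helper `same_ignoring_case?` is a hypothetical name introduced for this example.

```ruby
# Full Unicode case mapping: the German sharp s has no single
# upper-case letter here and expands to "SS".
puts "Straße".upcase

# Turkic languages distinguish dotted and dotless i, so the
# :turkic option changes the result of case conversion.
puts "I".downcase(:turkic)  # dotless i (U+0131)
puts "i".upcase(:turkic)    # dotted capital I (U+0130)

# Case folding is meant for comparison rather than display.
def same_ignoring_case?(a, b)
  a.downcase(:fold) == b.downcase(:fold)
end

puts same_ignoring_case?("STRASSE", "Straße")
```

The caller has to know the language of the text to pick the right option; Ruby does not detect it, which is where the libraries in this section come in.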
All projects in this section are really important for the community but need more attention. If you have spare time and dedication, please spend some hours on the code here.
To the extent possible under law, the person who associated CC0 with
Awesome NLP with Ruby has waived all copyright and related or neighboring rights
to Awesome NLP with Ruby.
You should have received a copy of the CC0 legalcode along with this work. If not, see https://creativecommons.org/publicdomain/zero/1.0/.