[RubyML | RubyDataScience | RubyInterop]
Useful resources for text processing in Ruby
This curated list comprises awesome resources, libraries, and information sources about the computational processing of texts in human languages with the Ruby programming language. The field is often referred to as NLP, Computational Linguistics, or HLT (Human Language Technology) and is closely related to Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction and other disciplines.
This list comes from our day-to-day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ explains the important decisions behind it and answers questions you may have.
✨ Every contribution is welcome! Add links through pull requests or create an issue to start a discussion.
Follow us on Twitter and please spread the word using the #RubyNLP hashtag!
Please help us to fill out this section! 😃
An NLP Pipeline starts with plain text.
Language Identification is one of the first crucial steps in every NLP Pipeline.
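To illustrate the idea of language identification, here is a minimal sketch in plain Ruby that guesses a language by counting stopword hits. The `STOPWORDS` table and the `guess_language` method are hypothetical names made up for this example, and the word lists are far too small for real use. The libraries in this section rely on much richer models.

```ruby
# Hypothetical, tiny stopword profiles (an assumption for illustration only).
STOPWORDS = {
  en: %w[the and is of to in that it],
  de: %w[der die das und ist nicht ein zu],
  fr: %w[le la les et est un une des]
}.freeze

# Returns the language code whose stopwords occur most often in the text,
# or nil when no stopword matches at all.
def guess_language(text)
  tokens = text.downcase.scan(/\p{L}+/)
  scores = STOPWORDS.transform_values do |words|
    tokens.count { |token| words.include?(token) }
  end
  best, hits = scores.max_by { |_, count| count }
  hits.zero? ? nil : best
end

p guess_language("The cat is in the garden")        # => :en
p guess_language("Der Hund ist nicht in dem Haus")  # => :de
p guess_language("Le chat est dans la maison")      # => :fr
```

Short inputs and closely related languages easily defeat such a heuristic, which is one reason to prefer a dedicated library.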
Tools for Tokenization, Word and Sentence Boundary Detection and Disambiguation.
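As a rough sketch of what tokenization and sentence boundary detection involve, the following plain Ruby example uses regular expressions only. The method names `tokenize` and `split_sentences` are hypothetical and do not belong to any library listed here. The second call deliberately shows why boundary disambiguation is needed.

```ruby
# Splits text into word tokens (keeping inner apostrophes), numbers
# (keeping decimal separators), and single punctuation marks.
def tokenize(text)
  text.scan(/\p{L}+(?:'\p{L}+)*|\d+(?:[.,]\d+)*|[^\s\p{L}\d]/)
end

# Naive sentence splitter: breaks at whitespace that follows ., ! or ?
# and precedes an uppercase letter.
def split_sentences(text)
  text.split(/(?<=[.!?])\s+(?=\p{Lu})/)
end

p tokenize("Don't panic, it costs 3.50!")
# => ["Don't", "panic", ",", "it", "costs", "3.50", "!"]

p split_sentences("It works. Dr. Smith agrees! Really?")
# => ["It works.", "Dr.", "Smith agrees!", "Really?"]
```

The abbreviation "Dr." is wrongly treated as a sentence end, which is the kind of ambiguity that dedicated boundary detection tools try to resolve.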
Stemming is the term used in information retrieval for the process of reducing word forms to some base representation. Stemming should be distinguished from Lemmatization, since stems do not necessarily have a linguistic motivation.
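A toy suffix-stripping stemmer makes the point about stems concrete. The rules in `SUFFIX_RULES` and the method `toy_stem` are assumptions invented for this sketch. They are much simpler than established algorithms such as Porter's and are not how any listed gem works.

```ruby
# Ordered suffix rules: the first matching rule wins (hypothetical rule set).
SUFFIX_RULES = [
  [/ies\z/, "i"],
  [/ing\z/, ""],
  [/ed\z/,  ""],
  [/s\z/,   ""]
].freeze

# Strips one suffix, but only if at least three characters remain
# in front of the suffix.
def toy_stem(word)
  form = word.downcase
  SUFFIX_RULES.each do |pattern, replacement|
    next unless form =~ pattern && form.sub(pattern, "").length >= 3
    return form.sub(pattern, replacement)
  end
  form
end

p toy_stem("walking")  # => "walk"
p toy_stem("ponies")   # => "poni"
p toy_stem("sing")     # => "sing"
```

The result "poni" is not an English word, yet it is good enough to group "ponies" with related forms for retrieval purposes.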
Lemmatization is considered a process of finding a base form of a word. Lemmas are often collected in dictionaries.
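Since lemmas are often collected in dictionaries, the simplest possible lemmatizer is a lookup table. The sketch below assumes a hypothetical `LEMMA_DICTIONARY` with a handful of English entries and falls back to the input form when a word is unknown. Real lemmatizers also use part-of-speech information and much larger lexicons.

```ruby
# Hypothetical mini-dictionary mapping inflected forms to their lemmas.
LEMMA_DICTIONARY = {
  "went"   => "go",
  "gone"   => "go",
  "goes"   => "go",
  "mice"   => "mouse",
  "better" => "good",
  "was"    => "be",
  "were"   => "be"
}.freeze

# Looks up the lowercased form; unknown words are returned unchanged.
def lemmatize(word)
  form = word.downcase
  LEMMA_DICTIONARY.fetch(form, form)
end

p lemmatize("Went")   # => "go"
p lemmatize("mice")   # => "mouse"
p lemmatize("table")  # => "table"
```

Unlike a stemmer, this approach handles irregular forms such as "went", but it only covers what the dictionary contains.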
String and Hash objects.
Machine Learning Algorithms in pure Ruby or written in other programming languages with appropriate bindings for Ruby.
For a more up-to-date list, please look at the Awesome ML with Ruby list.
Please refer to the Data Visualization section on the Data Science with Ruby list.
Libraries for language-aware string manipulation, i.e. search, pattern matching, case conversion, transcoding, and regular expressions that need information about the underlying language.
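To show why case conversion can depend on the language, this short sketch uses only Ruby's core `String` methods (available since Ruby 2.4), not any of the libraries in this section. The sample words are chosen for illustration.

```ruby
german = "Straße"

# Full Unicode case mapping: the German sharp s has no single
# uppercase counterpart in this mapping and expands to "SS".
p german.upcase            # => "STRASSE"

# Case folding is more aggressive than plain downcasing and is meant
# for caseless comparison.
p german.downcase(:fold)   # => "strasse"

# Turkic languages distinguish dotted and dotless i, so the default
# mapping is wrong for them unless :turkic is requested.
p "I".downcase             # => "i"
p "I".downcase(:turkic)    # => "ı"
p "i".upcase(:turkic)      # => "İ"
```

The caller has to know the language of the text to pick the right option, which is exactly the information language-aware libraries try to supply.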
ActiveSupport gem has various string extensions that can handle case.
All projects in this section are really important for the community but need more attention. If you have spare time and dedication, please spend some hours on the code here.
Awesome NLP with Ruby by Andrei Beliankou and Contributors.
To the extent possible under law, the person who associated CC0 with Awesome NLP with Ruby has waived all copyright and related or neighboring rights to Awesome NLP with Ruby.
You should have received a copy of the CC0 legalcode along with this work. If not, see https://creativecommons.org/publicdomain/zero/1.0/.