Awesome Open Source
The Top 53 Text Processing Open Source Projects
Command Line Text Processing
⚡️ From finding text to search and replace, from sorting to beautifying text and more 🎨
Diff Match Patch
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Intuitive find & replace CLI (sed alternative)
Text Classification Algorithms: A Survey
Python library for creating PEG parsers
Natural language detection library for Go
A simple Python module for parsing human names into their individual components
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Open Korean Text
Open Korean Text Processor - An Open-source Korean Text Processor
Simple SQL-like syntax on top of Perl text processing.
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
A fast implementation of Aho-Corasick in Rust.
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Textpipe: clean and extract metadata from text
THE String Processing Package for R (with ICU)
A low level regular expression library that uses deterministic finite automata.
UNIC: Unicode and Internationalization Crates for Rust
A tool that allows you to detect and translate text.
Python library for Natural Language Preprocessing (NLPre)
Text vectorization tool to outperform TFIDF for classification tasks
Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.
Extract indicators of compromise from text, including "escaped" ones.
A web app to create and browse text visualizations for automated customer listening.
Stanford NLP group's shared Python tools.
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku and Zenkaku
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Preprocessing Library for Natural Language Processing
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
A Golang library for processing Asciidoc files.
Dan Jurafsky Chris Manning Nlp
My solutions to the Natural Language Processing course taught by Dan Jurafsky and Chris Manning in Winter 2012.
CogComp's light-weight Python NLP annotators
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` which allows you to build, view, manipulate and query pattern models.
Short-text clustering preprocessing module
Binary Processing Language
Multi-lingual Text Processing
Vision Framework IOS WWDC 2017
A NodeJS implementation of the Rapid Automatic Keyword Extraction algorithm.
A flexible Java text processor. BB, BBCode, BB-code, HTML, Textile, Markdown, parser, translator, converter.
Unix Text Commands
Unix Text Processing Command Reference
Cleaning-up Persian Texts!
Nostril: Nonsense String Evaluator
Expands texts as you type, naturally
Go Search Replace
🚀 Search & replace URLs in WordPress SQL files.
Applied Text Mining In Python
Repo for Applied Text Mining in Python (Coursera) by University of Michigan
Qp Trie Rs
An idiomatic and fast QP-trie implementation in pure Rust.
Mycroft's multilingual text parsing and formatting library
A large scale feature extraction tool for text-based machine learning
Concise Ipython Notebooks For Deep Learning
IPython notebooks for solving problems such as classification, segmentation, and generation using the latest deep learning algorithms on various publicly available text and image datasets.
Text Mining in Python
🔤 Lightweight R package for manipulating [string] characters
Hatena Notation (はてな記法) Parser written in Go