Awesome Open Source
Search results for tokenizer tokenization
26 search results found
Ekphrasis
⭐
583
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora: English Wikipedia and Twitter (330 million English tweets).
Php Text Analysis
⭐
484
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language.
Tokenmonster
⭐
399
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
Vibrato
⭐
275
🎤 vibrato: Viterbi-based accelerated tokenizer
Tokenizer
⭐
224
Fast and customizable text tokenization library with BPE and SentencePiece support
Vaporetto
⭐
206
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
Simplemma
⭐
100
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Wordtokenizers.jl
⭐
63
High performance tokenizers for natural language processing and other related tasks
Wink Tokenizer
⭐
47
Multilingual tokenizer that automatically tags each token with its type
Attacut
⭐
47
A Fast and Accurate Neural Thai Word Segmenter
Alm
⭐
47
Smart Language Model
Xontrib Output Search
⭐
35
Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
Uax29
⭐
35
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split words, sentences and graphemes.
Nlp Js Tools French
⭐
29
POS tagger, lemmatizer and stemmer for the French language, in JavaScript
Spacy_russian_tokenizer
⭐
26
Custom Russian tokenizer for spaCy
Python Vaporetto
⭐
17
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.
Openai Tools
⭐
13
A collection of tools for working with OpenAI
Plane
⭐
11
A text processing tool that includes tag (HTML, URL, email) extraction and removal, punctuation normalization, simple segmentation, and so on.
Words N Numbers
⭐
11
Tokenizes strings of text using regular expressions, extracting arrays of words and, optionally, numbers, emojis, tags, usernames and email addresses. For Node.js and the browser. When you need more than just [a-z] regular expressions.
Tiptap Annotation Magic
⭐
9
An extension for the Tiptap editor, enabling the annotation of text. Comes with support for overlapping annotations, useful for e.g. NLP tokenization.
Lexr
⭐
8
Lexical analyzer for JavaScript developers
Basic Dignified
⭐
7
Program 8-bit BASIC for classic systems in a modern environment, with modern tools and paradigms (MSX and CoCo modules included).
Tokenize Output
⭐
6
Get identifiers, names, paths, URLs and words from the terminal command output.
Taibun
⭐
6
Taiwanese Hokkien Transliterator and Tokeniser
Sept
⭐
6
A simple extensible path template generator
Objectpascalparser
⭐
5
An attempt at an Object Pascal Parser
Copyright 2018-2024 Awesome Open Source. All rights reserved.