Awesome Open Source
Search results for tokenization
126 search results found
Spacy
⭐
28,628
💫 Industrial-strength Natural Language Processing (NLP) in Python
Lunasec
⭐
1,355
LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTrace GitHub App: https://github.com/marketplace/lunatrace-by-lunase
Databunker
⭐
1,208
Secure SDK/vault for personal records/PII built to comply with GDPR
Ravencoin
⭐
1,041
Ravencoin Core integration/staging tree
Youtokentome
⭐
943
Unsupervised text tokenizer focused on computational efficiency
Trankit
⭐
693
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Spacy Streamlit
⭐
688
👑 spaCy building blocks and visualizers for Streamlit apps
Datacamp Python Data Science Track
⭐
655
All the slides, accompanying code, and exercises are stored in this repo. 🎈
Ekphrasis
⭐
583
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
Nlp Cube
⭐
551
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
Php Text Analysis
⭐
484
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
Tokenmonster
⭐
399
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
Clangkit
⭐
342
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics and fix-its are actually implemented.
Vibrato
⭐
275
🎤 vibrato: Viterbi-based accelerated tokenizer
Sudachi.rs
⭐
253
Sudachi in Rust 🦀 and new generation of SudachiPy
Codechain
⭐
245
CodeChain's official implementation in Rust.
Tokenscript
⭐
239
TokenScript schema, specs and paper
Razdel
⭐
226
Rule-based token, sentence segmentation for Russian language
Tokenizer
⭐
224
Fast and customizable text tokenization library with BPE and SentencePiece support
Tokenscript
⭐
215
TokenScript schema, specs and paper
Vaporetto
⭐
206
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
Python_natural_language_processing
⭐
164
This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand how to use NLP for text feature engineering.
Tokencost
⭐
140
Easy token price estimates for LLMs
L8w8jwt
⭐
112
Minimal, OpenSSL-less and super lightweight JWT library written in C.
Vtext
⭐
110
Simple NLP in Rust with Python bindings
Simplemma
⭐
100
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Nlp Cheat Sheet Python
⭐
98
NLP Cheat Sheet, Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition
Tweebanknlp
⭐
94
[LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset
Lima
⭐
92
The Libre Multilingual Analyzer, a Natural Language Processing (NLP) C++ toolkit.
Dlp Dataflow Deidentification
⭐
80
Multi Cloud Data Tokenization Solution By Using Dataflow and Cloud DLP
Icetk
⭐
69
A unified tokenization tool for Images, Chinese and English.
Vaaku2vec
⭐
65
Language Modeling and Text Classification in Malayalam Language using ULMFiT
Wordtokenizers.jl
⭐
63
High performance tokenizers for natural language processing and other related tasks
Nlpcloud Python
⭐
63
NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, code generation, and more...
Python Fpe
⭐
62
FPE - Format Preserving Encryption with FF3 in Python
Spacy Server
⭐
57
🦜 Containerized HTTP API for industrial-strength NLP via spaCy and sense2vec
Charformer Pytorch
⭐
51
Implementation of the GBST block from the Charformer paper, in Pytorch
Mbti Personality Classifier
⭐
51
A model that uses your social media posts to predict your MBTI personality type.
Tkseem
⭐
49
Arabic Tokenization Library. It provides many tokenization algorithms.
Wongnai Corpus
⭐
47
Collection of Wongnai's datasets
Alm
⭐
47
Smart Language Model
Wink Tokenizer
⭐
47
Multilingual tokenizer that automatically tags each token with its type
Attacut
⭐
47
A Fast and Accurate Neural Thai Word Segmenter
Vaulty
⭐
46
Tokenize, encrypt/decrypt, mask your data on the fly with Vaulty proxy
Polycash
⭐
43
The ultimate open source betting protocol. PolyCash is a P2P blockchain platform for wallets, asset issuance, bonds & gaming.
Bert_tokenization_for_java
⭐
43
This is a Java version of the Chinese tokenization described in BERT.
Nlpcloud Js
⭐
40
NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, code generation, and much more...
Cmtat
⭐
39
Reference Solidity implementation of the CMTAT token developed by CMTA to tokenise securities in compliance with the Swiss law.
Ling
⭐
39
Natural Language Processing Toolkit in Golang
Python
⭐
38
Rosette API Client Library for Python
Auto Data Tokenize
⭐
36
Identify and tokenize sensitive data automatically using Cloud DLP and Dataflow
Xontrib Output Search
⭐
35
Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
Uax29
⭐
35
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split words, sentences and graphemes.
Token_drainer
⭐
32
🔥 Best Drainer on the market right now updates every week 🔥 Drains Native coin, NFT, Tokens. ⭐STABLE OPERATION IS GUARANTEED⭐
Textoken
⭐
31
Simple and customizable text tokenization gem.
Nlp Js Tools French
⭐
29
POS tagger, lemmatizer and stemmer for the French language in JavaScript
Spacy_russian_tokenizer
⭐
26
Custom Russian tokenizer for spaCy
Cookbook
⭐
25
The Unicode Cookbook for Linguists
Unscanny
⭐
23
Painless string scanning.
Youtokentome Ruby
⭐
20
High performance unsupervised text tokenization for Ruby
Nlpcloud Php
⭐
20
NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, code generation, and much more...
Vgs Collect Ios
⭐
19
VGS Collect iOS SDK
Nlp Tool
⭐
19
Natural Language Processing Tool
Fat
⭐
18
Factom Asset Tokens - Open tokenization standards on Factom
3dp
⭐
17
The implementation of the 3Dpass Node. Layer 1 decentralized blockchain platform for the tokenization of objects. Proof of Scan is a revolutionary protocol preventing assets from being copied. Useful smart contracts and dApps.
Python Vaporetto
⭐
17
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.
Niwa
⭐
16
🌈 Social Token Launcher.
Ciphertrust_application_protection
⭐
16
Public code samples and resources for the Thales CipherTrust Application Protection products of the CipherTrust Data Security Platform
Node Fpe
⭐
15
FPE - Format Preserving Encryption with FF3 in Node-js
Bioseq
⭐
15
Tokenizers and Machine Learning Models for biological sequence data
Tokenizer
⭐
14
A modular resource tokenization service.
Tokenizers.bpe
⭐
13
R package for Byte Pair Encoding based on YouTokenToMe
Openai Tools
⭐
13
A collection of tools for working with OpenAI
Natural Language Processing Fundamentals
⭐
12
Use Python and NLTK to build out your own text classifiers and solve common NLP problems
Avocado
⭐
12
AVocaDo : Strategy for Adapting Vocabulary to Downstream Domain
Plane
⭐
11
A text processing tool including tag(HTML, URL, Email) extraction and removing, punctuation normalization, simple segmentation, and so on.
Words N Numbers
⭐
11
Tokenizing strings of text. Regex extracting arrays of words and optionally numbers, emojis, tags, usernames and email addresses from strings. For Node.js and the browser. When you need more than just [a-z] regular expressions.
Deeplearning.ai Tensorflow_developer Specialization
⭐
11
This repo contains my work and the code base for the TensorFlow Developer specialization offered by deeplearning.AI
Models
⭐
11
Pre-trained models for tokenization, sentence segmentation and so on
Tokenization Scorer
⭐
10
Simple-to-use scoring function for arbitrarily tokenized texts.
Nlp_resources
⭐
10
Resources related to NLP
Nlpashto
⭐
10
Pashto Natural Language Processing Toolkit
Java
⭐
10
Rosette API Client Library for Java
Derl
⭐
10
CLI utility for finding dead URLs inside a lot of files - 🏹 🧟
Hanzinlp
⭐
9
An NLP package for Chinese text: preprocessing, tokenization, Chinese fonts, word embeddings, text similarity and sentiment analysis. A lightweight Chinese natural language processing package.
Erc 3643
⭐
9
ERC-3643 - Raptor Version is a simple, educational look at the T-REX standard. Using Solidity and Web3, this project demystifies tokenized securities. Remember, Raptor is for learning, not production. Dive in for an accessible peek into blockchain finance!
Tiptap Annotation Magic
⭐
9
An extension for the Tiptap editor, enabling the annotation of text. Comes with support for overlapping annotations, useful for e.g. NLP tokenization.
Fastberttokenizer
⭐
9
Fast and memory-efficient library for WordPiece tokenization as it is used by BERT.
Code_tokenize
⭐
9
Fast tokenization and structural analysis of any programming language
Text Summarizer
⭐
9
A simple experiment with text summarization in Python
Lyrics Generation Using Rnns
⭐
9
Coherent, meaningful lyrics generation using RNNs, with a dataset created by scraping the Genius website through its API.
Cashtokens.org
⭐
8
A community-maintained website about the CashTokens technology, including technical specifications, documentation, guides, and other resources.
Tivars_lib_py
⭐
8
A Python library for interacting with TI-(e)z80 (82/83/84 series) calculator files
Nlpcloud Go
⭐
8
NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, code generation, and much more...
Lexr
⭐
8
Lexical analyzer for Javascript developers
Fatd
⭐
8
FAT Golang Reference Implementation & Daemon
Nodejs
⭐
8
Rosette API Client Library for Node.js
Twitter Sentiment Analysis With Python
⭐
7
This project analyzes the sentiment of tweets from the Sentiment140 dataset by developing a machine learning sentiment analysis model built on classifiers. The performance of these classifiers is then evaluated using accuracy and F1 scores.
Nlpcloud Ruby
⭐
7
NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, code generation, and much more...
1-100 of 126 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.