Awesome Open Source
Search results for python text processing
140 search results found
Diff Match Patch
⭐
6,749
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Introduction_to_ml_with_python
⭐
6,626
Notebooks and code for the book "Introduction to Machine Learning with Python"
Pymupdf
⭐
3,908
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Fastnlp
⭐
2,940
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Pyparsing
⭐
2,033
Python library for creating PEG parsers
Text_classification
⭐
1,621
Text Classification Algorithms: A Survey
Hazm
⭐
1,104
Persian NLP Toolkit
Python Nameparser
⭐
605
A simple Python module for parsing human names into their individual components
Ekphrasis
⭐
583
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets from Twitter).
Variational Text Tensorflow
⭐
537
TensorFlow implementation of Neural Variational Inference for Text Processing
Python_basics
⭐
496
🐍 Syntax, working with Shell commands, Files, Text Processing, and more...
Pynlpl
⭐
466
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP spec
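The n-gram extraction and frequency lists mentioned in this entry can be illustrated with the standard library alone. This is a minimal sketch, not PyNLPl's API; the function names `ngrams` and `frequency_list` and the sample sentence are assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(tokens, n=1):
    """Count n-grams and return (ngram, count) pairs, most frequent first."""
    return Counter(ngrams(tokens, n)).most_common()


# Hypothetical sample text, split on whitespace for simplicity.
tokens = "the cat sat on the mat the cat slept".split()
bigram_counts = frequency_list(tokens, n=2)
print(bigram_counts[:3])
```

Whitespace splitting stands in for a real tokenizer here; a library such as PyNLPl would normally supply both the tokenization and the counting.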
Bsed
⭐
416
Simple SQL-like syntax on top of Perl text processing.
Pyarabic
⭐
407
pyarabic
Pykospacing
⭐
348
Automatic Korean word spacing with Python
Wetextprocessing
⭐
338
Text Normalization & Inverse Text Normalization
Zhon
⭐
329
Constants used in Chinese text processing
Artificial Adversary
⭐
317
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Textpipe
⭐
290
Textpipe: clean and extract metadata from text
Jaconv
⭐
254
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
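Hiragana-to-Katakana conversion of the kind this entry describes can be sketched with a fixed code-point offset, because the Unicode Hiragana block (U+3041 to U+3096) lines up with the Katakana block 0x60 positions later. This is an illustration under that assumption, not jaconv's implementation; the function names below are hypothetical.

```python
# Hiragana letters U+3041..U+3096 map to Katakana U+30A1..U+30F6 (offset 0x60).
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
OFFSET = 0x60

HIRA_TO_KATA = {cp: cp + OFFSET for cp in range(HIRAGANA_START, HIRAGANA_END + 1)}
KATA_TO_HIRA = {kata: hira for hira, kata in HIRA_TO_KATA.items()}


def hira_to_kata_sketch(text):
    """Convert Hiragana characters to Katakana; leave everything else as-is."""
    return text.translate(HIRA_TO_KATA)


def kata_to_hira_sketch(text):
    """Convert Katakana characters to Hiragana; leave everything else as-is."""
    return text.translate(KATA_TO_HIRA)


print(hira_to_kata_sketch("ひらがな"))
print(kata_to_hira_sketch("カタカナ"))
```

Half-width (Hankaku) and full-width (Zenkaku) conversion needs explicit lookup tables rather than a single offset, so it is left out of this sketch.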
Konoha
⭐
200
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
Tmtoolkit
⭐
191
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Textvec
⭐
190
Text vectorization tool to outperform TFIDF for classification tasks
Pyopenjtalk
⭐
165
Python wrapper for OpenJTalk
Stanza Old
⭐
141
Stanford NLP group's shared Python tools.
Nlpre
⭐
135
Python library for Natural Language Preprocessing (NLPre)
Padatious
⭐
132
A neural network intent parser
Support Tickets Classification
⭐
128
This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Word_cloud_fa
⭐
124
A wrapper for wordcloud module for creating Persian word clouds.
Colibri Core
⭐
122
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` which allows you to build, view, manipulate and query pattern models.
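To make the skipgram idea in this entry concrete, the sketch below enumerates fixed-size-gap skipgrams from a token list in plain Python. It is not Colibri Core's API or its memory-efficient encoding; the `GAP` marker and the function name `skipgrams` are assumptions chosen for illustration.

```python
from itertools import combinations

GAP = "{*}"  # hypothetical marker standing in for one skipped token


def skipgrams(tokens, n):
    """Return n-token windows in which one or more interior tokens are gaps.

    The first and last token of each window are always kept, so every
    result is an n-gram "with holes" of fixed size.
    """
    results = []
    interior = range(1, n - 1)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        for k in range(1, len(interior) + 1):
            for positions in combinations(interior, k):
                results.append(
                    tuple(GAP if j in positions else tok
                          for j, tok in enumerate(window))
                )
    return results


print(skipgrams("to be or not".split(), 3))
```

Dynamic-size gaps, which the description also mentions, would need a pattern that can match a variable number of tokens and are not covered here.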
Cogcomp Nlpy
⭐
108
CogComp's light-weight Python NLP annotators
Prenlp
⭐
105
Preprocessing Library for Natural Language Processing
Teanaps
⭐
92
An open-source Python library for natural language processing and text analysis.
Nostril
⭐
91
Nostril: Nonsense String Evaluator
Nlp
⭐
77
Free hands-on course with the implementation (in Python) and description of several Natural Language Processing (NLP) algorithms and techniques, on several modern platforms and libraries.
Soyspacing
⭐
76
A library for correcting word-spacing errors. It corrects spacing with an intuitive, heuristic approach rather than with machine learning algorithms such as CRF.
Cso Classifier
⭐
74
Python library that classifies content from scientific papers with the topics of the Computer Science Ontology (CSO).
Lingua Franca
⭐
72
Mycroft's multilingual text parsing and formatting library
Summarization
⭐
70
A sequence to sequence model for abstractive text summarization
Talkwithyourfiles
⭐
70
An LLM GUI application; enables you to interact with your files, offering dynamic parameters that can modify response behavior during runtime.
Perke
⭐
67
A keyphrase extractor for Persian
Hands On Python Natural Language Processing
⭐
65
Textcluster
⭐
60
Short-text clustering preprocessing module (Short text cluster)
Wiki Table Scrape
⭐
59
Scrape tables from Wikipedia articles into CSVs
Fuzzychinese
⭐
52
A small package to fuzzy match Chinese words
Python Gatenlp
⭐
51
Python text processing, pattern matching, and NLP framework
Konfuzio Sdk
⭐
48
OCR, extract and classify documents. In addition, annotate documents and build your own NLP and Computer Vision models using Python by downloading the data. Find examples in our Colab Notebooks, e.g. how to fine-tune Flair.
Deduce
⭐
47
Deduce: de-identification method for Dutch medical text
Dragonmapper
⭐
43
Identification and conversion functions for Chinese text processing
Suffixtree
⭐
39
Optimized implementation of a suffix tree in Python using Ukkonen's algorithm.
Sova Tts Tps
⭐
38
NLP-preprocessor for the SOVA-TTS project
Bertify
⭐
37
An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently.
Oneai Python
⭐
35
Python SDK for One AI APIs. One AI is an NLP-as-a-service platform. Our APIs enable language comprehension in context, transforming texts from any source into structured data to use in code.
Speech2affective_gestures
⭐
35
This is the official implementation of the paper "Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning".
Applied Text Mining In Python
⭐
34
Repo for Applied Text Mining in Python (coursera) by University of Michigan
Pyline
⭐
33
Pyline is a grep-like, sed-like, awk-like command-line tool for line-based text processing in Python. https://pypi.python.org/pypi/pyline
Text Analysis
⭐
32
Explaining textual analysis tools in Python. Including Preprocessing, Skip Gram (word2vec), and Topic Modelling.
Text Classification Lstms Pytorch
⭐
31
The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To provide a better understanding of the model, a Tweets dataset provided by Kaggle is used.
Python Ucto
⭐
29
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
Normalizer
⭐
28
This Python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended to be used for normalizing / cleaning Bengali and English text.
Nlp Tools
⭐
28
Useful Python NLP tools (evaluation, GUI interface, tokenization)
Markover
⭐
27
Natural Language Generation with Markov
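Markov-chain text generation, as named in this entry, can be sketched in a few lines: record which word follows which, then walk the table at random. This is a first-order, word-level illustration and not Markover's implementation; `build_chain`, `generate`, and the sample corpus are assumptions.

```python
import random
from collections import defaultdict


def build_chain(tokens):
    """Map each word to the list of words observed immediately after it."""
    chain = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, start, max_words=10, rng=None):
    """Walk the chain from `start`, picking each next word at random."""
    rng = rng or random.Random()
    words = [start]
    # Stop at the length limit or when the last word has no known successor.
    while len(words) < max_words and words[-1] in chain:
        words.append(rng.choice(chain[words[-1]]))
    return words


# Hypothetical toy corpus; a seeded generator keeps runs repeatable.
corpus = "the cat sat on the mat and the cat slept".split()
chain = build_chain(corpus)
sample = generate(chain, "the", max_words=8, rng=random.Random(0))
print(" ".join(sample))
```

Storing repeated successors in a list means frequent transitions are chosen proportionally more often, which is the simplest way to encode transition probabilities.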
Cinje
⭐
27
A Pythonic and ultra fast template engine DSL.
Nlp Stuff
⭐
27
A bit of everything about text and nlp [IN PROGRESS]
Python Mecab
⭐
27
A repository to bind mecab for Python 3.5+. Not using swig nor pybind. (Not Maintained Now)
Voice_chatbot
⭐
26
Chatbot in Russian with speech recognition using PocketSphinx and speech synthesis using RHVoice. The AttentionSeq2Seq model is used. Implemented using Python3+TensorFlow+Keras.
Pnlp
⭐
25
NLP pre-/post-processing tools.
Find_job_titles
⭐
25
find any kind of occupation or job title in a text or file
Hashedindex
⭐
25
Python package providing an Inverted Index implementation using dictionaries
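An inverted index built on dictionaries, as this entry describes, maps each term to the documents that contain it. The sketch below uses only the standard library and is not Hashedindex's API; `build_index`, `search`, and the sample documents are assumptions for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each term to a {doc_id: occurrence count} dictionary."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    postings = [set(index.get(term, {})) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents keyed by id.
documents = {
    "a": "red apple",
    "b": "green apple pie",
    "c": "red red wine",
}
index = build_index(documents)
print(search(index, "red apple"))
```

Keeping per-document counts in the inner dictionary makes it straightforward to add term-frequency ranking later without changing the index layout.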
Twitter Text Python
⭐
23
Twitter Text Libraries for Python
Nlpo3
⭐
21
Thai Natural Language Processing library in Rust, with Python and Node bindings.
Atarashi
⭐
21
Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology.
Mytwitterbot
⭐
20
A Twitter bot powered by a Recurrent Neural Network (RNN)
Nlcli
⭐
20
Natural language interface for the command line.
Data Science From Scratch
⭐
20
Code Companion to Joel Grus' book
Huggingface Datasets Text Quality Analysis
⭐
19
Retrieves parquet files from Hugging Face, then identifies and quantifies junky data, duplication, contamination, and biased content in a dataset using pandas
Blabla
⭐
18
Novoic's linguistic feature extraction library
Kts_linguistics
⭐
18
Spellcheck, phonetics, text processing and more
Textdatasetcleaner
⭐
18
A configurable pipeline for cleaning junk out of text datasets
Hama Py
⭐
17
🦛 Python library for Korean text processing. Python Korean Morphological Analyzer
Advanced Text Mining
⭐
16
Covers natural language processing and text analysis methodology using the TEANAPS library.
Nlpiper
⭐
15
NLPiper is a package that agglomerates different NLP tools and applies their transformations in the target document.
Text2video
⭐
15
Text to Video Generation Problem
Greek Normalisation
⭐
15
utilities for validating and normalising Ancient Greek text
Odin Ai
⭐
15
Organized Digital Intelligent Network (O.D.I.N)
Arabicprocessingcog
⭐
15
A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.
Text Preprocess Python
⭐
15
Text preprocessing tools in Python.
Andaluh Py
⭐
15
Transliterate español (Spanish) spelling to Andaluz proposals using Python
Lara Hungarian Nlp
⭐
14
NLP class for rapid ChatBot development in Hungarian language
Text Mining For Beginner
⭐
14
Covers everything from basic Python syntax to performing simple text analysis.
Emotion Recognition From Tweets
⭐
14
A comprehensive approach to recognizing emotion (sentiment) in a given tweet. Supervised machine learning.
Humanreadable
⭐
14
humanreadable is a Python library to convert human-readable values to other units.
Trunajod2.0
⭐
14
An easy-to-use library to extract indices from texts.
Sentiment Analysis Cnn
⭐
13
Sentiment Analysis using Convolutional Neural Networks (CNN) and Google News Word2Vec
Embeddings
⭐
12
zero-vocab or low-vocab embeddings
Text Mining For Practice
⭐
12
Covers how to perform text analysis using Python libraries.
Text Analysis
⭐
12
Weaving analytical stories from text data
Auto Corpus
⭐
12
Auto-CORPus pipeline developed by a University of Leicester and Imperial College London collaboration to standardize text and table data extracted from full text publications. See Open Access publication at: https://doi.org/10.3389/fdgth.2022.788124.
Python Daachorse
⭐
12
🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)
Dnlp
⭐
12
📚 A collection of useful Natural Language Processing utilities: detecting the language of a text, splitting text into sentences, and extracting the main content from an HTML document
1-100 of 140 search results
Copyright 2018-2024 Awesome Open Source. All rights reserved.