Awesome Open Source
Search results for python text extraction
38 search results found
Sumy
⭐
3,343
Module for automatic summarization of text documents and HTML pages.
Trafilatura
⭐
2,447
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Tika Python
⭐
1,316
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Justext
⭐
509
Heuristic-based boilerplate removal tool
Srt
⭐
389
A simple library and set of tools for parsing, modifying, and composing SRT files.
Crestify
⭐
232
Intelligent Bookmarking
Breadability
⭐
191
Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)
Cutie
⭐
144
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)
Lambda Text Extractor
⭐
143
AWS Lambda functions to extract text from various binary formats.
Pd3f
⭐
131
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Aut
⭐
128
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Benchmarks
⭐
93
Benchmarking PDF libraries
Wikipedia_ner
⭐
56
📖 Labeled examples from wiki dumps in Python
Doc_processing_toolkit
⭐
52
Python library to extract text from PDF, and default to OCR when text extraction fails.
Datasheet Scrubber
⭐
45
Extend
⭐
43
Entity Disambiguation as text extraction (ACL 2022)
Text Extraction Evaluation
⭐
42
Framework for evaluating text extraction algorithms implemented as web services
Mobi
⭐
37
Python-based software to unpack KindleGen-generated ebooks
Pdf Text Data Extractor
⭐
32
PDF text data extraction web app with OCR for scanned documents
Wagtail_textract
⭐
31
Text extraction for Wagtail document search
Querido Diario Toolbox
⭐
30
This project empowers anyone who wants to process data in the context of Querido Diário and carry out their own analyses.
Bte
⭐
27
BTE: Body Text Extraction
Pnlp
⭐
25
NLP pre-/post-processing tools.
Text_extraction
⭐
25
Extracts the main conclusions (key ideas) from research reports in finance-related fields
Screaming Frog Shingling
⭐
21
Uses Screaming Frog Internal HTML with text extraction along with a shingling algorithm to compare content duplication across the pages of a crawled site.
Img2txt
⭐
19
Easy formatted text extraction from images using Google Vision API
Mirusan
⭐
17
A PDF collection reader with built-in full-text search engine
Textextractor2.0
⭐
14
🔥 This web app extracts text from an image.
Aiopytesseract
⭐
13
A Python asyncio wrapper for Tesseract-OCR.
Ocrd_calamari
⭐
12
Recognize text using Calamari OCR and the OCR-D framework
Medinify
⭐
11
Python text classification package with a focus on medical text.
Pine
⭐
11
A simple image to text OCR scanner for macOS
Pdf_text_extract
⭐
10
AWS Lambda function written in Python to perform text extraction (using Slate) from a PDF put to S3 & indexed in ElasticSearch.
Movie Classfiction Pased On It S Arabic Subtitle
⭐
10
Classify English movies using their Arabic subtitles
Tesseract Ocr Wrapper
⭐
9
This is a highly efficient python wrapper for tesseract-ocr.
Arxiv Fulltext
⭐
9
arXiv plain text extraction
Articleparse
⭐
8
Heuristic text extraction from news sites in Python3
Hotpdf
⭐
8
hotpdf is a fast PDF scraping library to extract text and find text within PDF documents
Voice Prescription
⭐
6
A GUI application built with Tkinter that helps doctors prepare prescriptions more efficiently. It uses speech-to-text conversion and text extraction to prepare prescriptions in the correct format. Submitted for Smart India Hackathon 2020.
Cosmic Cube
⭐
5
PDF image analysis and selective text extraction using tesseract
Tecroom
⭐
5
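Online summary documentation of a tech stack, covering programming languages, data structures and algorithms, machine learning, databases, and more.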
Copyright 2018-2024 Awesome Open Source. All rights reserved.