Doc_processing_toolkit

Python library that extracts text from PDF files and falls back to OCR when text extraction fails.
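The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Doc_processing_toolkit's actual API: the function name `extract_with_ocr_fallback`, the `min_chars` threshold, and the two callables are all hypothetical. In practice the callables would wrap a real text extractor and a real OCR engine (the page's tags mention Tika and Tesseract); here they are left as plain parameters so the sketch runs without extra dependencies.

```python
from typing import Callable, Optional, Tuple


def extract_with_ocr_fallback(
    path: str,
    extract_text: Callable[[str], Optional[str]],  # hypothetical extractor wrapper
    run_ocr: Callable[[str], str],                 # hypothetical OCR wrapper
    min_chars: int = 1,                            # assumed "enough text" threshold
) -> Tuple[str, str]:
    """Return (text, method), where method is "text" or "ocr"."""
    try:
        # A missing result (None) is treated the same as an empty string.
        text = extract_text(path) or ""
    except Exception:
        # Treat any extractor error as "no text found" and fall through to OCR.
        text = ""
    if len(text.strip()) >= min_chars:
        return text, "text"
    # Direct extraction produced too little text, so use OCR instead.
    return run_ocr(path), "ocr"


if __name__ == "__main__":
    # Stub callables stand in for real extractor and OCR wrappers.
    text, method = extract_with_ocr_fallback(
        "scan.pdf",
        extract_text=lambda p: "",
        run_ocr=lambda p: "text recognized by OCR",
    )
    print(method, text)
```

Passing the extractor and OCR step in as callables keeps the decision logic separate from whichever tools are installed, and makes the fallback easy to test with stubs.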
Alternatives To Doc_processing_toolkit
| Project Name | Stars | Downloads / Repos Using This / Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|---|
| Parsr | 5,423 | | 7 months ago | | | 67 | apache-2.0 | JavaScript | Transforms PDF, Documents and Images into Enriched Structured Data |
| Video Subtitle Extractor | 4,267 | | 3 months ago | | | 131 | apache-2.0 | Python | A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. Text recognition runs locally, with no need to apply for a third-party API. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. |
| Ccextractor | 642 | | 3 months ago | | | 114 | gpl-2.0 | C | CCExtractor - Official version maintained by the core team |
| Videocr | 439 | | a year ago | 4 | December 15, 2019 | 24 | mit | Python | Extract hardcoded subtitles from videos using machine learning |
| Pdf Extract | 313 | 208 | 4 years ago | 28 | February 15, 2017 | 13 | mit | JavaScript | Node PDF Extract |
| Windowtextextractor | 207 | | 6 months ago | | | 4 | mit | C# | WindowTextExtractor allows you to get text from any window of an operating system, including asterisk-masked passwords |
| Signature_extractor | 203 | | 3 years ago | | | | mit | Python | A super lightweight image processing algorithm for detection and extraction of overlapped handwritten signatures on scanned documents using OpenCV and scikit-image. |
| Lambda Text Extractor | 143 | | 6 years ago | | | | apache-2.0 | Python | AWS Lambda functions to extract text from various binary formats. |
| Spacextract | 140 | | 3 years ago | | | 1 | mit | Python | Extraction and analysis of telemetry from rocket launch webcasts (from SpaceX and RocketLab) |
| Extracttable Py | 138 | 2 | 2 years ago | 27 | July 18, 2022 | 2 | apache-2.0 | Python | Python library to extract tabular data from images and scanned PDFs |
Tags: Python, Python Library, OCR, Tesseract, Tika, Text Extraction