Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
Parsr	5,423			7 months ago			67	apache-2.0	JavaScript
Transforms PDF, Documents and Images into Enriched Structured Data
Video Subtitle Extractor	4,267			3 months ago			131	apache-2.0	Python
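Extracts hard-coded subtitles from videos and generates SRT files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating SRT files.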
Ccextractor	642			3 months ago			114	gpl-2.0	C
CCExtractor - Official version maintained by the core team
Videocr	439			a year ago	4	December 15, 2019	24	mit	Python
Extract hardcoded subtitles from videos using machine learning
Pdf Extract	313	20	8	4 years ago	28	February 15, 2017	13	mit	JavaScript
Node PDF Extract
Windowtextextractor	207			6 months ago			4	mit	C#
WindowTextExtractor lets you get the text from any window of an operating system, including passwords masked with asterisks.
Signature_extractor	203			3 years ago				mit	Python
A super lightweight image processing algorithm for detection and extraction of overlapped handwritten signatures on scanned documents using OpenCV and scikit-image.
Lambda Text Extractor	143			6 years ago				apache-2.0	Python
AWS Lambda functions to extract text from various binary formats.
Spacextract	140			3 years ago			1	mit	Python
Extraction and analysis of telemetry from rocket launch webcasts (from SpaceX and RocketLab)
Extracttable Py	138	2		2 years ago	27	July 18, 2022	2	apache-2.0	Python
Python library to extract tabular data from images and scanned PDFs
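Video Subtitle Extractor lists generating SRT files as its output. As a rough illustration of what that output format looks like, here is a minimal sketch of a SubRip (.srt) writer. It is not taken from any project listed here: the `Cue` type and the `srt_timestamp` and `to_srt` functions are hypothetical, and the sketch assumes the recognized subtitle text already comes with start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue (hypothetical type): times are in seconds."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        # Each block: sequence number, time range, then the subtitle text.
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n"
            f"{cue.text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([Cue(1.5, 3.25, "Hello"), Cue(3.5, 5.0, "World")]))
```

Note that SRT uses a comma, not a period, before the milliseconds, which is why the timestamp is formatted by hand rather than with `datetime`.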

Alternatives To Doc_processing_toolkit

Parsr ⭐ 5,423

Transforms PDF, Documents and Images into Enriched Structured Data

most recent commit 7 months ago

Video Subtitle Extractor ⭐ 4,267

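Extracts hard-coded subtitles from videos and generates SRT files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating SRT files.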

most recent commit 3 months ago

Ccextractor ⭐ 642

CCExtractor - Official version maintained by the core team

most recent commit 3 months ago

Videocr ⭐ 439

Extract hardcoded subtitles from videos using machine learning

total releases 4 | most recent commit a year ago

Pdf Extract ⭐ 313

Node PDF Extract

dependent packages 8 | total releases 28 | most recent commit 4 years ago

Windowtextextractor ⭐ 207

WindowTextExtractor lets you get the text from any window of an operating system, including passwords masked with asterisks.

most recent commit 6 months ago

Signature_extractor ⭐ 203

A super lightweight image processing algorithm for detection and extraction of overlapped handwritten signatures on scanned documents using OpenCV and scikit-image.

most recent commit 3 years ago

Lambda Text Extractor ⭐ 143

AWS Lambda functions to extract text from various binary formats.

most recent commit 6 years ago

Spacextract ⭐ 140

Extraction and analysis of telemetry from rocket launch webcasts (from SpaceX and RocketLab)

most recent commit 3 years ago

Extracttable Py ⭐ 138

Python library to extract tabular data from images and scanned PDFs

total releases 27 | most recent commit 2 years ago

Alternative Project Comparisons

Doc_processing_toolkit vs Parsr

Doc_processing_toolkit vs Video Subtitle Extractor

Doc_processing_toolkit vs Ccextractor

Doc_processing_toolkit vs Videocr

Doc_processing_toolkit vs Pdf Extract

Doc_processing_toolkit vs Windowtextextractor

Doc_processing_toolkit vs Signature_extractor

Doc_processing_toolkit vs Lambda Text Extractor

Doc_processing_toolkit vs Spacextract

Doc_processing_toolkit vs Extracttable Py

Popular Extraction Projects

Newspaper ⭐ 13,147

News, full-text, and article metadata extraction in Python 3. Advanced docs:

dependent packages 97 | total releases 18 | latest release September 28, 2018 | most recent commit 7 months ago

Warp ⭐ 8,841

A super-easy, composable, web server framework for warp speeds.

dependent packages 491 | total releases 38 | latest release September 27, 2023 | most recent commit 3 months ago

Sm64 ⭐ 7,163

A Super Mario 64 decompilation, brought to you by a bunch of clever folks.

most recent commit 3 months ago

Archwsl ⭐ 6,039

Arch Linux-based WSL distribution. Supports multiple installs.

most recent commit 4 months ago

Nlp.js ⭐ 5,944

An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more.

dependent packages 92 | total releases 40 | latest release January 12, 2023 | most recent commit 4 months ago

Popular OCR Projects

Tesseract ⭐ 56,096

Tesseract Open Source OCR Engine (main repository)

dependent packages 7 | total releases 1 | latest release February 27, 2018 | most recent commit 3 months ago

Paddleocr ⭐ 36,076

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)

dependent packages 30 | total releases 40 | latest release September 15, 2023 | most recent commit 3 months ago

Tesseract.js ⭐ 32,523

Pure JavaScript OCR for more than 100 languages 📖🎉🖥

dependent packages 224 | total releases 66 | latest release October 30, 2023 | most recent commit 3 months ago

Sharex ⭐ 26,630

ShareX is a free and open source program that lets you capture or record any area of your screen and share it with a single press of a key. It also allows uploading images, text or other types of files to many supported destinations you can choose from.

most recent commit 3 months ago

Easyocr ⭐ 20,438

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

dependent packages 69 | total releases 32 | latest release September 04, 2023 | most recent commit 4 months ago

Popular Data Processing Categories

Python

Python Library

Ocr

Tesseract

Tika

Text Extraction

Data on downloads, dependent repos, dependent packages, total releases, and latest releases is provided by Libraries.io.