Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Lambda Text Extractor | 143 | | | | 6 years ago | | | | apache-2.0 | Python |
AWS Lambda functions to extract text from various binary formats. | ||||||||||
Pd3f | 131 | | | | a year ago | | | 13 | agpl-3.0 | HTML |
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based | ||||||||||
Php Apache Tika | 104 | | 3 | 3 | 8 months ago | 38 | April 14, 2023 | | mit | PHP |
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats | ||||||||||
Doc_processing_toolkit | 52 | | | | 7 years ago | | | 4 | other | Python |
Python library to extract text from PDF, and default to OCR when text extraction fails. | ||||||||||
Wagtail_textract | 31 | | | | 6 months ago | 8 | September 06, 2019 | 14 | bsd-3-clause | Python |
Text extraction for Wagtail document search | ||||||||||
Mimeograph | 28 | | 2 | 3 | 11 years ago | 11 | March 08, 2017 | 6 | | CoffeeScript |
CoffeeScript lib for PDF OCR and text extraction | ||||||||||
Aiopytesseract | 13 | | | | 4 months ago | 13 | November 21, 2023 | | apache-2.0 | Python |
A Python asyncio wrapper for Tesseract-OCR. | ||||||||||
Tesseractocr | 12 | | | | 9 years ago | | | | mit | Shell |
Full text extraction using the Open Source Tesseract OCR software https://code.google.com/p/tesseract-ocr/ and imagemagick | ||||||||||
Cosmic Cube | 5 | | | | 10 years ago | | | | | Python |
PDF image analysis and selective text extraction using tesseract |