⚗️ Experimental Frappe OCR application with tesseract.
This project is a fork of ERPNext-OCR by John Vincent Fiel. Its aim is to fix and clean up the original source code and add some new features.
Check out more on ERPNext Discuss.
See CHANGELOG
See Taiga.io
Install tesseract-ocr, plus imagemagick and ghostscript (to work with PDF files), using this command on Debian:
sudo apt-get install tesseract-ocr imagemagick libmagickwand-dev ghostscript
Then get and install the app with bench:
bench get-app --branch develop erpnext_ocr https://github.com/Monogramm/erpnext_ocr
bench install-app erpnext_ocr
When the Frappe app is installed, the following Python requirements are installed with it:
Python binding for tesseract, tesserocr
image processing library in Python, pillow
HTTP library in Python, requests
Python binding for imagemagick, wand
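As a quick sanity check after installation, the sketch below reports which of the system tools and Python packages mentioned above can be found. It is not part of the app. The executable names ("gs" for ghostscript, "convert" for ImageMagick) are assumptions about a typical Debian setup and may differ on your system.

```python
import importlib.util
import shutil

# Executable names are assumptions for a typical Debian install:
# "gs" is ghostscript, "convert" is the ImageMagick 6 command-line tool.
SYSTEM_TOOLS = ("tesseract", "gs", "convert")

# Importable module names for the packages listed above
# (the pillow package is imported as "PIL").
PYTHON_MODULES = ("tesserocr", "PIL", "requests", "wand")


def check_dependencies():
    """Return {name: True/False} for each system tool and Python module."""
    status = {}
    for tool in SYSTEM_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        status[tool] = shutil.which(tool) is not None
    for module in PYTHON_MODULES:
        # find_spec returns None when a top-level module cannot be imported.
        status[module] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it with the same Python interpreter that bench uses, otherwise the module results may not reflect the app's environment.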
File Being Read:
Sample Screenshot:
In order to use OCR with different languages, you need to install the appropriate trained data files. Check tesseract Wiki for details: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
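To see which languages are already available locally, one option is to ask the tesseract binary itself with its `--list-langs` option. The helper names below are hypothetical and not part of this app; the sketch only assumes that `tesseract` is on the PATH and that the output is a header line ending in a colon followed by one language code per line.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ...:" header.
        if not line or line.endswith(":"):
            continue
        langs.append(line)
    return langs


def installed_languages():
    """Ask the local tesseract binary which trained data files it can see."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some tesseract versions print the list on stderr, so read both streams.
    return parse_list_langs(result.stdout + "\n" + result.stderr)
```

If the language you need is missing from the returned list, install the matching trained data file as described in the wiki page above.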
If you wish to develop or just test this application locally, you can run docker-compose up -d at the root of this repository.
You can then access your ERPNext OCR dev environment at http://localhost:8080.
wand.exceptions.PolicyError: not authorized '/opt/sample.pdf' @ error/constitute.c/ReadImage/412
This can happen when the ImageMagick security configuration prevents it from reading PDF files.
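On many distributions the restriction comes from a `policy` entry for the PDF coder in ImageMagick's policy.xml (often with rights="none"). The sketch below only inspects such a file and does not modify it. The function names are hypothetical, and the file location (for example /etc/ImageMagick-6/policy.xml) is an assumption that varies by distribution and ImageMagick version.

```python
import xml.etree.ElementTree as ET


def pdf_coder_rights(policy_xml):
    """Return the rights strings of every coder policy that matches PDF."""
    root = ET.fromstring(policy_xml)
    rights = []
    for policy in root.iter("policy"):
        if policy.get("domain") != "coder":
            continue
        # A pattern may name one coder ("PDF") or a group ("{PS,PDF}").
        pattern = policy.get("pattern", "").strip("{}")
        coders = [c.strip().upper() for c in pattern.split(",")]
        if "PDF" in coders:
            rights.append(policy.get("rights", ""))
    return rights


def pdf_is_blocked(policy_xml):
    """True if any matching policy lacks the read right (e.g. rights="none")."""
    return any("read" not in r.lower() for r in pdf_coder_rights(policy_xml))
```

Pass the contents of the policy file to `pdf_is_blocked`. If it returns True, the usual remedy is to grant read rights to the PDF coder in that file, keeping in mind that the restriction exists for security reasons.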
Reference:
wand.exceptions.WandRuntimeError: MagickReadImage returns false, but did raise ImageMagick exception. This can occurs when a delegate is missing, or returns EXIT_SUCCESS without generating a raster.
This might happen if you're missing a dependency needed to convert PDF files, most of the time ghostscript.
References:
OSError: encoder error -2 when writing image file
Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.
This usually happens when the TIFF image compression is not valid or not recognized.
To run the app's tests:
bench run-tests --app erpnext_ocr
Monogramm
John Vincent Fiel
Contributions, issues and feature requests are welcome!
Feel free to check issues page.
Check the contributing guide.
Give a ⭐️ if this project helped you!
Copyright © 2019 Monogramm.
This project is MIT licensed.
This README was generated with ❤️ by readme-md-generator