|Project Name||Stars||Downloads||Repos Using This||Packages Using This||Most Recent Commit||Total Releases||Latest Release||Open Issues||License||Language|
|Manga Ocr||1,012||3||3 months ago||12||August 27, 2023||9||apache-2.0||Python|
|Optical character recognition for Japanese text, with the main focus being Japanese manga|
|Mokuro||589||a day ago||9||December 03, 2023||25||gpl-3.0||HTML|
|Read Japanese manga inside browser with selectable text.|
|Mangaripper||159||2 years ago||21||mit||C#|
|This software helps you download manga (Japanese Comic) from several websites for your offline viewing.|
|Open Mantra Dataset||152||9 months ago||other|
|🔞 Download manga from nhentai.net. An nhentai doujinshi downloader with language filtering and smart deduplication.|
|Kaku||132||a year ago||17||bsd-3-clause||Kotlin|
|画 - Japanese OCR Dictionary|
|Roboragi||131||4 years ago||35||agpl-3.0||Python|
|Roboragi is a Reddit bot which helps link anime, manga, and other Japanese media.|
|Yuzumarker||70||2 years ago||4||other||C#|
|🍋 [WIP] Manga Translation Tool|
|Kamite||67||10 days ago||1||agpl-3.0||Java|
|Japanese immersion assistant for learners (Windows/Linux)|
|Aruppi Api||40||8 months ago||16||mit||TypeScript|
|Aruppi API has everything about Japan, from anime, music, radio, images and videos to japanese culture|
Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Transformers' Vision Encoder Decoder framework.
Manga OCR can be used as a general-purpose OCR for printed Japanese, but its main goal is to provide high-quality text recognition that is robust against various scenarios specific to manga.
Unlike many OCR models, Manga OCR supports recognizing multi-line text in a single forward pass, so that text bubbles found in manga can be processed at once, without splitting them into lines.
You need Python 3.8, 3.9, 3.10 or 3.11.
If you want to run on a GPU, install PyTorch by following the official PyTorch installation instructions; otherwise, this step can be skipped.
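As a quick preflight check before installing, the sketch below tests whether the running interpreter is one of the Python versions listed above and whether PyTorch, if it is already installed, can see a CUDA GPU. The helper names `python_supported` and `gpu_available` are made up for this illustration and are not part of Manga OCR.

```python
import importlib.util
import sys

# Supported minor versions of Python 3, as listed above (3.8 through 3.11).
SUPPORTED_MINORS = range(8, 12)


def python_supported(version=sys.version_info):
    """Return True if `version` is Python 3.8, 3.9, 3.10 or 3.11."""
    major, minor = version[0], version[1]
    return major == 3 and minor in SUPPORTED_MINORS


def gpu_available():
    """Return True only if PyTorch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed, so the CPU will be used
    import torch
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("CUDA GPU visible to PyTorch:", gpu_available())
```

If the second line prints `False`, Manga OCR can still be installed; the GPU step is optional.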
Run in command line:
pip3 install manga-ocr
ImportError: DLL load failed while importing fugashi: The specified module could not be found. This might be because Python was installed from the Microsoft Store; try installing Python from the official site.
If mecab-python3 fails to install on ARM architecture, try this workaround.
    # Recognize text from an image file path
    from manga_ocr import MangaOcr

    mocr = MangaOcr()
    text = mocr('/path/to/img')

    # Alternatively, pass an already opened PIL image
    import PIL.Image
    from manga_ocr import MangaOcr

    mocr = MangaOcr()
    img = PIL.Image.open('/path/to/img')
    text = mocr(img)
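To run the same call over many pages, one option is a small loop around the recognizer. The sketch below assumes only what the usage above shows, namely that the recognizer is a callable that accepts an image path and returns text. `ocr_folder` and the list of extensions are hypothetical choices for this illustration, not part of the Manga OCR API.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your files.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def ocr_folder(folder, recognize):
    """Run `recognize` on every image in `folder`, in file name order.

    `recognize` is any callable that takes an image path and returns text,
    for example a MangaOcr instance. Returns {file name: recognized text}.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = recognize(str(path))
    return results
```

With Manga OCR installed, this could be called as `ocr_folder('/path/to/pages', MangaOcr())`. Passing the recognizer in as an argument means the model is loaded once and reused for every file.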
Manga OCR can run in the background and process new images as they appear.
You might use a tool like ShareX or Flameshot to manually capture a region of the screen and let the OCR read it either from the system clipboard, or a specified directory. By default, Manga OCR will write recognized text to clipboard, from which it can be read by a dictionary like Yomichan.
Clipboard mode on Linux requires wl-copy for Wayland sessions or xclip for X11 sessions. You can find out which one your system needs by running echo $XDG_SESSION_TYPE in the terminal.
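The same check can be scripted. The sketch below reads `XDG_SESSION_TYPE`, maps it to the tool named above, and reports whether that tool is on `PATH`. The function name `clipboard_tool` is hypothetical and exists only for this example.

```python
import os
import shutil


def clipboard_tool(session_type):
    """Map an XDG session type to the clipboard tool it requires, or None."""
    tools = {"wayland": "wl-copy", "x11": "xclip"}
    return tools.get((session_type or "").strip().lower())


if __name__ == "__main__":
    session = os.environ.get("XDG_SESSION_TYPE")
    tool = clipboard_tool(session)
    if tool is None:
        print(f"Unrecognized session type: {session!r}")
    elif shutil.which(tool) is None:
        print(f"{tool} is required for this session but was not found on PATH")
    else:
        print(f"{tool} is installed")
```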
Your full setup for reading manga in Japanese with a dictionary might look like this:
capture region with ShareX -> write image to clipboard -> Manga OCR -> write text to clipboard -> Yomichan
Note that when running in the clipboard scanning mode, any image that you copy to clipboard will be processed by OCR and replaced by recognized text. If you want to be able to copy and paste images as usual, you should use the folder scanning mode instead and define a separate task in ShareX just for OCR, which saves screenshots to some folder without copying them to clipboard.
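Conceptually, folder scanning amounts to watching a directory and running OCR on each image that appears. The sketch below illustrates that idea with simple polling. It is not Manga OCR's actual implementation, and `find_new_images`, `watch`, the polling interval, and the extension list are all assumptions made for this example.

```python
import time
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}  # assumed extensions


def find_new_images(folder, seen):
    """Return image paths in `folder` that are not in `seen`, and record them."""
    new = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES and path not in seen:
            seen.add(path)
            new.append(path)
    return new


def watch(folder, recognize, interval=0.5):
    """Poll `folder` forever, passing each new image path to `recognize`."""
    seen = set(Path(folder).iterdir())  # ignore files that already exist
    while True:
        for path in find_new_images(folder, seen):
            print(recognize(str(path)))
        time.sleep(interval)
```

Because this loop only reads files from the folder, images copied to the clipboard are left untouched, which is the reason the text above recommends this mode when you still want to copy and paste images.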
When running for the first time, downloading the model (~400 MB) might take a few minutes.
The OCR is ready to use after the OCR ready message appears in the logs.
If manga_ocr doesn't work, you might also try replacing it with python -m manga_ocr.
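A launcher script can apply this fallback automatically. The sketch below picks the `manga_ocr` executable when it is on `PATH` and otherwise runs the module with the current interpreter (`sys.executable` stands in for `python` here). The function name `manga_ocr_command` is hypothetical.

```python
import shutil
import sys


def manga_ocr_command(which=shutil.which):
    """Return the argv prefix to launch Manga OCR.

    Uses the manga_ocr executable if it is on PATH; otherwise falls back
    to running the module with the current Python interpreter.
    """
    if which("manga_ocr") is not None:
        return ["manga_ocr"]
    return [sys.executable, "-m", "manga_ocr"]
```

The returned list can be passed to `subprocess.run`, with any further arguments appended to it.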
Here are some cherry-picked examples showing the capability of the model.
|image||Manga OCR result|
For any inquiries, please feel free to contact me at [email protected]
This project was done with the usage of: