Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Unstructured | 4,404 | 65 | 3 months ago | 114 | November 30, 2023 | 204 | apache-2.0 | HTML | ||
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. | ||||||||||
Hrconvert2 | 746 | 6 months ago | 8 | gpl-3.0 | PHP | |||||
A self-hosted, drag-and-drop & nosql file conversion server & share tool that supports 86 file formats in 13 languages. | ||||||||||
Dedoc | 49 | 4 months ago | 10 | November 24, 2023 | 1 | apache-2.0 | Python | |||
Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Document logical extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser) | ||||||||||
Dango Ocr | 15 | 3 years ago | Python | |||||||
DangoOCR: screenshot OCR text recognition; supports multiple languages, translates the recognized text, and plays audio. | ||||||||||
Semantic Ai | 11 | 4 months ago | apache-2.0 | Python | ||||||
An open source framework for Retrieval-Augmented Systems (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model). | ||||||||||
Hoshi | 7 | 5 years ago | mpl-2.0 | Python | ||||||
[星 (Star)] Converts scanned PDF documents to DOCX. | ||||||||||
Teserver | 6 | 5 years ago | JavaScript | |||||||
A simple Node.js server (Docker- and S3-ready) for extracting text from pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf... | ||||||||||
Pdf Converter | 5 | a year ago | 1 | mit | Python | |||||
Convert your PDF files into Word documents or various image formats locally, without uploading them to unknown servers.