Awesome Open Source
The Top 10 Text Extraction Open Source Projects
Open source projects categorized as Text Extraction
miso-belica/sumy
Stars: 3,669
Module for automatic summarization of text documents and HTML pages.
Dependent packages: 0; total releases: 0; most recent commit: 2 months ago
adbar/trafilatura
Stars: 2,447
Python and command-line tool to gather text on the Web: web crawling/scraping and extraction of text, metadata, and comments.
Dependent packages: 0; total releases: 0; most recent commit: over 2 years ago
unidoc/unipdf
Stars: 2,231
Pure Go library for creating and processing PDF files.
Dependent packages: 0; total releases: 0; most recent commit: over 2 years ago
chrismattmann/tika-python
Stars: 1,316
Tika-Python is a Python binding to the Apache Tika™ REST services, allowing Tika to be called natively from Python.
Dependent packages: 0; total releases: 0; most recent commit: almost 3 years ago
whitelok/image-text-localization-recognition
Stars: 928
A general list of resources for image text localization and recognition: a collection of papers and implementations for scene text localization and recognition.
Dependent packages: 0; total releases: 0; most recent commit: over 2 years ago
miso-belica/jusText
Stars: 811
Heuristic-based boilerplate removal tool.
Dependent packages: 0; total releases: 0; most recent commit: over 1 year ago
unidoc/unidoc
Stars: 691
This repository has moved! https://github.com/unidoc/unipdf
Dependent packages: 0; total releases: 0; most recent commit: about 7 years ago
MaLeLabTs/RegexGenerator
Stars: 656
Source code of a tool for generating regular expressions for text extraction: (1) automatically, (2) based only on examples of the desired behavior, and (3) without any external hint about what the target regex should look like.
Dependent packages: 0; total releases: 0; most recent commit: about 7 years ago
ICIJ/datashare
Stars: 519
A self-hosted search engine for documents.
Dependent packages: 0; total releases: 0; most recent commit: over 2 years ago
ropensci/pdftools
Stars: 480
Text extraction, rendering, and conversion of PDF documents.
Dependent packages: 0; total releases: 0; most recent commit: over 2 years ago
Copyright 2018-2026 Awesome Open Source. All rights reserved.