Excalibur is a web interface to extract tabular data from PDFs, written in Python 3! It is powered by Camelot.
Note: Excalibur only works with text-based PDFs and not scanned documents. (As Tabula explains, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
Note: You need to install Ghostscript before you continue.
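A quick way to confirm that Ghostscript is installed before continuing is to look for its executable on your PATH. The following is a minimal sketch, not part of Excalibur: the function name `find_ghostscript` is hypothetical, and the executable names (`gs` on Linux and macOS, `gswin64c` or `gswin32c` on Windows) are the common defaults, which may differ on your system.

```python
import shutil

# Executable names Ghostscript is commonly installed under (an assumption):
# "gs" on Linux/macOS, "gswin64c" or "gswin32c" on Windows.
GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(names=GHOSTSCRIPT_NAMES, which=shutil.which):
    """Return the path of the first Ghostscript executable found on PATH, or None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_ghostscript()
    if found:
        print(f"Ghostscript found at {found}")
    else:
        print("Ghostscript was not found on PATH; install it first.")
```

Finding the executable is only a heuristic. Camelot may locate Ghostscript differently on some platforms, so treat a negative result as a hint to check the install instructions rather than as proof that it is missing.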
After installing Excalibur with pip, you need to initialize the metadata database using:
$ excalibur initdb
And then start the webserver using:
$ excalibur webserver
That's it! Now you can go to http://localhost:5000 and start extracting tabular data from your PDFs.
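If the page does not load, it can help to check whether anything is answering on that address at all. The sketch below uses only the Python standard library; the helper name `is_server_up` is hypothetical, and it assumes the webserver is listening on the URL quoted above.

```python
import urllib.error
import urllib.request


def is_server_up(url="http://localhost:5000", timeout=2.0):
    """Return True if an HTTP server answers at `url`, False if the connection fails."""
    # The server is local, so skip any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status; it is still running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host not resolvable.
        return False
```

Calling `is_server_up()` after running `excalibur webserver` should return `True` once the server has finished starting. A `False` result usually means the server is not running or is listening on a different port.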
1. Upload a PDF and enter the page numbers you want to extract tables from.
2. Go to each page and select the table by drawing a box around it. (You can skip this step, since Excalibur can detect tables on its own. Click on "Autodetect tables" to see what Excalibur sees.)
3. Choose a flavor (Lattice or Stream) from "Advanced".
   a. Lattice: For tables formed with lines.
   b. Stream: For tables formed with whitespace.
4. Click on "View and download data" to see the extracted tables.
5. Select your preferred format (CSV/Excel/JSON/HTML) and click on "Download"!
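Because Excalibur is powered by Camelot, the two flavors can also be tried directly from Python. The following is a minimal sketch that assumes the `camelot-py` package is installed and uses Camelot's `read_pdf` function. The wrapper name `extract_tables` and the file name in the usage comment are hypothetical, and this is not how Excalibur itself is implemented.

```python
VALID_FLAVORS = ("lattice", "stream")


def extract_tables(pdf_path, pages="1", flavor="lattice"):
    """Extract tables with Camelot and return them as a list of pandas DataFrames.

    flavor="lattice" targets tables formed with lines;
    flavor="stream" targets tables formed with whitespace.
    """
    if flavor not in VALID_FLAVORS:
        raise ValueError(f"flavor must be one of {VALID_FLAVORS}, got {flavor!r}")
    # Imported lazily so the argument check above works without Camelot installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    return [table.df for table in tables]


# Example usage ("report.pdf" is a placeholder file name):
#   for i, df in enumerate(extract_tables("report.pdf", pages="1", flavor="stream")):
#       df.to_csv(f"table-{i}.csv", index=False)
```

Each returned item is a regular pandas DataFrame, so the usual pandas writers (`to_csv`, `to_json`, `to_html`, `to_excel`) roughly cover the same download formats offered in the web interface.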
Note: You can also download executables for Windows and Linux from the releases page and run them directly!
After installing Ghostscript, which is one of Camelot's requirements (see the install instructions), you can use pip to install Excalibur:
$ pip install excalibur-py
To install from source instead, first install Ghostscript, then clone the repo using:
$ git clone https://www.github.com/camelot-dev/excalibur
and install Excalibur using pip:
$ cd excalibur
$ pip install .
Fantastic documentation is available at http://excalibur-py.readthedocs.io/.
The Contributor's Guide has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.
You can check the latest sources with:
$ git clone https://www.github.com/camelot-dev/excalibur
You can install the development dependencies easily, using pip:
$ pip install "excalibur-py[dev]"
After installation, you can run tests using:
$ python setup.py test
Excalibur uses Semantic Versioning. For the available versions, see the tags on this repository. For the changelog, you can check out HISTORY.md.
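Under Semantic Versioning, a version has the form MAJOR.MINOR.PATCH, and a change in MAJOR signals incompatible changes. The sketch below shows one way to compare two release tags when deciding whether an upgrade needs extra care. The function names and the example tags are hypothetical, and the parser deliberately handles only plain MAJOR.MINOR.PATCH strings, not pre-release or build metadata.

```python
import re

# Matches plain "MAJOR.MINOR.PATCH", optionally prefixed with "v".
_CORE_VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag):
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into a tuple of ints."""
    match = _CORE_VERSION.match(tag.strip())
    if match is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def is_breaking_upgrade(current, candidate):
    """Return True if moving from `current` to `candidate` raises the major version.

    Note: under Semantic Versioning, 0.y.z releases may change at any time,
    so a False result for a 0.x version is not a compatibility guarantee.
    """
    return parse_version(candidate)[0] > parse_version(current)[0]
```

Tuples of integers compare element by element, so `parse_version("1.10.0") > parse_version("1.9.9")` holds, which a plain string comparison would get wrong.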
This project is licensed under the MIT License; see the LICENSE file for details.
You can support our work on Excalibur with a one-time or monthly donation on OpenCollective. Organizations that use Excalibur can also sponsor the project for an acknowledgement on our official site and in this README.
Special thanks to all the users and organizations that support Excalibur!