tabula-py is a simple Python wrapper of tabula-java, which can read tables in a PDF.
You can read tables from a PDF and convert them into a pandas DataFrame. tabula-py also enables you to convert a PDF file into a CSV, a TSV or a JSON file.
You can see the example notebook and try it on Google Colab. We also highly recommend reading our documentation, especially the FAQ section.
I have confirmed that tabula-py works on macOS and Ubuntu, and some users report that it also works on Windows 10. See the documentation for detailed installation instructions for Windows 10.
Ensure you have a Java runtime installed and that it is on your PATH.
pip install tabula-py
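Because tabula-py needs a Java runtime on the PATH, it can help to confirm that `java` is reachable before going further. The following is a minimal sketch using only the Python standard library; the helper names `find_java` and `java_version_banner` are hypothetical and are not part of tabula-py.

```python
import shutil
import subprocess


def find_java(executable="java"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(executable)


def java_version_banner(java_path):
    """Run `java -version` and return its banner text.

    Most JVMs print the banner to stderr, so stderr is preferred and
    stdout is used only as a fallback.
    """
    result = subprocess.run(
        [java_path, "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    java_path = find_java()
    if java_path is None:
        print("No `java` executable found on PATH; install a Java runtime first.")
    else:
        print(f"Found Java at {java_path}")
        print(java_version_banner(java_path))
```

If the script reports that no executable was found, the PATH most likely needs adjusting before tabula-py can be used.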
tabula-py enables you to extract tables from a PDF into a DataFrame, or a JSON. It can also extract tables from a PDF and save the file as a CSV, a TSV, or a JSON.
import tabula
# Read a PDF into a list of DataFrames
dfs = tabula.read_pdf("test.pdf", pages='all')
# Read a remote PDF into a list of DataFrames
dfs2 = tabula.read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf")
# Convert a PDF into a CSV file
tabula.convert_into("test.pdf", "output.csv", output_format="csv", pages='all')
# Convert all PDFs in a directory
tabula.convert_into_by_batch("input_directory", output_format='csv', pages='all')
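When one PDF in a directory is malformed, it can be useful to convert files one at a time so that a single failure does not stop the rest. The sketch below loops over `tabula.convert_into` (shown above) instead of calling `convert_into_by_batch`. The helper names `plan_csv_outputs` and `convert_directory` are hypothetical, and writing each CSV next to its source PDF is an assumption of this sketch rather than documented tabula-py behavior.

```python
from pathlib import Path


def plan_csv_outputs(input_directory):
    """Pair every *.pdf file in a directory with a CSV path of the same stem."""
    directory = Path(input_directory)
    return [
        (pdf_path, pdf_path.with_suffix(".csv"))
        for pdf_path in sorted(directory.glob("*.pdf"))
    ]


def convert_directory(input_directory, pages="all"):
    """Convert each PDF separately and collect failures instead of stopping."""
    import tabula  # imported lazily so the planning helper works without Java

    failures = {}
    for pdf_path, csv_path in plan_csv_outputs(input_directory):
        try:
            tabula.convert_into(
                str(pdf_path), str(csv_path), output_format="csv", pages=pages
            )
        except Exception as error:  # broad on purpose: keep the batch going
            failures[pdf_path.name] = str(error)
    return failures
```

Calling `convert_directory("input_directory")` returns a dictionary that maps each failed file name to its error message; an empty dictionary means every file was converted.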
See an example notebook for more details. I also recommend reading the tutorial article written by @aegis4048, and another tutorial written by @tdpetrou.
Interested in helping out? I'd love to have your help!
You can help by spreading the word about tabula-py to people who might be able to benefit from using it.
You can also support our continued work on tabula-py with a donation on GitHub Sponsors or Patreon.