
PyRefine


OpenRefine is a great tool for exploring and cleaning datasets prior to analysing them. It also records an undo history of all actions that you can export as a sort of script in JSON format. However, in order to execute that script on a new dataset, you need to manually import it through the graphical interface or set up a BatchRefine server, neither of which is quick.

PyRefine lets you execute OpenRefine JSON scripts against datasets without starting a full Java/OpenRefine server. It provides a command-line tool for quick use, and it can also be used as a library inside a pandas-based data analysis pipeline.
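To make the idea of replaying an exported undo history concrete, here is a minimal, hypothetical sketch. It is not PyRefine's API: the names `apply_operation`, `run_script` and `GREL_FUNCS` are made up for illustration. It handles only three operation types (`core/text-transform`, `core/column-rename`, `core/column-removal`) and three simple GREL expressions, ignores facets in `engineConfig` (every row is transformed), and uses plain dicts as rows instead of a pandas DataFrame so it runs without dependencies.

```python
import json

# Hypothetical mini-interpreter for a tiny subset of GREL (not PyRefine's API).
# Non-string cells are passed through unchanged.
GREL_FUNCS = {
    "value.trim()": lambda v: v.strip() if isinstance(v, str) else v,
    "value.toUppercase()": lambda v: v.upper() if isinstance(v, str) else v,
    "value.toLowercase()": lambda v: v.lower() if isinstance(v, str) else v,
}


def apply_operation(rows, op):
    """Apply one simplified OpenRefine operation to a list of row dicts."""
    kind = op["op"]
    if kind == "core/text-transform":
        expr = op["expression"]
        if expr.startswith("grel:"):  # newer exports prefix the language
            expr = expr[len("grel:"):]
        func = GREL_FUNCS[expr]  # KeyError for unsupported expressions
        col = op["columnName"]
        # Facets in engineConfig are ignored: every row is transformed.
        return [{**row, col: func(row[col])} for row in rows]
    if kind == "core/column-rename":
        old, new = op["oldColumnName"], op["newColumnName"]
        # Rebuild each dict so the renamed column keeps its position.
        return [{(new if k == old else k): v for k, v in row.items()} for row in rows]
    if kind == "core/column-removal":
        col = op["columnName"]
        return [{k: v for k, v in row.items() if k != col} for row in rows]
    raise NotImplementedError(f"unsupported operation: {kind}")


def run_script(rows, script_json):
    """Replay every operation in an exported JSON script, in order."""
    for op in json.loads(script_json):
        rows = apply_operation(rows, op)
    return rows


if __name__ == "__main__":
    script = json.dumps([
        {"op": "core/text-transform", "columnName": "name",
         "expression": "grel:value.trim()", "onError": "keep-original"},
        {"op": "core/column-rename", "oldColumnName": "name",
         "newColumnName": "Name"},
    ])
    print(run_script([{"name": "  alice ", "age": "30"}], script))
    # prints [{'Name': 'alice', 'age': '30'}]
```

Because each operation returns a new list of rows, the script is applied as a simple left-to-right fold, which mirrors how the undo history records steps in order.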

More details in this blog post.

Please note: PyRefine is still very much alpha-quality. It probably doesn't work exactly how you're expecting right now. That said, please try it out, and consider contributing!

Features

  • Execute OpenRefine JSON against a dataset from the command line
  • Execute OpenRefine JSON from a Python script

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
