


Data Cleaning with OpenRefine for Ecologists Lesson for Data Carpentry


Current Maintainers

Past Maintainers

Coming soon...

OpenRefine Version

The current version has been tested with OpenRefine 3.5.2.

Data set notes

  • This data set is derived from The Portal Project Long-term desert ecology project data. This data file was downloaded and then modified specifically for use with OpenRefine.
    • Taxon names were put back into the file.
    • Globally Unique Identifiers (in the form of UUIDs) were added.
  • These modifications were made in order to illustrate some features of OpenRefine.
    • Errors were added to the taxon names (scientificName field), to demonstrate OpenRefine's ability to find likely mis-entered data.
    • Running OpenRefine's clustering algorithms on the scientificName column surfaces these errors quickly and makes it simple to fix every variant found.
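
To give a feel for what clustering does outside of OpenRefine, here is a minimal Python sketch of a key-collision "fingerprint" approach, loosely modelled on the method of that name described in the OpenRefine documentation. It is an approximation for illustration only, not OpenRefine's actual implementation. The function names `fingerprint` and `cluster` and the sample names are hypothetical and are not taken from the lesson data file.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that trivially different spellings collide."""
    # Trim surrounding whitespace and lowercase the value.
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII equivalents.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, remove duplicate tokens, sort, and rejoin.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a fingerprint; keep groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical scientificName values with the kinds of variation a key-collision method catches.
names = [
    "Dipodomys merriami",
    "dipodomys merriami ",
    "Merriami Dipodomys",
    "Dipodomys merriami.",
    "Chaetodipus penicillatus",
]

clusters = cluster(names)
for group in clusters:
    print(group)
```

A key-collision method like this only catches differences in case, punctuation, whitespace, and word order. Misspellings that change letters need a nearest-neighbor method based on edit distance, which OpenRefine also offers.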


We welcome all contributions to improve the lesson! Maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way. We'd like to ask you to familiarize yourself with our Contribution Guide.

Please see the current list of issues for ideas for contributing to this repository. Contributions follow the GitHub flow, which is nicely explained in the chapter Contributing to a Project in Pro Git by Scott Chacon.

Look for the tag good_first_issue. This indicates that the maintainers will welcome a pull request fixing that issue.
