The goal of rrtools is to provide instructions, templates, and functions for making a basic compendium suitable for writing a reproducible journal article or report with R. This package documents the key steps and provides convenient functions for quickly creating a new research compendium. The approach is based generally on Kitzes et al. (2017), and more specifically on Marwick (2017), Marwick et al. (2018), and Wickham’s (2017) work using the R package structure as the basis for a research compendium.
rrtools provides a template for doing scholarly writing in a literate programming environment using R Markdown and bookdown. It also allows for isolation of your computational environment using Docker, package versioning using MRAN, and continuous integration using Travis. It makes a convenient starting point for writing a journal article or report. If you're writing a PhD thesis, or a similar type of multi-chapter document, a better choice might be the huskydown package or other bookdown variants.
The functions in rrtools allow you to use R to easily follow the best practices outlined in several major scholarly publications on reproducible research. In addition to those cited above, Wilson et al. (2017), Piccolo & Frampton (2016), Stodden & Miguez (2014) and rOpenSci (2017a, b) are important sources that have influenced our approach to this package.
To explore and test rrtools without installing anything, click the Binder badge above to start RStudio in a browser tab that includes the contents of this GitHub repository. In that environment you can browse the files, install rrtools, and make a test compendium without altering anything on your computer.
You can install rrtools from GitHub with these lines of R code (Windows users are recommended to install a separate program, Rtools, before proceeding with this step):
if (!require("devtools")) install.packages("devtools")
devtools::install_github("benmarwick/rrtools")
To create a reproducible research compendium step-by-step using the rrtools approach, follow these detailed instructions. We use RStudio and recommend it, but it is not required for these steps to work. We recommend copy-pasting these directly into your console, and editing the options before running. We don’t recommend saving these lines in a script in your project: they are meant to be once-off setup functions.
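First, set up a Git-managed directory linked to an online repository, in one of two ways.
One way is to start on GitHub, GitLab, or a similar service and create an empty repository called pkgname (you should use a different name; please follow the rules below) on that service. Then clone that repository to have a local empty directory on your computer, called pkgname, that is linked to this remote repository. Please see our wiki for a step-by-step walk-through of this method, illustrated with screenshots.
The other way is to create an empty directory called pkgname on your computer, initialize it with Git (git init), then create a GitHub/GitLab repository and connect your local project to the remote repository.
The name you use in place of pkgname must follow some rules for everything to work. It must contain only ASCII letters, numbers, and '.', have at least two characters, start with a letter, and not end with '.'.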
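The naming rules can be checked in R before creating the repository. This is a minimal sketch, assuming the standard R package-name rules (only ASCII letters, numbers, and dots; at least two characters; a leading letter; no trailing dot). The helper is_valid_pkgname is hypothetical and not part of rrtools.

```r
# Hypothetical helper, not part of rrtools: TRUE if `name` satisfies
# the standard R package-name rules.
is_valid_pkgname <- function(name) {
  # first character: a letter; middle: letters, digits or dots;
  # last character: a letter or digit (so no trailing dot, and length >= 2)
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)
}

# Candidate names: only the first two satisfy every rule
is_valid_pkgname(c("pkgname", "my.compendium2", "2fast", "my_pkg", "x", "ends."))
```

Names containing an underscore or hyphen fail this check, which is a common surprise when a repository name is reused as the package name.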
rrtools::use_compendium("pkgname")
This uses usethis::create_package() to create a basic R package in the pkgname directory, and then, if you’re using RStudio, opens the project. If you’re not using RStudio, it sets the working directory to the pkgname directory.
Alternatively, give the location explicitly with rrtools::use_compendium("path/to/pkgname"), using the path to pkgname on your system.
Edit the DESCRIPTION file (located in your pkgname directory) to include accurate metadata, e.g. your ORCID.
Update the Imports: section of the DESCRIPTION file with the names of packages used in the code we write in the Rmd document(s) by running rrtools::add_dependencies_to_description().
usethis::use_mit_license(copyright_holder = "My Name")
For the other license functions available in usethis, see ?usethis::use_mit_license()
rrtools::use_readme_rmd()
This function also adds runtime.txt, the file that makes Binder work, if your compendium is hosted online (e.g. GitHub, Zenodo, Figshare, Dataverse, etc.).
rrtools::use_analysis()
This function has three location = options: top_level to create a top-level analysis/ directory, inst to create an inst/ directory (so that all the sub-directories are available after the package is installed), and vignettes to create a vignettes/ directory (and automatically update the DESCRIPTION). The default is a top-level analysis/.
The sub-directories created look like this (in analysis/ for example):
analysis/
|
├── paper/
│   ├── paper.Rmd       # this is the main document to edit
│   └── references.bib  # this contains the reference list information
├── figures/            # location of the figures produced by the Rmd
|
├── data/
│   ├── raw_data/       # data obtained from elsewhere
│   └── derived_data/   # data generated during the analysis
|
└── templates
    ├── journal-of-archaeological-science.csl
    |                   # this sets the style of citations & reference list
    ├── template.docx   # used to style the output of the paper.Rmd
    └── template.Rmd
The paper.Rmd is ready to write in and render with bookdown. It includes a YAML header that identifies the references.bib file and the supplied csl file (to style the reference list).
The references.bib file has just one item to demonstrate the format. It is ready for you to insert more reference details.
You can replace the supplied csl file with a different citation style from https://github.com/citation-style-language/
The Imports: field in the DESCRIPTION file must include the names of all packages used in analysis documents (e.g. paper.Rmd). We have a helper function, rrtools::add_dependencies_to_description(), that will scan the Rmd file, identify libraries used in there, and add them to the DESCRIPTION file.
rrtools::use_analysis() has a data_in_git = argument, which is TRUE by default. If set to FALSE, files in the data/ directory are excluded from tracking by Git and so do not appear on GitHub. You should set data_in_git = FALSE if your data files are large (>100 MB is the limit for GitHub) or you do not want to make the data files publicly accessible on GitHub.
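One way to decide on data_in_git is to check your data files against the size limit first. The sketch below uses only base R. The helper files_over_limit is hypothetical (not part of rrtools), and it treats 100 MB as 100 x 10^6 bytes, which is slightly stricter than GitHub's actual cut-off.

```r
# Hypothetical helper, not part of rrtools: list files under `dir`
# (searched recursively) that are larger than `limit_mb` megabytes.
files_over_limit <- function(dir, limit_mb = 100) {
  paths <- list.files(dir, recursive = TRUE, full.names = TRUE)
  size_mb <- file.size(paths) / 1e6  # bytes to (decimal) megabytes
  found <- data.frame(path = paths, size_mb = size_mb,
                      stringsAsFactors = FALSE)
  found[size_mb > limit_mb, , drop = FALSE]
}

# Example call on a directory of your own (the path is a placeholder):
# files_over_limit("path/to/my/data")
```

If this returns any rows, calling rrtools::use_analysis(data_in_git = FALSE) is likely the safer choice.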
For the R code used in paper.Rmd, you have a few options. The simplest is to write all your R code in chunks in the Rmd. Or you can write R code in script files in /R, and include devtools::load_all(".") at the top of your paper.Rmd. Or you can write functions in /R and use library(pkgname) at the top of your paper.Rmd, or omit library and preface each function call with pkgname::. It is up to you to choose whatever seems most natural.
rrtools::use_dockerfile()
This creates a basic Dockerfile using rocker/verse as the base image.
The image tag fixes the R version (e.g. rocker/verse:3.5.0).
rocker/verse includes R, the tidyverse, RStudio, pandoc and LaTeX, so compendium build times are very fast on Travis.
To build privately, use rrtools::use_circleci() to build our Docker container at https://circleci.com, from a private GitHub repo.
rrtools::use_travis()
This creates a .travis.yml file. By default it configures Travis to build our Docker container from our Dockerfile, and build, install and run our custom package in this container. If you specify docker = FALSE in this function, the Travis file will not use Docker but will run R directly on the Travis infrastructure. We recommend using Docker because it offers greater computational isolation and saves a substantial amount of time during the Travis build because the base image contains many pre-compiled packages.
An alternative is rrtools::use_circleci(), for running free private continuous integration tests at https://circleci.com instead of Travis. With rrtools::use_circleci(docker_hub = FALSE) we can stop our Docker container from appearing on Docker Hub, so our Docker container stays completely private.
usethis::use_testthat()
If you add functions in R/, include tests to ensure they function as intended.
Put the tests in tests/testthat/ and check http://r-pkgs.had.co.nz/tests.html for a template.
You should be able to follow these steps to get a new research compendium repository connected to Travis and ready to write in just a few minutes.
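As an illustration of the testing step, here is a minimal sketch of a function and a matching testthat test. The function rescale01 and the file name are hypothetical examples, not part of rrtools. The requireNamespace() guard is only there so the snippet runs on its own; a real test file would call test_that() directly.

```r
# Hypothetical function that would live in R/rescale01.R
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical contents of tests/testthat/test-rescale01.R
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("rescale01 maps values onto [0, 1]", {
    testthat::expect_equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1))
    # missing values stay missing instead of breaking the range
    testthat::expect_true(is.na(rescale01(c(1, NA, 3))[2]))
  })
}
```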
Kitzes, J., Turek, D., & Deniz, F. (Eds.). (2017). The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. https://www.practicereproducibleresearch.org
Marwick, B. (2017). Computational reproducibility in archaeological research: Basic principles and a case study of their implementation. Journal of Archaeological Method and Theory, 24(2), 424-450. https://doi.org/10.1007/s10816-015-9272-9
Marwick, B., Boettiger, C., & Mullen, L. (2018). Packaging data analytical work reproducibly using R (and friends). The American Statistician, 72(1), 80-88. https://doi.org/10.1080/00031305.2017.1375986
Piccolo, S. R., & Frampton, M. B. (2016). Tools and techniques for computational reproducibility. GigaScience, 5(1), 30. https://gigascience.biomedcentral.com/articles/10.1186/s13742-016-0135-4
rOpenSci community (2017a). Reproducibility in Science: A Guide to enhancing reproducibility in scientific results and writing. http://ropensci.github.io/reproducibility-guide/
rOpenSci community (2017b). rrrpkg: Use of an R package to facilitate reproducible research. https://awesomeopensource.com/project/ropensci/rrrpkg
Schmidt, S. C., & Marwick, B. (2020). Tool-Driven Revolutions in Archaeological Science. Journal of Computer Applications in Archaeology, 3(1), 18-32. http://doi.org/10.5334/jcaa.29
Stodden, V., & Miguez, S. (2014). Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research. Journal of Open Research Software, 2(1), e21. http://doi.org/10.5334/jors.ay
Wickham, H. (2017). Research compendia. Note prepared for the 2017 rOpenSci Unconf. https://docs.google.com/document/d/1LzZKS44y4OEJa4Azg5reGToNAZL0e0HSUwxamNY7E-Y/edit#
Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., et al. (2017). Good enough practices in scientific computing. PLOS Computational Biology, 13(6), e1005510. https://doi.org/10.1371/journal.pcbi.1005510
If you would like to contribute to this project, please start by reading our Guide to Contributing. Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
This project was developed during the 2017 Summer School on Reproducible Research in Landscape Archaeology at the Freie Universität Berlin (17-21 July), funded and jointly organized by Exc264 Topoi, CRC1266, and ISAAKiel. Special thanks to Sophie C. Schmidt for help. The convenience functions in this package are inspired by similar functions in the usethis package.