Feel free to contact me on Twitter at @econpotter or via email at econpotter@gmail.com.
Long long ago in a lab far far away (well, a few years ago in a building a block away), a bright-eyed grad student struggled to develop an R Shiny app that pulled data from the USDA-NASS QuickStats database…
For a long while I just had a bunch of functions in an R file, but:
I ended up creating a package, getting it published on CRAN, and ultimately accepted by rOpenSci.
Trade-offs to creating a package:
Trade-offs to Public Release:
I highly recommend reading Hadley Wickham’s thoughts: https://r-pkgs.org/workflows101.html.
Resources:
I named my package rnassqs because it allows the use of R to access the NASS [Q]uick[S]tats data. I’ve named a package that estimates the variance-covariance matrix for spatial data using Conley (1999) vcovConley, and a package that implements some statistics from Lewbel et al. (2020) lewbelr. My personal package is named napr. Karl Broman’s personal R package is named broman.
Names can always be changed later, so don’t worry about this too much to start. Before public release it’s worth thinking more carefully about this. See https://r-pkgs.org/workflows101.html#naming
A check on names can be done with:
available::available("napr")
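Before checking availability, it can help to confirm the name is even allowed. The sketch below encodes the naming rules from "Writing R Extensions" (ASCII letters, numbers and dots only; at least two characters; starts with a letter; does not end in a dot). The helper name is_valid_pkg_name is hypothetical, not part of any package mentioned here.

```r
# Hypothetical helper: TRUE when `name` satisfies the basic package naming rules.
# The pattern requires a leading letter, then letters/digits/dots,
# and a final letter or digit (so at least two characters, no trailing dot).
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

candidates <- c("napr", "my_pkg", "1pkg", "pkg.")
validity <- is_valid_pkg_name(candidates)
names(validity) <- candidates
print(validity)
```

Only the first candidate passes; underscores, a leading digit, and a trailing dot are all rejected.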
Okay it’s go time!
# An R package is best kept inside an RStudio project directory.
# {devtools} is an R package useful for creating packages.
# Use your own package name here
# Some package name best practices: https://r-pkgs.org/workflows101.html#naming
devtools::create("napr")
An R package has to have the following anatomy (open each of these!):
Some (not all!) additional relevant directories:
Example: rnassqs
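To see the anatomy for yourself, here is a minimal base-R sketch that checks a directory for three pieces a freshly created package is generally expected to have: a DESCRIPTION file, a NAMESPACE file, and an R directory. The helper name missing_pkg_pieces is hypothetical, and the demo runs on a throwaway directory rather than a real package.

```r
# Hypothetical helper: report which core package pieces are absent from `path`.
missing_pkg_pieces <- function(path) {
  files <- c("DESCRIPTION", "NAMESPACE")
  dirs <- c("R")
  c(
    files[!file.exists(file.path(path, files))],
    dirs[!dir.exists(file.path(path, dirs))]
  )
}

# Demo on a temporary directory that mimics a partial package skeleton
pkg <- tempfile("napr-demo-")
dir.create(file.path(pkg, "R"), recursive = TRUE)
invisible(file.create(file.path(pkg, "DESCRIPTION")))
missing_now <- missing_pkg_pieces(pkg)    # NAMESPACE has not been created yet

invisible(file.create(file.path(pkg, "NAMESPACE")))
missing_later <- missing_pkg_pieces(pkg)  # nothing left to report
```

Pointing the helper at your own package directory (for example "napr") should return an empty character vector.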
{usethis} is a package full of infrastructure assistance functions.
We want to track the changes we make to the code and take advantage of the infrastructure that GitHub provides, so:
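usethis::use_git()
usethis::use_data_raw() # creates the data-raw directory and a data-preparation script
# usethis::use_data(my_dataset) # saves an object to data/ (my_dataset is a placeholder name);
#                               # it needs at least one object to save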
### Choose a license. Opinions differ, but if you want people to use your code,
# you should choose something like the MIT license, which allows free use and
# ensures your package can be on CRAN.
#
# If instead this is a package to provide data, you may want to use the CC0
# license if there are no restrictions.
#
# More information: https://r-pkgs.org/license.html
# First open the DESCRIPTION file...
# What changes after you run this?!
usethis::use_mit_license()
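One way to compare the DESCRIPTION file before and after is to read its License field with base R. This is a sketch under assumptions: the helper name read_desc_field is hypothetical, and the demo file below is made up to mimic what I'd expect DESCRIPTION to contain after usethis::use_mit_license() has run.

```r
# Hypothetical helper: read a single field from a DESCRIPTION file.
read_desc_field <- function(path, field) {
  unname(read.dcf(path, fields = field)[1, 1])
}

# Made-up DESCRIPTION contents for demonstration only
desc_path <- tempfile("DESCRIPTION-")
writeLines(
  c("Package: napr",
    "Version: 0.0.0.9000",
    "License: MIT + file LICENSE"),
  desc_path
)

license <- read_desc_field(desc_path, "License")
print(license)
```

In a real package you would call read_desc_field("DESCRIPTION", "License") from the package root, once before and once after choosing the license.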
Functions are stored in the R directory. You can either create an R script and save it directly, or use usethis to create it (and its tests) for you.
The question of how many functions to put in an R file is a difficult one with no clear guide. For simple projects, a separate file for each function makes sense. Here are the functions in one of the rnassqs files: https://github.com/ropensci/rnassqs/blob/master/R/request.R
# First let's set up testing:
usethis::use_testthat()
usethis::use_test("interpolate")
# Now create our R script "./R/interpolate.R"
usethis::use_r("interpolate")
We’ll start by writing a test in the test-interpolate.R file:
test_that("interpolate is correct", {
  t0 <- 0
  t1 <- 10

  # Linear interpolation: rises from t0 to t1 over hours 1-12, then falls back.
  # (This shortcut for the expected values relies on t0 being 0.)
  y <- c(1:12 * (t1 - t0) / 12, 11:0 * (t1 - t0) / 12)
  x <- interpolate(tmin = t0, tmax = t1, type = "linear")
  expect_equal(x, y)

  # Sine interpolation should throw an error
  expect_error(interpolate(tmin = t0, tmax = t1, type = "sine"),
               "sine interpolation is not yet implemented.")
})
Now let’s edit the interpolate.R file to actually create the function:
#' Interpolate daily min/max temperatures across the day.
#'
#' @param tmin daily minimum temperature.
#' @param tmax daily maximum temperature.
#' @param type type of interpolation.
#' @return a numeric vector of 24 hourly values.
interpolate <- function(tmin, tmax, type = c("linear", "sine")) {
  # Resolve `type` to one of the allowed choices ("linear" by default)
  type <- match.arg(type)
  if (type == "linear") {
    # tmin at hours 0 and 24, tmax at hour 12, evaluated at hours 1 to 24
    res <- approx(c(0, 12, 24), c(tmin, tmax, tmin), xout = 1:24)$y
  } else if (type == "sine") {
    stop("sine interpolation is not yet implemented.")
  }
  res
}
We can run our tests with
devtools::load_all() # loads all the source in the package
devtools::test() # runs the tests and gives us some nice output
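The test above uses a minimum of 0, which hides the offset in the expected values. As a quick sanity check outside of testthat, here is a base-R sketch with a nonzero minimum. To keep it runnable on its own, it copies the linear branch into a stand-alone function; the name interpolate_linear is hypothetical and is not part of the package.

```r
# Stand-alone copy of the linear branch of interpolate(), for illustration only
interpolate_linear <- function(tmin, tmax) {
  stats::approx(c(0, 12, 24), c(tmin, tmax, tmin), xout = 1:24)$y
}

t0 <- 5
t1 <- 20

# General form of the expected values: start from t0 and add the
# rising (hours 1-12) and falling (hours 13-24) fractions of the range
expected <- t0 + c(1:12, 11:0) * (t1 - t0) / 12
actual <- interpolate_linear(t0, t1)

stopifnot(isTRUE(all.equal(actual, expected)))
```

If you extend the testthat file, the same expected formula works for any pair of minimum and maximum temperatures.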
When we created our interpolate function, we had some lines starting with #'. Those are converted into documentation, which we can then view as we would with any other package. More on documentation here: https://r-pkgs.org/man.html. Note that older versions of the r-pkgs book predate markdown syntax in documentation comments, so you may see a LaTeX-adjacent syntax there, e.g. “\code{x}” instead of “`x`”.
devtools::document() # Generates the documentation stored in 'man'
# View the documentation with:
?interpolate
help(interpolate)
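For comparison, here is a small sketch of documentation comments written with markdown syntax. It assumes markdown support is switched on for the package (the DESCRIPTION field Roxygen: list(markdown = TRUE)); the function celsius_to_fahrenheit is a hypothetical example and not part of any package discussed here.

```r
#' Convert Celsius to Fahrenheit.
#'
#' With markdown enabled, `celsius` renders as code and **bold** renders
#' as bold, with no need for `\code{}` or `\strong{}`.
#'
#' @param celsius Numeric vector of temperatures in degrees Celsius.
#' @return Numeric vector the same length as `celsius`, in degrees Fahrenheit.
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

print(celsius_to_fahrenheit(c(0, 100)))
```

Running devtools::document() on a file like this should produce the corresponding help page in man, just as it does for interpolate.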
Once we are ready to build and install our package (even if it’s in draft form), we can run checks and then install it using devtools:
devtools::check() # Look at the notes!
devtools::install() # installs the package into your library
There may be some notes, in particular, this one:
> checking R code for possible problems ... NOTE
interpolate: no visible global function definition for ‘approx’
Undefined global functions or variables:
approx
Consider adding
importFrom("stats", "approx")
to your NAMESPACE file.
Current recommended practice is to call the function explicitly through its package (package::function), and to add that package to the Imports list in the DESCRIPTION file:
# In DESCRIPTION:
...
Imports:
stats
...
Edit R/interpolate.R to specify where approx comes from:
# replace this line:
res <- approx(c(0,12,24), c(tmin, tmax, tmin), xout = 1:24)$y
# with this:
res <- stats::approx(c(0,12,24), c(tmin, tmax, tmin), xout = 1:24)$y
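The check NOTE named the package for us here, but when you are unsure where a function lives, base R can tell you. A minimal sketch, with a hypothetical helper name (defining_package):

```r
# Hypothetical helper: name of the namespace in which a function is defined.
# Looks the function up by name on the search path, then asks for the name
# of its enclosing environment.
defining_package <- function(fn_name) {
  fn <- get(fn_name, mode = "function")
  environmentName(environment(fn))
}

pkg_of_approx <- defining_package("approx")
print(pkg_of_approx)
```

For approx this reports stats, which matches the prefix used in the edited line above. The helper only works for functions from packages that are currently attached.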
If you want people to be able to install the package with something simple like:
devtools::install_github("<your username>/<your package>")
then host the package on GitHub. You can set up RStudio to play nicely with GitHub; see https://r-pkgs.org/git.html for guidance.
You can automate testing of your package using GitHub Actions or other services. usethis will create the infrastructure for you:
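# GitHub Actions is the most current option:
usethis::use_github_actions()
usethis::use_github_actions_badge()
# Newer usethis releases may deprecate the two calls above in favor of, e.g.,
# usethis::use_github_action("check-standard"); check your installed version.
# You may prefer AppVeyor or Travis CI (these helpers may also be deprecated
# or removed in recent usethis releases):
usethis::use_appveyor()
usethis::use_travis()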
Here is guidance on setting up travis: https://r-pkgs.org/r-cmd-check.html#travis.