Feel free to contact me on Twitter at @econpotter.

What is an R Package?

Why Create a Package? Is it Worth It?

An Origin Story for {rnassqs}

Long long ago in a lab far far away (well, a few years ago in a building a block away), a bright-eyed grad student struggled to develop an R Shiny app that pulled data from the USDA-NASS QuickStats database.

For a long while I just had a bunch of functions in an R file, but:

  • What if I want to use them in another project, and then I make changes to the original?
  • How do I test the functions to make sure they work when I change them? I thought of testing as something you do once, but the real benefit for personal code is ensuring that you don’t introduce an error when you make changes.

I ended up creating a package, getting it published on CRAN, and ultimately getting it accepted by rOpenSci.

Trade-offs to creating a package:

  • Time investment to learn and create package (relatively small).
  • Testing framework to minimize chance of errors.
  • Centralizes code to be used in other projects.
  • Improves programming skills.

Trade-offs to Public Release:

  • Helps the author with networking, establishing expertise, etc…
  • Making a package CRAN/production-ready is costly (large time investment).
  • Additional thought about code structure improves programming skills.
  • Review by an organization such as rOpenSci is time consuming but can vastly improve code and is a great learning opportunity.
  • Word of warning: in some cases, other people may be developing a very similar package without your knowledge, so don’t sit on it!
  • Special considerations as a grad student (and possibly new faculty):
    • A published R package can get you a job.
    • In some cases, it can lead to a (small) publication.

I highly recommend reading Hadley Wickham’s thoughts: https://r-pkgs.org/workflows101.html.

Creating an R Package

Resources:

What’s in a name?

I named my package rnassqs because it allows the use of R to access the NASS [Q]uick[S]tats data. I’ve named a package that estimates the variance-covariance matrix for spatial data using Conley (1999) vcovConley, and a package that implements some statistics from Lewbel et al. (2020) lewbelr. My personal package is named napr. Karl Broman’s personal R package is named broman.

Names can always be changed later, so don’t worry about this too much to start. Before public release it’s worth thinking more carefully about this. See https://r-pkgs.org/workflows101.html#naming

A check on names can be done with:

available::available("napr")
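
Whether or not a name is already taken, it also has to be syntactically valid. The sketch below is a base-R check of the naming rule given in Writing R Extensions (ASCII letters, digits, and dots only; at least two characters; starts with a letter; does not end with a dot). The helper name is_valid_pkg_name is made up for this illustration, and it is not a substitute for available::available(), which also checks whether the name is in use.

```r
# Hypothetical helper: TRUE when `name` satisfies the basic syntax rule for
# R package names (letters, digits, dots; starts with a letter; at least two
# characters; does not end with a dot).
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

is_valid_pkg_name(c("napr", "rnassqs", "vcovConley"))  # all TRUE
is_valid_pkg_name(c("my_pkg", "2fast", "pkg."))        # all FALSE
```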

Okay it’s go time!

# An R package best exists inside of an RStudio project directory.
# {devtools} is an R package useful for creating packages.

# Use your own package name here
# Some package name best practices: https://r-pkgs.org/workflows101.html#naming
devtools::create("napr")

An R package has to have the following anatomy (open each of these!):

  • DESCRIPTION: metadata for the package
  • NAMESPACE: A description of the functions that your package makes available
  • R: Where your R code lives
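
A quick way to see which of these required pieces exist in a directory is sketched below in base R. The helper check_anatomy and the throwaway directory are assumptions made for the illustration; in a real package created with devtools::create(), all three entries should already be present.

```r
# Hypothetical helper: report which of the required pieces exist under `path`.
check_anatomy <- function(path = ".") {
  required <- c("DESCRIPTION", "NAMESPACE", "R")
  stats::setNames(file.exists(file.path(path, required)), required)
}

# Demonstration on a throwaway directory that is deliberately missing NAMESPACE.
pkg <- file.path(tempdir(), "napr-anatomy-demo")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(pkg, "DESCRIPTION"))

check_anatomy(pkg)
```

Calling check_anatomy() with no argument checks the current working directory, which is convenient from inside the package project.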

Some (not all!) additional relevant directories:

  • data: data to be included in the package
  • data-raw: scripts (and raw data if necessary) that create the data
  • inst: vignettes, examples, and other things that must be included (can include test data).
  • tests: package testing functions
  • vignettes: where vignettes are stored

Example: rnassqs

Next Steps: {usethis}

{usethis} is a package full of infrastructure assistance functions.

Git for revision tracking

We want to track the changes we make to the code and take advantage of the infrastructure that GitHub provides, so we start by initializing a Git repository:

usethis::use_git()

Data directories:

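usethis::use_data_raw("my_dataset") # creates the data-raw directory and a script stub; "my_dataset" is a placeholder name
usethis::use_data(my_dataset) # run once the object exists: saves it to the data directory, creating it if needed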
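
Objects shipped in the data directory are stored as .rda files. The base-R sketch below mimics that round trip in a temporary directory so it can be run outside a package; the data frame daily_temps and the directory name are made up for the illustration, and inside a real package usethis would handle the saving step.

```r
# Work in a throwaway directory so nothing touches a real package.
pkg <- file.path(tempdir(), "napr-data-demo")
dir.create(file.path(pkg, "data"), recursive = TRUE, showWarnings = FALSE)

# What a data-raw script typically does: build the object...
daily_temps <- data.frame(day = 1:3, tmin = c(2, 4, 3), tmax = c(11, 15, 12))

# ...and save it under data/ as an .rda file.
rda_path <- file.path(pkg, "data", "daily_temps.rda")
save(daily_temps, file = rda_path)

# Loading the file restores the object under its original name.
restored <- new.env()
load(rda_path, envir = restored)
all.equal(restored$daily_temps, daily_temps)
```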

Choosing a license

# Choose a license. Opinions differ, but if you want people to use your code,
# you should choose something like the MIT license, which allows free use and
# ensures your package can be on CRAN.
#
# If instead this is a package to provide data, you may want to use the CC0
# license if there are no restrictions.
#
# More information: https://r-pkgs.org/license.html

# First open the DESCRIPTION file...
# What changes after you run this?!
usethis::use_mit_license()

Our First Function and Testing

Functions are stored in the R directory. You can either create an R script and save it directly, or use usethis to create it (and its tests) for you.

The question of how many functions to put in an R file is a difficult one with no clear guide. For simple projects, a separate file for each function makes sense. Here are the functions in one of the rnassqs files: https://github.com/ropensci/rnassqs/blob/master/R/request.R

# First let's set up testing:
usethis::use_testthat()
usethis::use_test("interpolate")

# Now create our R script "./R/interpolate.R"
usethis::use_r("interpolate")

We’ll start by writing a test in the test-interpolate.R file:

test_that("interpolate is correct", {
  t0 <- 0
  t1 <- 10
  
  # Linear interpolation
  y <- c(1:12*(t1 - t0)/12, 11:0*(t1 - t0)/12)
  x <- interpolate(tmin = t0, tmax = t1, type = "linear")
  expect_equal(x, y)
  
  # Sine interpolation should throw an error
  expect_error(interpolate(tmin = t0, tmax = t1, type = "sine"),
               "sine interpolation is not yet implemented.")

})

Now let’s edit the interpolate.R file to actually create the function:

#' Interpolate daily min/max temperatures across the day.
#'
#' @param tmin daily minimum temperature.
#' @param tmax daily maximum temperature.
#' @param type type of interpolation.
#' @return a numeric vector of 24 hourly values.
interpolate <- function(tmin, tmax, type = c("linear", "sine")) {
  type <- match.arg(type)
  if(type == "linear") {
    res <- approx(c(0,12,24), c(tmin, tmax, tmin), xout = 1:24)$y
  } else if(type == "sine") {
    stop("sine interpolation is not yet implemented.")
  }
  res
}
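
Before running the formal tests, it can help to poke at the function interactively. The snippet below repeats the function so that it runs on its own, then shows how match.arg() falls back to the first choice and rejects unknown ones. The value "cubic" is just an arbitrary unsupported choice picked for the illustration.

```r
# The function from above, repeated so this snippet is self-contained.
interpolate <- function(tmin, tmax, type = c("linear", "sine")) {
  type <- match.arg(type)
  if (type == "linear") {
    res <- approx(c(0, 12, 24), c(tmin, tmax, tmin), xout = 1:24)$y
  } else if (type == "sine") {
    stop("sine interpolation is not yet implemented.")
  }
  res
}

# With no `type` given, match.arg() selects the first choice, "linear".
hourly <- interpolate(tmin = 0, tmax = 12)
length(hourly)            # one value per hour of the day
hourly[c(6, 12, 18, 24)]  # rises to tmax at hour 12, back to tmin at hour 24

# An unsupported choice is rejected by match.arg() itself; capture the
# error message instead of stopping the script.
msg <- tryCatch(interpolate(0, 12, type = "cubic"), error = conditionMessage)
msg
```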

We can run our tests with

devtools::load_all() # loads all the source in the package
devtools::test() # runs the tests and gives us some nice output

Documentation

When we created our interpolate function, we had some lines starting with #'. Those are converted into documentation, which we can then view as we would with any other package. More on documentation here: https://r-pkgs.org/man.html. However, the r-pkgs book has not been updated to reflect the use of markdown syntax, so you’ll notice that it uses a LaTeX-adjacent syntax, e.g. “\code{x}” instead of “`x`”.

devtools::document() # Generates the documentation stored in 'man'

# View the documentation with:
?interpolate
help(interpolate)
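
To make the difference between the two syntaxes concrete, here is a sketch of the same roxygen comment written both ways. The function temp_range is made up for this example. The markdown form assumes markdown support is switched on for roxygen2, for instance via usethis::use_roxygen_md(), which adds "Roxygen: list(markdown = TRUE)" to DESCRIPTION.

```r
# Rd-style markup, shown here as plain comments for comparison:
#   @param tmin daily minimum temperature, e.g. \code{2}.
#   @return \code{tmax - tmin}.

# The same documentation in markdown syntax (hypothetical function):
#' Daily temperature range.
#'
#' @param tmin daily minimum temperature, e.g. `2`.
#' @param tmax daily maximum temperature; must not be below `tmin`.
#' @return `tmax - tmin`.
temp_range <- function(tmin, tmax) {
  stopifnot(all(tmax >= tmin))
  tmax - tmin
}

temp_range(tmin = 2, tmax = 11)
```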

Check and Install

Once we are ready to build and install our package (even if it’s in draft form), we can run checks and then install it using devtools:

devtools::check() # Look at the notes!

There may be some notes. In particular, you are likely to see this one:

> checking R code for possible problems ... NOTE
  interpolate: no visible global function definition for ‘approx’
  Undefined global functions or variables:
    approx
  Consider adding
    importFrom("stats", "approx")
  to your NAMESPACE file.

Current recommended practice is to call the function with an explicit package prefix (package::function), and to add that package to the Imports list in the DESCRIPTION file:

# In DESCRIPTION:
...
Imports:
    stats
...

Edit R/interpolate.R to specify where approx comes from:

# replace this line:
res <- approx(c(0,12,24), c(tmin, tmax, tmin), xout = 1:24)$y

# with this:
res <- stats::approx(c(0,12,24), c(tmin, tmax, tmin), xout = 1:24)$y
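
The other route is the one the NOTE itself suggests: an importFrom line in NAMESPACE. Because devtools::document() regenerates NAMESPACE, that line is usually requested with a roxygen tag rather than edited in by hand. A sketch, assuming roxygen2 manages your NAMESPACE; the function is trimmed to the linear case and renamed interpolate_linear purely for illustration, and stats still belongs in Imports either way:

```r
#' Linearly interpolate daily min/max temperatures across the day.
#'
#' @param tmin daily minimum temperature.
#' @param tmax daily maximum temperature.
#' @return a numeric vector of 24 hourly values.
#' @importFrom stats approx
interpolate_linear <- function(tmin, tmax) {
  # With the tag above, document() writes importFrom(stats, approx) into
  # NAMESPACE, so the unqualified call below is resolved explicitly.
  approx(c(0, 12, 24), c(tmin, tmax, tmin), xout = 1:24)$y
}

interpolate_linear(tmin = 0, tmax = 12)
```

The explicit stats::approx form keeps the origin of each function visible at the call site, while the tag keeps calls shorter when a function is used many times.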

Making the package available and setting up github

If you want people to be able to install the package with something simple like:

devtools::install_github("<your username>/<your package>")

then the package needs to be hosted on GitHub, and you can set up RStudio to play nicely with it. See https://r-pkgs.org/git.html for guidance.

Code coverage and automated testing

You can automate testing of your package using GitHub Actions or other services. usethis will create the infrastructure for you:

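# Most current is to use GitHub Actions
# (newer usethis releases expose this as use_github_action(), so check your version):
usethis::use_github_actions()
usethis::use_github_actions_badge()

# But you may prefer AppVeyor or Travis CI
# (these two helpers have been deprecated in more recent usethis releases):
usethis::use_appveyor()
usethis::use_travis()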

Here is guidance on setting up Travis CI: https://r-pkgs.org/r-cmd-check.html#travis.