In this tutorial we will build an R package and write unit tests for it (steps 1-7), push it to a GitHub repository and integrate Travis CI with it (step 8). In step 9 we build a drat repository to host this and other R packages, using GitHub as a web server, and set up Travis CI to automatically push updates of our package to drat. In step 10 we make some improvements to the package. We finish by providing resources on R package development and other relevant topics.
devtools is well integrated with RStudio.
devtools, usethis, testthat and roxygen2, in order to build R packages
tidyverse, rlang
drat
Let’s jump right into it by building a minimal R package following Hilary Parker’s building steps. Afterwards, we can discuss when or why to build an R package along with some good practices. Most of the discussed topics below are taken from Hadley Wickham’s R packages.
Hilary is building a cat-themed package, but in order to not discourage dog persons, I will go with a different theme. Let’s make a package whose aim is to perform some basic statistics on blood metabolomics data. (In fact, other than the slightly grotesque namings, the package’s single demo utility has nothing specific to blood metabolomics.)
Let’s create the minimum amount of subdirectories your package needs.
# Navigate to the desired parent directory
setwd("parent_directory")
# Create the directory of your package with the minimum amount of subdirectories
# It's reasonable, but not necessary, to give the directory the same name as
# the package
usethis::create_package("bloodstats")
# If you run the above in RStudio, you will likely get a new RStudio window open
# automatically, with the project name "bloodstats"
(Alternatively, from within RStudio, you may perform the step above by going to File -> New Project... -> New Directory -> R package, typing in the package name and location and clicking Create Project. Notice that you may also add existing functions at this step.)
What the above function did was create a directory called bloodstats/ inside parent_directory/, and inside bloodstats/ two subdirectories: R/, which will soon contain the package source code, and man/, which will soon contain the package's documentation. (If man/ is not there yet, it will be created the first time you run devtools::document().) Finally, there are two files, DESCRIPTION and NAMESPACE. Go ahead and edit the DESCRIPTION file with a short description of the package, your name and contact information, etc. This is also the file where you will keep track of your package versioning.
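To make the layout concrete, here is a rough base-R sketch that builds a similar skeleton by hand in a temporary directory. It is only an illustration under my own assumptions: usethis::create_package() writes more than this (for example RStudio project files), and the placeholder file contents are made up for the demo.

```r
# Hand-made skeleton, for illustration only (not what usethis runs internally)
pkg_dir <- file.path(tempdir(), "bloodstats")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "man"), showWarnings = FALSE)

# Minimal placeholder metadata files
writeLines(
  c("Package: bloodstats", "Version: 0.0.0.9999"),
  file.path(pkg_dir, "DESCRIPTION")
)
writeLines(
  "# Generated by roxygen2: do not edit by hand",
  file.path(pkg_dir, "NAMESPACE")
)

# List what was created: two directories and two files
skeleton <- list.files(pkg_dir, include.dirs = TRUE)
print(skeleton)
```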
Here is an example of what my first version of the DESCRIPTION would look like:
Package: bloodstats
Title: Utilities for Metabolomics Data
Version: 0.0.0.9999
Authors@R:
    person(
        "Maria", "Kalimeri", email = "maria.kalimeri@nightingalehealth.com",
        role = c("aut", "cre")
    )
Description: Functions and other utilities for basic statistics on blood
    metabolomics data.
Depends:
    R (>= 3.5.0)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
The package's namespace, as recorded in the NAMESPACE file, is something you should understand if you plan to share your packages. The namespace takes care of imports and exports so that your package can coexist in harmony with other packages. The NAMESPACE file is something you shouldn't edit by hand; instead, roxygen2 will take care of updating this file every time you build your documentation.
An important note concerning the name of your package, especially if you plan to share it with others: make sure that the name is not already in use by another CRAN package. You can check this by loading https://cran.r-project.org/web/packages/bloodstats in a browser.
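The same check can be sketched in R. The helper below, is_name_taken(), is a hypothetical name of my own, and the toy vector stands in for the real CRAN index; comparing names case-insensitively is a conservative choice on my part. The commented lines show how the index could be fetched with utils::available.packages(), assuming network access.

```r
# Hypothetical helper: TRUE if `name` clashes with an existing package name.
# The comparison ignores case, which is the conservative choice.
is_name_taken <- function(name, existing) {
  tolower(name) %in% tolower(existing)
}

# Toy list standing in for the real CRAN index
known <- c("dplyr", "magrittr", "testthat")
is_name_taken("Dplyr", known)
is_name_taken("bloodstats", known)

# With network access, the real index could be fetched like this (not run):
# cran <- rownames(utils::available.packages(repos = "https://cloud.r-project.org"))
# is_name_taken("bloodstats", cran)
```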
The License field can be either a standard abbreviation for an open source license, like GPL-2 or BSD, or a pointer to a file containing more information, file LICENSE. The license is really only important if you're planning on releasing your package; if you aren't, you can ignore this section. I have added an MIT license here just for demonstration purposes.
Below is an example of a function that fits the scope of our package. It takes a data frame as input (supposedly containing blood biomarkers) and returns the mean value of each numeric column (variable).
bloodmeans <- function(df) {
  df %>%
    dplyr::summarise_if(is.numeric, mean, na.rm = TRUE)
}
Save this function as bloodmeans.R inside the R subdirectory.
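To see what the function computes without any package dependencies, here is a base-R sketch of the same idea. The name bloodmeans_base() and the toy biomarker values are my own, introduced only for illustration; this is not the package's implementation.

```r
# Base-R sketch of the same computation (hypothetical name, not part of the package)
bloodmeans_base <- function(df) {
  # Flag the numeric columns, then average each one, ignoring missing values
  is_num <- vapply(df, is.numeric, logical(1))
  as.data.frame(lapply(df[is_num], mean, na.rm = TRUE))
}

# Made-up data: two numeric biomarkers and one non-numeric column
biomarkers <- data.frame(
  glucose = c(4.8, 5.2, NA),
  ldl     = c(2.0, 3.0, 4.0),
  sex     = c("f", "m", "f")
)

res <- bloodmeans_base(biomarkers)
print(res)
```

The non-numeric column is dropped and the missing glucose value is ignored, which is the behaviour the dplyr version aims for with is.numeric and na.rm = TRUE.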
Good to know
Note the usage of the pipe symbol %>% above. The pipe is a way to write a series of operations on an R object, e.g. a data frame, in an easy-to-read way. As an example, the operation x %>% f(y) effectively means f(x, y). You can read more on the pipe here.
Furthermore, summarise_if is a function of the dplyr package, which is part of the core tidyverse, an opinionated collection of R packages designed for data science. If you are not a tidyverse user, I strongly suggest giving it a try. A good place to start is H. Wickham's book “R for Data Science”.
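The equivalence between x %>% f(y) and f(x, y) can be checked directly. This small sketch assumes the magrittr package is installed; the vector and the choice of round() are arbitrary.

```r
library(magrittr)  # assumed to be installed

x <- c(1.234, 5.678)

piped  <- x %>% round(1)   # pipe form: x becomes the first argument
direct <- round(x, 1)      # equivalent plain call

identical(piped, direct)
```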
What you need to do is write each function's description and other notes at the top of the function in the form of special comments, and roxygen2 will take care of building the whole documentation. An example of what the special comments that constitute object documentation should look like is shown below. For more information on the subject see here.
#' Extract Mean Values of Blood Biomarkers
#'
#' This function accepts a data frame as input and extracts the mean value of
#' each numeric variable.
#'
#' @param df a \code{data.frame} with at least one numeric variable in order to
#'   get a non-empty result.
#' @return a \code{data.frame} with the mean value of each numeric variable.
#' @importFrom magrittr %>%
#' @importFrom dplyr summarise_if
#' @author John Doe
#' @export
#' @examples
#' library(magrittr)
#' data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6)) %>%
#'   bloodstats::bloodmeans()
bloodmeans <- function(df) {
  df %>%
    dplyr::summarise_if(is.numeric, mean, na.rm = TRUE)
}
It is not necessary to have each function in its own file - although it usually makes the code easier to read/access by others - but when you add more than one function in a file make sure to add the documentation for each function just before its definition.
Some notes on documentation. I always find it very useful to have at least one example per function. Even for a “quick and dirty” package, working examples can save the day, especially if you need to demonstrate the structure of the function's input parameter(s). Such examples may need one or two built-in datasets. We will add a demo dataset a couple of steps later.
Let's take a moment and look at the package's NAMESPACE now. In the above code notice two things: first, the explicit call dplyr::summarise_if and second, in the documentation chunk, the entry:
#' @importFrom magrittr %>%
They both make sure that your package will have all the imports needed for this function to work, i.e. the packages dplyr and magrittr in this case. At this point we need to add these two dependencies to the file DESCRIPTION, but let's not do it yet. Simply for demo purposes, we will let devtools::check() pick up this error a bit later on.
You can now use devtools::document() to build your documentation. From within the package directory, type the following:
# If you are using an RStudio project for your package development you are most
# likely already in the package directory. If not, navigate into it
# > setwd("./bloodstats")
# and type:
devtools::document()
This function is a wrapper for roxygen2::roxygenize(); it adds .Rd files to the man directory, one for each documented object in your package, assuming you have written comments as suggested in step 3. The function will also update the NAMESPACE file in the package root with the corresponding imports and exports.
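For orientation, the generated NAMESPACE should look roughly like the fragment below. This is a sketch of what I would expect, assuming the roxygen block contains @export together with the @importFrom entries for dplyr and magrittr discussed above; the exact content depends on your roxygen2 version and tags.

```r
# Generated by roxygen2: do not edit by hand

export(bloodmeans)
importFrom(dplyr,summarise_if)
importFrom(magrittr,"%>%")
```

Each roxygen tag maps to one directive: @export becomes export(), and each @importFrom becomes an importFrom() line.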
If you see the following warning:
Warning: The existing 'NAMESPACE' file was not generated by roxygen2, and will not be overwritten.
go ahead and remove the file NAMESPACE from the root directory. After you do so, re-run the devtools::document() command above.
devtools::check() or R CMD check will check your code for common issues like documentation mismatches, missing imports, etc., and will also report whether unit tests pass or fail, if any exist.
You should probably run checks quite often. This will help you fix problems and inconsistencies as soon as they appear, rather than having to deal with a huge number of them at a much later stage.
So run the command below
devtools::check()
or use the RStudio built-in shortcuts if you prefer.
Unless you added the dplyr and magrittr dependencies to your DESCRIPTION above, the check command should now throw an error at the “checking package dependencies” stage. To fix this, open DESCRIPTION and update it as shown below:
Package: bloodstats
Title: Utilities for Metabolomics Data
Version: 0.0.1
Authors@R:
    person(
        "Maria", "Kalimeri", email = "maria.kalimeri@nightingalehealth.com",
        role = c("aut", "cre")
    )
Description: Functions and other utilities for basic statistics on blood
    metabolomics data.
Depends:
    R (>= 3.5.0)
Imports:
    dplyr,
    magrittr
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
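If you want to verify by hand which dependencies a DESCRIPTION declares, base R's read.dcf() can parse the file. The sketch below writes a shortened copy of the fields above to a temporary file; the `used` vector is a hypothetical stand-in for the packages your code actually calls.

```r
# Write a shortened DESCRIPTION to a temporary file (fields copied from above)
desc_file <- tempfile()
writeLines(c(
  "Package: bloodstats",
  "Version: 0.0.1",
  "Imports:",
  "    dplyr,",
  "    magrittr"
), desc_file)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file)

# Split the Imports field into individual package names
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
imports <- imports[nzchar(imports)]
print(imports)

# Hypothetical check: which packages used in the code are not declared?
used <- c("dplyr", "magrittr")
missing_deps <- setdiff(used, imports)
print(missing_deps)
```

An empty result for missing_deps means every package in `used` is declared, which is essentially what the “checking package dependencies” stage verifies for you.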
After running checks again, there should still be a warning:
❯ checking DESCRIPTION meta-information ... WARNING
  Invalid license file pointers: LICENSE
This happens because there is a pointer to a file LICENSE that just doesn't exist. The MIT license is a ‘template’, so if you use it, you need License: MIT + file LICENSE, and a LICENSE file that looks like this:
YEAR: <Year or years when changes have been made>
COPYRIGHT HOLDER: <Name of the copyright holder>
You can add these lines to a file called LICENSE in the package root and run devtools::check() again.
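As a small sketch, the two-line file can also be written from R. The holder name below is a placeholder of mine, and the file goes to a temporary directory here; in practice you would write it to the package root. usethis also offers use_mit_license() for this purpose.

```r
# Write the two-line MIT template file (placeholder values are assumptions)
license_file <- file.path(tempdir(), "LICENSE")
writeLines(
  c(
    paste0("YEAR: ", format(Sys.Date(), "%Y")),
    "COPYRIGHT HOLDER: Jane Doe"
  ),
  license_file
)

# Read it back to confirm the contents
license_lines <- readLines(license_file)
print(license_lines)
```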
Notice in the DESCRIPTION above that I have now increased the version of our package from the development version 0.0.0.9999 to 0.0.1.
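Version strings like these are best compared with base R's package_version() rather than as plain text. A short sketch (the third pair of version numbers is made up to show why string comparison is unreliable):

```r
dev_version     <- package_version("0.0.0.9999")
release_version <- package_version("0.0.1")

# Versions are compared component by component, not as strings
is_newer <- release_version > dev_version
print(is_newer)

# Component-wise comparison gets multi-digit components right
ten_beats_nine <- package_version("0.0.10") > package_version("0.0.9")
print(ten_beats_nine)

# After installation, the installed version can be queried with
# utils::packageVersion("bloodstats")
```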
Writing unit tests is an important part of package development. The main aim of writing formal tests is to make sure that you will not break code that used to work when you come back in the future to add features or improve existing code.
You can use usethis::use_testthat() to set up the package to use tests. This command takes care of the necessary setup: it creates the tests/testthat/ directory and adds testthat to the Suggests field in the DESCRIPTION. The Suggests field indicates that while your package can take advantage of a package, this is not required to make it work.
The next step is to actually write the tests. We have only one function at the moment, bloodmeans(). Create an R file with the name test-bloodmeans.R, save it in the subdirectory ./tests/testthat/ and type in the following contents.
context("bloodmeans")
library(magrittr)

# Shared fixture: run the function once on a small data frame
res <-
  data.frame(var1 = c(1, 2, 3), var2 = c(4, 5, 6)) %>%
  bloodstats::bloodmeans()

test_that("bloodmeans returns output of expected class", {
  # is.data.frame() also holds for subclasses, unlike comparing class() directly
  expect_true(is.data.frame(res))
})

test_that("bloodmeans returns expected result given input", {
  expect_true(
    all(res == data.frame(var1 = 2, var2 = 5))
  )
})
You may now run the tests:
devtools::test()
Or you may use the RStudio built-in shortcuts.
Note that as soon as you add tests to your package, devtools::check() will also include them in the check step.
Refer to the related section in R-packages:tests for more info on proper unit testing.
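As a sketch of what further tests might cover, the block below checks two edge cases: non-numeric columns and missing values. To keep it runnable without the package installed, it uses a base-R stand-in, bloodmeans_sketch(), which is a hypothetical name of mine meant to mimic bloodmeans(); in your own test file you would call bloodstats::bloodmeans() instead. It assumes testthat is installed.

```r
library(testthat)  # assumed to be installed

# Stand-in meant to mimic bloodmeans(), written in base R so that the sketch
# runs without the package (hypothetical name)
bloodmeans_sketch <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  as.data.frame(lapply(df[is_num], mean, na.rm = TRUE))
}

test_that("non-numeric columns are dropped", {
  res <- bloodmeans_sketch(data.frame(var1 = c(1, 2, 3), id = c("a", "b", "c")))
  expect_named(res, "var1")
})

test_that("missing values are ignored", {
  res <- bloodmeans_sketch(data.frame(var1 = c(1, NA, 3)))
  expect_equal(res$var1, 2)
})
```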
From the root directory of the bloodstats folder, type the following.
devtools::install(".")
That will install your package on your machine. You can try viewing the documentation of your function by typing
?bloodmeans
‘There is a lot to learn on package development, but don’t feel overwhelmed. Start with a minimal subset of useful features (e.g. just an R/ directory!) and build up over time. To paraphrase the Zen monk Shunryu Suzuki: “Each package is perfect the way it is — and it can use a little improvement”.’
ggforestplot, Nightingale's first open source R package!!