Writing a package in R the easy way

Recently I’ve had the opportunity to work a lot with R. It’s not my favorite language, and for many reasons I would hesitate to use it as the primary language in larger projects, but it’s a great statistical tool and, in my opinion, really hard to beat as far as scriptable interactive environments go. This was also the primary aim of the code I was writing: being scriptable, yet interactive. The code is now usable and is composed of a number of different objects and classes that make it easy to manipulate meteorological station data, as well as perform pretty fundamental sanity checks on the data.

The question is, of course, how to distribute this code after it’s been written. I’ve often seen people send scripts back and forth, but I tend to be too snobby to do that, especially when the use case is quite clearly perfect for a package. After balking at the very intimidating R documentation on packaging, I’ve condensed the details down to the route for making a package in R that I personally find most elegant and convenient.


Write the code

Of course, a package wouldn’t be a package without code, so get down to writing it. I also see this as the perfect time to document the code, and I like keeping my code and documentation in one place, so I would suggest doing that as well. There’s a great package called roxygen2 which extracts documentation from specially formatted comments, so if you document every function that way, you’ll be very happy in the long run. Documenting your code in the code itself encourages you to keep the documentation up to date and makes it easier to produce the help files as well, so if you’re not doing it yet, start doing it. Here’s a pretty basic set of roxygen2 tags as an example (it goes right above the function’s definition):

#' Find the column numbers in a dataframe that start with a base name.
#'
#' @param dataframe The dataframe to extract the column positions from.
#' @param column.basename The base name that the columns should start with.
#' @export
#' @family aurai
#' @return A named vector containing the column numbers of the columns that match the
#'   base name. The vector's names correspond to the original column names.
FindColumnNumbers <- function(dataframe, column.basename) { ...

As you can see, the comments are marked so that roxygen2 knows they are meant for it: they start with “#'” instead of “#”.

The first line is a summary of the function itself.

The “@param” tag signifies an argument of your function. The first word following the tag is the argument’s name; everything after it is a description of the argument.

The “@export” tag tells roxygen2 that this function should be exported as one of the package’s top-level functions.

The “@family” tag tells roxygen2 to list all other functions with the same family name in the “See Also” section of the generated help document, and to list the current function in the “See Also” section of every other function in the same family.

The “@return” tag signifies that the following description is of the object the function returns.

There are other tags, so see the roxygen2 help for more details.
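
To make the tags above concrete, here is a minimal sketch of a complete, documented function. The body is an assumption on my part, since the example above elides the implementation, and the station data is made up for illustration.

```r
#' Find the column numbers in a data frame that start with a base name.
#'
#' @param dataframe The data frame to extract the column positions from.
#' @param column.basename The base name that the columns should start with.
#' @return A named integer vector of the matching column positions.
#' @export
FindColumnNumbers <- function(dataframe, column.basename) {
  # Hypothetical body: the original implementation is not shown.
  # startsWith() is part of base R from version 3.3.0 onwards.
  positions <- which(startsWith(names(dataframe), column.basename))
  names(positions) <- names(dataframe)[positions]
  positions
}

# Made-up station data, for illustration only.
stations <- data.frame(temp.min = c(1.2, 0.4),
                       temp.max = c(9.8, 7.1),
                       pressure = c(1013, 1009))
FindColumnNumbers(stations, "temp")
```

With this data the call returns the positions 1 and 2, named temp.min and temp.max.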

Structure the package

Now that your code is written and documented, you can structure your package. An R package is, at its heart, a folder named after the package with the following minimal structure:

DESCRIPTION
man/
|-- *.Rd
NAMESPACE
R/
|-- *.R

The DESCRIPTION file contains specially formatted text describing the package. More information is available in R’s manual on writing extensions (although there’s a lot to read, you can almost just copy the example and change the content to match your needs).
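
As a rough sketch of what such a file looks like, the following writes a small DESCRIPTION to a temporary folder using base R’s write.dcf() and prints it. Every value is a placeholder, and the set of fields is my reading of what the manual treats as mandatory, so check it against the manual for your R version.

```r
# All values below are placeholders; replace them with your own details.
description <- data.frame(
  Package     = "myPackage",
  Title       = "Tools for Meteorological Station Data",
  Version     = "0.1.0",
  Author      = "Jane Doe",
  Maintainer  = "Jane Doe <jane.doe@example.com>",
  Description = "Helpers for manipulating and sanity-checking station data.",
  License     = "GPL-3",
  stringsAsFactors = FALSE
)

# Write to a temporary location so no real file is overwritten.
description.file <- file.path(tempdir(), "DESCRIPTION")
write.dcf(description, file = description.file)

# Print the resulting file to see the "Field: value" layout.
cat(readLines(description.file), sep = "\n")
```

In a real package the file would live in the package’s root folder rather than in tempdir().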

The man/ folder contains the help pages as Rd (R documentation) files. You don’t need to worry about those, since they will be autogenerated by roxygen2.

The NAMESPACE file is also generated by roxygen2. It lists the functions that the package exports, i.e. the ones that become available in your R session when the package is loaded with library().

The R/ folder contains your R source files. More on that below.

In addition, you can include scripts or executables that use the package you wrote in a folder called exec/.

Now that you’ve got the minimal package structure, start filling it up. Any R source files you’ve written – whether you’ve put all your functions in one big file or split it up into a single file for each function or something in between – go in the R/ folder. This should not be composed of scripts, but rather functions that can be used in code that loads your library. Any relevant scripts go in the exec/ folder. If you have anything else you wish to provide with your package – example data, source files for other languages, etc. – you can include those as well. See the R documentation on that for more details.
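
If you prefer to set up the folders from R itself, a sketch along these lines would do it. The package name and the AddOne function are made up for the example, and everything is created under tempdir() so that nothing real gets overwritten.

```r
# Build the skeleton in a temporary directory (hypothetical package name).
package.root <- file.path(tempdir(), "myPackage")

# R/ holds function definitions, man/ the generated help files,
# exec/ any optional scripts that use the package.
for (folder in c("R", "man", "exec")) {
  dir.create(file.path(package.root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Put a small documented function (made up for this example) into R/.
writeLines(
  c("#' Add one to a number.",
    "#'",
    "#' @param x A numeric vector.",
    "#' @return The input plus one.",
    "#' @export",
    "AddOne <- function(x) x + 1"),
  file.path(package.root, "R", "AddOne.R")
)

# Show what was created.
list.files(package.root, recursive = TRUE, include.dirs = TRUE)
```

The DESCRIPTION file still has to be added by hand, while NAMESPACE and the contents of man/ are left for roxygen2 to generate.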

Generate the documentation

This isn’t hard, since you’ve already written the documentation. However, R wants the documentation you make available through its “?” operator to be distributed with the package as Rd files. Rather than making these on our own and having to maintain them separately from the code, we’ll generate them using roxygen2.

roxygen2 can generate the Rd files for us by parsing the comments in the code, provided they’re written using the syntax described above, and puts the files in the man/ folder. This can be done from within R as follows:

# Start an R session from the root folder of your package
dlee@localhost:~/development/eclipse/myPackage/my_package> R

R version 2.15.3 (2013-03-01) -- "Security Blanket"
Copyright (C) 2013 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-suse-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(roxygen2)
> roxygenize(".")
Writing FindColumnNumbers.Rd
...

Build the package and send it to all your friends!

Apparently, if you haven’t found a package full of R sources on your doorstep, I do not consider you a friend. But you may be able to win me over with a package of your own 😉

R packages can be built from the command line with the following command:

R CMD build $MY_PACKAGE

This produces a package archive in the directory from which the command was issued. “$MY_PACKAGE” should, of course, be assigned or replaced with the path to the package’s root folder.
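
The archive is named after the Package and Version fields of the DESCRIPTION file, in the form Package_Version.tar.gz. As a small sketch, the helper below (ArchiveName is a name I made up, not part of R) reads those two fields and predicts the file name; the DESCRIPTION used here is a placeholder written to a temporary folder.

```r
# Predict the archive name that R CMD build produces for a package,
# based on the Package and Version fields of its DESCRIPTION file.
ArchiveName <- function(description.file) {
  fields <- read.dcf(description.file, fields = c("Package", "Version"))
  paste0(fields[1, "Package"], "_", fields[1, "Version"], ".tar.gz")
}

# Placeholder DESCRIPTION with only the two fields needed here.
description.file <- file.path(tempdir(), "DESCRIPTION")
writeLines(c("Package: myPackage", "Version: 0.1.0"), description.file)

ArchiveName(description.file)
```

For the placeholder values above this gives myPackage_0.1.0.tar.gz.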

And that’s that! If you or somebody else wants to install the package, they can do so from R with the following command:

> install.packages("my_package_archive.tar.gz", repos = NULL, type = "source")

You can also consider uploading the package to CRAN. If you do so, remember to make sure that you meet CRAN’s requirements for packages, which this set of instructions does not guarantee.

About

My name’s Daniel Lee. I’m an enthusiast for open source and sharing. I grew up in the United States and did my doctorate in Germany. I've founded a company for planning solar power. I've worked on analog space suit interfaces, drones and a bunch of other things in my free time. I'm also involved in standards work for meteorological data. Now I work for the German Weather Service on improving forecasts for weather and renewable power production.
