Creating an R data package

A few weeks ago I got back into an old textbook of mine, Design and Analysis of Experiments (Dean and Voss). One of the things I love about this book is the incredible number of actual experiments used to demonstrate concepts. And all the data are available from the author’s website. My main interest in working through the book was to convert the many examples provided in SAS code into corresponding R code and to recreate the many plots. To do that meant having to download the data from the website and read it into R. Not that big of a deal, really, but I started thinking how nice it would be if the data were in an R package. Then I could just load the package and use the data at will. And I could document the data so I wouldn’t have to refer back to the book for variable definitions. And that’s what put me on the road to creating my first R data package, dvdata.

Now you’ll notice that last link went to GitHub, instead of CRAN. That’s because I asked one of the authors after I built the package if he would mind me uploading it to CRAN. Unfortunately, he did mind, because it turns out he’s working on his own R package for the next edition of the book. I was a little bummed, because I really wanted it on CRAN for the warm feeling of authenticity. But I understand. And besides the package still does what I wanted all along.

Now let’s talk about creating an R package. The very first thing you want to do is head over to Hadley Wickham’s R Packages site. He wrote a lovely book on how to create R Packages and posted it for free. And because it’s online, it almost always up-to-date. Hadley gently walks you through the process of creating an R package using his devtools package. I found it very easy to follow and I can’t recommend it enough.

What I want to do in this post is document in one place the basic steps to creating a R data package. All of these steps are in Hadley’s book, but they’re a little spread out due to the structure of the book, and because he covers a lot more than just making a simple data package.

Before you start, follow the directions under Getting Started in the Intro to Hadley’s book.

Steps to making an R data package

1. come up with a name for your package and create a package in RStudio as described here. This creates the smallest usable package. Let’s say you named your package “grrr”. On your computer you now have a directory called “grrr” which contains your package folders and files.

2. create two new directories: “data” and “data-raw” in your package directory.

3. go to the “R” directory in your package and delete the “hello.R” file.

4. Start a new R script called, perhaps, “get_data.R” and save to the “raw-data” directory. This will be the R script that reads in your data, wrangles it into shape and saves the data as an .RData file. You need to save the .RData objects into the “data” directory. The .RData objects are the data frames (or lists, or matrices, or vectors) your package will allow you to easily load. For examples, see Hadley’s R scripts in the “data-raw” directory of his babynames package.

5. When your data objects are done (i.e., the .RData files in your “data” directory) start an R script called “data.R” in the “R” directory. This is where you will compose the documentation for your data objects. Follow the directions in this section of Hadley’s book.

6. As you write your documentation, follow the The Documentation Workflow Hadley outlines in this section. Basically this involves submitting devtools::document() from the console and then previewing the documentation. Each time you submit devtools::document() Rd files are generated in the “man” directory of your package. (If you didn’t have a “man” directory, devtools creates one for you.) Do this until you are satisfied with your documentation.

7. Update the DESCRIPTION file as explained in this section. DESCRIPTION is just a text file.

8. Add ^data-raw$ to .Rbuildignore file. It too is just a text file. That keeps the “data-raw” folder from being included when the package is built.

9. Build the package: Ctrl + Shift + B. Feel free to do this at any point as you’re working on your package.

That about does it! If you want to submit to CRAN, then read Hadley’s Release chapter very closely and follow it to the T.

After creating dvdata, I created another data package called valottery that contains historical results of Virginia lottery drawings. This one I did get uploaded to CRAN.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.