Monthly Archives: December 2014

Review of R Graphs Cookbook

I was recently invited to review the second edition of the R Graphs Cookbook by Jaynal Abedin and Hrishi V. Mittal (Packt Publishing). This is not to be confused with the R Graphics Cookbook by Winston Chang (O’Reilly Media). I own the latter and refer to it on a regular basis. I was curious to see how the R Graphs Cookbook would compare to it.

The book has 15 chapters. The first 8 cover how to do traditional graphs such as scatter plots, line graphs, histograms, box plots and the like along with extensive coverage of tweaking graphical parameters. Chapters 9 – 14 cover special topics like heat maps, geographical maps, lattice and ggplot2. The final chapter presents several recipes for finalizing graphs for publication.

Each “recipe” follows a template: a How to Do It section first, followed by a How it Works section that walks you through the code. There’s also a There’s More section that offers extra tips or suggestions and finally a See Also section that refers you to similar recipes. The recipes frequently use sample data sets that need to be downloaded from the Packt Publishing site. Otherwise they tend to use data sets included with the base R datasets package or randomly generated data. Each recipe includes the graph that it produces in color, or at least it does in the PDF version I reviewed. (I don’t know if the paperback includes color. For the asking price of $46 on Amazon I hope it does.)

My overall impression of the book is positive. The recipes are practical, the explanations of code are clear, and there are several pointers to useful R packages. The chapters that really stood out to me (i.e., chapters I could see myself using) were the coverage of graphical parameters in chapter 3, working with map and GIS data in chapter 10, and preparing graphs for publication in chapter 15.

But the book isn’t perfect. My biggest complaint is the book’s datasets. They’re in a folder structure that doesn’t always match the book’s contents. For example, there is a health expenditure dataset in the Chapter 3 folder that is not used until Chapter 4. As another example, the recipe for choosing plot symbols and sizes asks us to load the “cityrain.csv” data “that we used in the first chapter.” But it’s not used in the first chapter; it’s used in the second chapter. In this case, though, the dataset is actually in the Chapter 2 folder! I frequently found myself setting my working directory to use a chapter’s dataset only to find the dataset wasn’t there. All this could have been avoided by just lumping all data into a single folder. Or perhaps by making an R package that contains the book’s datasets, as the author of the R Graphics Cookbook has done.

Another head-scratcher is the introduction to the grid package in the first section of Chapter 1. The section is titled “Base graphics using the default package”, yet grid is presented as the base graphics package. It’s not. The base R graphics package is the graphics package. The authors clearly know a great deal about creating R graphics, but I don’t understand their reasoning for presenting the grid package as the default package for graphs.

There are a few recipes that I think could be improved. The “Formatting log axes” recipe simply plots \( 10^1\) through \( 10^5\) with the log argument set to “y”. Why not use one of the book’s datasets and show a before and after graph to demonstrate how the axis changes with log formatting? For example:

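# Path assumes the book's data files were downloaded under the working directory
metals <- read.csv("Chap 3/Data Files/metals.csv")
par(mfrow=c(1,2))  # two plots side by side
plot(Ba ~ Cu, data=metals, xlim=c(0,100),
     main="Before y-axis\nlog transformation")
plot(Ba ~ Cu, data=metals, xlim=c(0,100), log="y",
     main="After y-axis\nlog transformation")
par(mfrow=c(1,1))  # back to a single plot per page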

[Figure: the two scatter plots of Ba against Cu, before and after the y-axis log transformation]

The "Creating bar charts with vertical error bars" recipe in chapter 6 creates error bars by multiplying the plotted values by 0.95 and 1.05. Why not show how to plot actual standard error bars? In fact they conclude the recipe by saying "In practice, scaled estimated standard deviation values or other formal estimates of error would be used to draw error bars instead of a blanket percentage error as shown here." To their credit, the authors do show how to create conventional standard error bars later on in the ggplot2 chapter. At the very least it seems like the recipe in chapter 6 should have a See Also section that points readers to the ggplot2 recipe.
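For what it's worth, here is a minimal sketch of how actual standard error bars might be drawn on a base R bar chart. It is not the book's code: it assumes the built-in ToothGrowth dataset rather than one of the book's files, and the names means, ses and mids are just my own choices for illustration.

```r
# Mean and standard error of tooth length for each dose level
means <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
ses   <- tapply(ToothGrowth$len, ToothGrowth$dose,
                function(x) sd(x) / sqrt(length(x)))

# barplot() returns the x-coordinates of the bar midpoints
mids <- barplot(means, ylim = c(0, max(means + ses) * 1.1),
                xlab = "Dose", ylab = "Mean tooth length")

# Draw each error bar as a double-headed arrow with flat (90 degree) heads
arrows(mids, means - ses, mids, means + ses,
       angle = 90, code = 3, length = 0.05)
```

The trick is that arrows() with angle = 90 and code = 3 draws a flat cap at both ends, so no extra package is needed.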

One other recipe I thought could be better was "Setting graph margins and dimensions". It tells you how to do it but doesn't actually demonstrate it. It would have been nice to see the effect of changing the various parameters. In fact I'm still not sure how the fin and pin par() arguments work. Of course the authors go on to say "it is better to use mar or mai" instead of fin and pin, which I suppose is nice since I know how mar and mai work. But then why mention fin and pin in the first place?
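Something like the following would have helped. This is only a sketch of the kind of demonstration I had in mind, using made-up data (1:10) and variable names of my own. According to the par() help page, fin is the size of the figure region in inches and pin is the size of the plot region in inches, so querying pin after each plot shows how much room the margins leave.

```r
# Two panels side by side, starting with the default margins (in lines of text)
oldpar <- par(mfrow = c(1, 2), mar = c(5, 4, 4, 2) + 0.1)
plot(1:10, main = "Default mar")
pin_default <- par("pin")   # plot region (width, height) in inches

# Shrink the margins: order is bottom, left, top, right
par(mar = c(2, 2, 2, 1))
plot(1:10, main = "Smaller mar")
pin_small <- par("pin")     # the plot region grows as the margins shrink

par(oldpar)                 # restore the original mfrow and mar
pin_default
pin_small
```

The exact pin values depend on the size of the graphics device, so I haven't listed them here.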

While I'm on the subject of setting graphics parameters I noticed the authors never explicitly explain how to restore initial par() values by saving the result of par() when making changes. For example,

oldpar <- par(col=4, lty=2)
  … plotting commands …
par(oldpar)

They do it once in chapter 1 when demonstrating trellis graphs but they don't explain what it's doing or why it's there. I really believe that should be a separate recipe in chapter 3, "Beyond the Basics – Adjusting Key Parameters".
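As a rough idea of what such a recipe might cover, here is a small sketch that wraps the same pattern in a function and uses on.exit() so the settings are restored even if the plotting code fails. The function name dashed_plot is hypothetical, made up for this example.

```r
dashed_plot <- function(x, y) {
  # par() invisibly returns the old values of the parameters it changes
  oldpar <- par(col = 4, lty = 2)
  # Restore those values when the function exits, even after an error
  on.exit(par(oldpar))
  plot(x, y, type = "l")
}

dashed_plot(1:10, (1:10)^2)

# The settings outside the function are untouched
lty_after <- par("lty")
col_after <- par("col")
```

After the call, lty_after and col_after hold whatever values were in effect before it, which on a fresh graphics device are the defaults.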

Some of the recipes I really liked were "Setting fonts for annotations and titles", "Using margin labels instead of legends for multiple-line graphs" and "Showing the number of observations" in the axis labels of box plots. Those three are golden. The recipe for "Graph annotation with ggplot" is also quite useful. And I thoroughly enjoyed working through the "Data Visualization Using Lattice" chapter. I had never used Lattice before and found this to be an excellent tutorial.

As I mentioned earlier, I own and make regular use of O'Reilly's R Graphics Cookbook. How does Packt's R Graphs Cookbook compare? The main difference is ggplot2. The O'Reilly book is almost exclusively devoted to ggplot2. In fact if not for the Miscellaneous Graphs chapter near the end it could very easily be called the ggplot2 Cookbook. The R Graphs Cookbook on the other hand is rooted in base R graphics. ggplot2 gets introduced briefly in chapters 1 and 4 before getting its own treatment in chapter 12. If you're looking for a ggplot2 reference, O'Reilly's R Graphics Cookbook is hands-down the best choice. But if you want a reference for base R graphics, then the R Graphs Cookbook is the better of the two.

Setting up a keyboard shortcut for the dplyr chain operator

I finally reached the point where I was using dplyr enough to get annoyed with typing %>%. I’m guessing if I was using Linux and Emacs this would be a trivial problem to solve. But I use RStudio on Windows, so my solution is a little more involved. Here it is in case anyone is interested.

1. Go to http://www.autohotkey.com/, download AutoHotkey, and install. It’s open source.
2. Right click on your desktop and click New > AutoHotkey Script
3. Give it a name like “chain”
4. Right click on the script you just created and click Edit Script
5. Leave the existing text in the script as is and enter the following at the bottom, which maps %>% to the keys Ctrl + Shift + . (period)

^+.::
SendRaw `%
SendRaw >
SendRaw `%
return

6. Save and close the file
7. Double-click on the file; it should now be running in your system tray
8. Go to RStudio and hit Ctrl + Shift + . (period). That should enter %>%

To make this happen every time you start your computer, move or copy the “chain.ahk” file to your Startup folder.

To learn more about all the things AutoHotkey can do, check out their documentation.

Credit where credit is due: Here is the Stack Overflow thread where I learned about Autokey, which led me to AutoHotkey. And here is a helpful exchange on the AutoHotkey forum that showed me how to get the script to work.

UPDATE: LOL, turns out there’s a hot-key in RStudio for this: Ctrl + Shift + M.