Tuesday, May 7, 2013

tryCatch! Fetch! Roll over! Little R details.

One of the most effective ways of dealing with R's slowness, when the completion of a script is not time-critical, is to start it and walk away.  Most of us have other things going on[1] and it's often fine to come back for your results later in the day or the week.  While it might be nice to recode a script to run faster, it's often a waste of time.

One difficulty with this approach is that if you haven't thought through all of your corner cases in a long-running script, then you might come back to a script which died five minutes after you left it[2].  This is exacerbated by R's treatment of types, which is best described as "loosey goosey": even errors which you trigger on line #1 will happily propagate, as bogus data, for at least as long as it takes you to leave your desk chair.

That said, catching errors in R is not too hard, especially if you're willing to use awkward-looking constructs like tryCatch.  When I initially read the documentation it wasn't clear to me what signals you can set handlers for, or what you were supposed to do with the expressions.  Then I wrote the following code based on another example[3]:


N <- 20
K <- 50
xs <- matrix(data=rnorm(n=N*K), nrow=N, ncol=K)
ys <- matrix(data=rnorm(n=N*K), nrow=N, ncol=K)
o <- list()
for ( i in 1:K ) {
  tryCatch(
    expr = {
      if (i==10) stop ("WOOT!") else
        o[[i]] <- lm(ys[,i] ~ xs[,i])
    },
    interrupt = function(ex) {
      cat("Interrupt!\n")
      print(ex)
    },
    warning = function(ex) {
      cat("Warning!\n")
      print(ex)
    },
    error = function(ex) {
      cat("Error!\n")
      print(ex)
    },
    finally = { cat("Ta-da!\n") }
  )
}


The expression itself is silly: it runs a series of linear models, but it fails when "i" is 10.  The output is placed into a list, and looking at "o" you'll notice that all the models ran except for the 10th one, whose slot holds a NULL instead.  There should be K-1 models with results and K "Ta-da!" messages.  The code in the "finally" argument runs whether an error is caught or not, so you can use it to close file or database handles, or otherwise clean up your mess.  Don't be too tidy, because you might clean up the evidence of why your code failed.
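As a rough sketch of the "clean up your mess" idea, here is "finally" closing a file connection even though the expression fails partway through.  The temporary log file and the messages are made up for illustration; nothing here comes from the script above.

```r
# Illustrative only: 'log_path' and the messages are hypothetical.
log_path <- tempfile(fileext = ".log")
con <- file(log_path, open = "w")

result <- tryCatch(
  expr = {
    writeLines("starting", con)
    stop("something went wrong")
    writeLines("never reached", con)   # skipped because of the stop() above
  },
  error = function(ex) {
    # The handler's return value becomes the value of tryCatch()
    conditionMessage(ex)
  },
  finally = {
    # Runs whether or not an error occurred, so the handle is always closed
    close(con)
  }
)

logged <- readLines(log_path)
```

Afterwards "result" holds the error message and "logged" holds only the line written before the failure, which is the sort of evidence you probably want to keep around.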

The error/warning handlers are pretty self-explanatory (they can be triggered using the "stop"/"warning" commands), but the interrupt handler might not be so obvious.  Interrupts are most commonly sent on my Linux machine in response to a Ctrl-C, and on Windows in response to an Esc.  For debugging it can be nice to put code in there which summarizes the state of your program and presents it to you nicely.
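To see the error and warning handlers fire on their own, a minimal sketch along these lines should do.  The function name "safe_numeric" is invented for the example; the point is only that "stop" and "warning" (or a base function that warns, like as.numeric on a non-number) hand control to the matching handler, and the handler's value replaces the expression's.

```r
# Triggering the handlers directly with stop() and warning()
from_error <- tryCatch(
  stop("boom"),
  error = function(ex) conditionMessage(ex)
)

from_warning <- tryCatch(
  {
    warning("careful")
    "not reached"          # the expression is abandoned once the warning is caught
  },
  warning = function(ex) conditionMessage(ex)
)

# Hypothetical helper: turn un-parseable input into NA instead of a warning
safe_numeric <- function(x) {
  tryCatch(
    as.numeric(x),
    warning = function(ex) NA_real_
  )
}

ok  <- safe_numeric("3.14")
bad <- safe_numeric("abc")
```

Note that catching a warning this way stops the expression at the point of the warning; it does not resume where it left off.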

One surprising thing is that whatever you do in a tryCatch expression is not local: in the above example, the list "o" appears in the global environment.  I might have guessed that, but ?tryCatch says "‘tryCatch’ evaluates its expression argument in a context where the handlers provided in the ‘...’ argument are available." which made me think it had its own environment like a function.
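Here is a small sketch of that scoping behaviour, with made-up variable names.  The expression is an ordinary lazily-evaluated argument, so it runs in the environment of whoever called tryCatch; the handlers, on the other hand, are real functions, so plain assignments inside them are local and you need "<<-" to reach outward.

```r
counter <- 0
seen <- character(0)

tryCatch(
  expr = {
    counter <- counter + 1          # changes 'counter' in the calling environment
    stop("fail")
  },
  error = function(ex) {
    seen <- conditionMessage(ex)    # local to the handler; thrown away on return
    counter <<- counter + 10        # '<<-' assigns in the enclosing environment
  }
)
```

After this runs, "counter" is 11 (one from the expression, ten from the handler) while "seen" is still empty.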

It won't save you from silly things like forgetting to save your results, but it'll save you from some things.


[1] ...meetings to go to, classes to teach, intro sections to write, diapers to change.

[2] It's never happened to me, but I hear it's a common problem.

[3] http://www1.maths.lth.se/help/R/ExceptionHandlingInR/
