Showing posts with label R idioms. Show all posts

Tuesday, May 7, 2013

tryCatch! Fetch! Roll over! Little R details.

One of the most effective ways of dealing with R's slowness, when the completion of a script is not time-critical, is to start it and walk away.  Most of us have other things going on[1] and it's often fine to come back for your results later in the day or the week.  While it might be nice to recode a script to run faster, it's often a waste of time.

One difficulty with this approach is that if you haven't thought through all of your corner cases in a long-running script, you might come back to a script which died five minutes after you left it[2].  This is exacerbated by R's treatment of types, which is best described as "loosey goosey": an error which you trigger on line #1 will happily propagate, as bogus data, for at least as long as it takes you to leave your desk chair.

That said, catching errors in R is not too hard, especially if you're willing to use awkward-looking constructs like tryCatch.  When I initially read the documentation it wasn't clear to me what signals you can set handlers for, or what you were supposed to do with the expressions, so I wrote the following code based on another example[3]:


N <- 50
K <- 20
xs <- matrix(data=rnorm(n=N*K), nrow=N, ncol=K)
ys <- matrix(data=rnorm(n=N*K), nrow=N, ncol=K)
o <- list()
for ( i in 1:K ) {
  tryCatch(
    expr = { 
      if (i==10) stop ("WOOT!") else
        o[[i]] <- lm(ys[,i] ~ xs[,i])
    },
    interrupt = function(ex) {
      cat("Interrupt!\n")
      print(ex)
    },
    warning = function(ex) {
      cat("Warning!\n")
      print(ex)
    },
    error = function(ex) {
      cat("Error!\n")
      print(ex)
    },
    finally = { cat("Ta-da!\n") }
  )
}


The expression itself is silly: it runs a series of linear models, but it fails when "i" is 10.  The output is placed into a list, and looking at "o" you'll notice that all the models ran except for the 10th one, which is NULL instead.  There should be K-1 models with results and K "Ta-da!" messages.  The code in the "finally" argument runs whether an error is caught or not, so you can use it to close files or database handles, or otherwise clean up your mess.  Don't be too tidy, because you might clean up the evidence for why your code failed.
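
As a rough sketch of the cleanup idea, here is a smaller loop that appends to a temporary file and closes the connection in "finally".  The file, the names "path" and "results", and the failing step are all made up for illustration; they are not part of the example above.

```r
# Hypothetical example: log each successful step to a temp file and
# close the connection in `finally`, whether or not the step failed.
path <- tempfile(fileext = ".txt")
results <- list()

for (i in 1:3) {
  con <- file(path, open = "a")   # append mode, reopened for each step
  tryCatch(
    expr = {
      if (i == 2) stop("step 2 failed")
      writeLines(paste("step", i, "done"), con)
      results[[i]] <- i^2
    },
    error = function(ex) {
      message("Caught: ", conditionMessage(ex))
    },
    finally = close(con)          # runs on success and on error
  )
}

readLines(path)             # lines for steps 1 and 3 only
sapply(results, is.null)    # the failed step leaves a NULL gap
```

The failed step leaves a NULL in the list, just like the 10th model, and no connection is left dangling.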

The error/warning handlers are pretty self-explanatory (they can be triggered using the "stop"/"warning" commands), but the interrupt handler might not be so obvious.  Interrupts are most commonly sent on my Linux machine in response to a Ctrl-C, and on Windows in response to an Esc.  For debugging it can be nice to put code in there which summarizes the state of your program and presents it to you nicely.
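
To see which handler fires for which command, a minimal sketch might look like this.  The helper "classify" is a made-up name for illustration; "conditionMessage" is the standard way to pull the message text out of the condition object that the handler receives.

```r
# Hypothetical helper: run a function and report which handler fired.
classify <- function(f) {
  tryCatch(
    {
      f()
      "ok"                                       # reached only if f() is clean
    },
    warning = function(w) paste("warning:", conditionMessage(w)),
    error   = function(e) paste("error:", conditionMessage(e))
  )
}

classify(function() 1 + 1)                # "ok"
classify(function() warning("careful"))   # "warning: careful"
classify(function() stop("broken"))       # "error: broken"
```

Note that the value of tryCatch is the value of whichever handler ran, so a handler can return a fallback result instead of just printing.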

One surprising thing is that whatever you do in a tryCatch expression is not local: in the above example, the list "o" appears in the global environment.  I might have guessed that, but ?tryCatch says "‘tryCatch’ evaluates its expression argument in a context where the handlers provided in the ‘...’ argument are available." which made me think it had its own environment like a function.
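
A small sketch of that scoping behaviour, using a made-up variable "counter": the expression is evaluated where tryCatch was called, while each handler is an ordinary function with its own local scope.

```r
counter <- 0

tryCatch(
  expr = {
    counter <- counter + 1    # changes `counter` in the calling environment
    stop("fail after the assignment")
  },
  error = function(ex) {
    counter <- 100            # local to the handler, discarded on return
  }
)

counter    # 1: the expr assignment stuck, the handler's did not
```

If a handler really needs to change something outside itself, the usual tool would be the "<<-" operator.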

It won't save you from silly things like forgetting to save your results but it'll save you from some things.


[1] ...meetings to go to, classes to teach, intro sections to write, diapers to change.

[2] It's never happened to me, but I hear it's a common problem.

[3] http://www1.maths.lth.se/help/R/ExceptionHandlingInR/

Friday, April 29, 2011

R farts: Get with it.

As a setup for some more complicated environmental manipulations: .GlobalEnv is a pre-defined reference to the global environment (in the R sense).

The function 'with' constructs a local environment from a data argument, and lets you evaluate an expression within that environment.  It returns only the value of the evaluated expression.

The combination of these two lets you construct data frames with derived objects without polluting the global environment with junk temporary variables.  For example:

df <- data.frame( x = runif(100) )
df <- with( data = df, expr = {
   y = 2*x+1
   z = 3*y^2+1
   df <- data.frame(x = x, y = y, z = z )
   return(df)
})

Even better, if you have some variables pre-calculated in the global environment, you can pull them in using the .GlobalEnv reference:

df <- data.frame( x = runif(100) )
df <- with( data = df, expr = {
   y = 2*x+1
   z = .GlobalEnv$oftenUsedFunction(y)
   df <- data.frame(x = x, y = y, z = z )
   return(df)
})
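
To check that the temporaries really stay out of the global environment, something like the following sketch should do.  The names "plantDf" and "tmpDoubled" are made up for illustration.

```r
plantDf <- data.frame(x = c(1, 2, 3))

plantDf <- with(plantDf, {
  tmpDoubled <- 2 * x + 1     # temporary, lives only inside the with() call
  data.frame(x = x, y = tmpDoubled, z = 3 * tmpDoubled^2 + 1)
})

names(plantDf)                                             # "x" "y" "z"
exists("tmpDoubled", envir = .GlobalEnv, inherits = FALSE) # FALSE
```

Here the last expression in the braces is the value of the whole with() call, so an explicit return is optional.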

I probably have pages and pages of R scripts which are somewhat repetitive.  Just not repetitive enough to qualify for a function.  I'm still looking for how to organize them better, but this is a start.

Monday, November 30, 2009

R farts: lists

I've been thinking about doing an R idioms post on indexing, but I need to talk about data structures to do that first... and I just can't get it together yet, so here's an R fart:

### Start with a list
aList <- list()

### Lists in R can hold vectors, of _unrelated_ types, with names 
### attached.  The list we made above is empty, and R will tell you 
### so if you ask:
print(aList)
str(aList)

### Once a list is created, we can add named objects to it like so:
aList[['bob']] <- 3
aList$frank <- 'five'
aList['3'] <- 'ten'
aList['4'] <- list(5)

### Each of these is neatly slotted in after the other and, as 
### expected, the list then has a length of four:
length(aList)

### The names of the objects can be extracted
names(aList)

### Either names or numbers can be used for indexing,
### then each element of the list is printed in turn:
for ( i in names(aList) ) {
  print( aList[[i]] )
}

for ( i in seq_along(aList) ) {
  print( aList[[i]] )
}

### Weirdness!
### If you index with single brackets, you get the 
### elements of the list, each as a list of length 1.  
### There's a reason this happens: single brackets always return
### a sub-list, while double brackets extract the element itself.
for ( i in names(aList) ) {
  print( aList[i] )
}


### You can add unnamed objects like so:
aList[[10]] <- 8

### Something funny happened:
print(aList)

### When elements are added to a list by numerical index, and 
### the intervening elements between 1 and N do not exist, 
### the intervening elements are filled with unnamed NULLs.

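### Now something else funny happens.  The name '5' is not the
### same as position 5, so this appends a new element named '5',
### and that element is itself a list holding an unnamed 5: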
aList[['5']] <- list(5)

### To actually get at that unnamed five as a length 1 numeric 
### vector you have to say:
print(aList[['5']][[1]])

### What's the use?  We'll do that next.
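
The bracket and naming quirks above can be checked on a fresh pair of lists.  This is only a sketch; "demo" and "short" are made-up names, not part of the script above.

```r
### Single vs. double brackets
demo <- list(bob = 3, frank = "five")
class(demo["bob"])       # "list": a sub-list of length 1
class(demo[["bob"]])     # "numeric": the element itself

### A name that looks like a number is still just a name
short <- list(1, 2)
short[["5"]] <- "by name"       # appended as element 3, named "5"
length(short)                   # 3
short[[5]] <- "by position"     # element 4 is filled with NULL
length(short)                   # 5
is.null(short[[4]])             # TRUE
names(short)                    # "" "" "5" "" ""
```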


Monday, November 23, 2009

R Idioms: Counting observations

In the context of a data frame holding records of observations, with columns indicating grouping variables and/or measurements, one often runs into the need to count the number of observations.  There is no easy function in R that I know of for this, but 'aggregate' serves well enough.

Say we have a data frame ("data") with observations of plant heights measured over time.  The column names include: "individualId", "species", "variety", "height".  We could count the number of observations in each variety of each species as:

aggregate( x = data[['height']], 
  by = data[c('species','variety')], 
  FUN = length )

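The use of the 'length' function was non-obvious to me: each group's values are passed to FUN as a vector, so the length of that vector is the number of observations in the group.  Note the use of single brackets in specifying the columns of "data" in the "by" argument, and the lack of a comma in that expression.  This is done because 'by' must be a list.  Specifying data[,c('species','variety')] does not always work: with a single column, data[,'species'] drops to a bare vector, which is not a list.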
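
Here is a minimal sketch on a toy data frame.  The name "plants" and all the values in it are invented for illustration.

```r
# Hypothetical toy data: five height measurements
plants <- data.frame(
  species = c("oak", "oak", "oak", "pine", "pine"),
  variety = c("red", "red", "white", "scots", "scots"),
  height  = c(1.2, 1.5, 0.9, 2.1, 2.4)
)

# One row per species/variety combination; the count lands in column "x"
counts <- aggregate( x = plants[['height']],
  by = plants[c('species','variety')],
  FUN = length )
print(counts)

# Why the single brackets matter for 'by':
is.list(plants['species'])     # TRUE: still a data frame, hence a list
is.list(plants[,'species'])    # FALSE: dropped to a plain vector
```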

We can of course calculate group means or variances in the same fashion, all of which becomes relevant when running statistical models.