Tuesday, May 7, 2013

tryCatch! Fetch! Roll over! Little R details.

One of the most effective ways of dealing with R's slowness, when the completion of a script is not time-critical, is to start it and walk away.  Most of us have other things going on[1] and it's often fine to come back for your results the later in the day or the week.  While it might be nice to recode an script to run faster, it's often a waste of time. 

One difficulty with this approach is that if you haven't thought through all of your corner cases in a long running script, then you might come back to a script which died five minutes after you left it[2].  This is exacerbated by R's treatment of types which is best described as "loosey goosey" and even errors which you trigger on line #1 will happily propagate, as bogus data, for at least as long as it takes you to leave your desk chair.

That said, catching errors in R is not too hard, especially if you're willing to use awkward-looking constructs like tryCatch.  When I initially read the documentation it wasn't clear to me what signals you can set handlers for, and what you were supposed to do with the expressions, then I wrote the following code based on another example[3]:


N <- 20="" 50="" br="" k="">xs <- data="rnorm(n=N*K)," matrix="" ncol="K)<br" nrow="N,">ys <- data="rnorm(n=N*K)," matrix="" ncol="K)<br" nrow="N,">
o <- br="" list="">for ( i in 1:K ) {
  tryCatch(
    expr = { 

      if (i==10) stop ("WOOT!") else
        o[[i]] <- br="" i="" lm="" xs="" ys="">    interrupt = function(ex) {
      cat("Interrupt!\n")
      print(ex)
    },
    warning = function(ex) {
      cat("Warning!\n")
      print(ex)
    },
    error = function(ex) {
      cat("Error!\n")
      print(ex)
    },
    finally = { cat("Ta-da!\n") }
  )
}


The expression itself is silly, it's running a series of linear models, but it fails when "i" is 10.  The output is placed into a list and looking at "o" you'll notice that all the models ran, except for the 10th one which returned a NULL instead.  There should be K-1 models with results and K "Ta-da!" messages.  The code in the "finally" argument runs whether an error is caught or not, so you can use it to close file or database handles, or otherwise clean up your mess.  Don't be too tidy because you might clean up evidence for why your code failed. 

The error/warning handlers are pretty self-explanatory---they can be triggered using the "stop"/"warning" commands---but the interrupt handler might not be so obvious.  Interrupts are most commonly sent on my Linux machine in response to a Cntrl-C, and on Windows in response to an Esc.  For debugging it can be nice to put code in there which summarizes the state of your program and presents it to you nicely. 

One surprising thing is that whatever you do in a tryCatch expression is not local---in the above example, the list "o" appears in the global environment.  I might have guessed that, but ?tryCatch says "‘tryCatch’ evaluates its expression argument in a context where the handlers provided in the ‘...’ argument are available." which made me think it had it's own environment like a function.

It won't save you from silly things like forgetting to save your results but it'll save you from some things.


[1] ...meetings to go to, classes to teach, intro sections to write, diapers to change.

[2] It's never happened to me, but I hear it's a common problem.

[3] http://www1.maths.lth.se/help/R/ExceptionHandlingInR/

Friday, May 3, 2013

Filial cannibalism, i.e.- status, sacrifice and workplace relations in the academy

While waiting to see if anaphylaxis was in my near future during a routine doctor's visit, I overhead a conversation between two men.  Nominally it was a professional conversation about past jobs, present opportunities, and the difficulties of the job search.  In reality one was hoping that the other would help him acquire a job and the conversation had gotten a little awkward before the fully-employed gentleman caught on.  Mr. Employed soon started working to gently reduce expectations, but Mr. Hopeful was in a bind because he had to pretend he did not ask about work in the first place to maintain a sense of professional decorum.  Then the conversation generated this gem:

Mr. Employed: "... but we still have to find people to teach X".

Mr. Hopeful, interrupting: "Just hire adjuncts, they're cheap."

Well played Mr. Hopeful, well played.  Good luck with that job search.

Figure 1: Mr. Hopeful prepares for his final job interview and salary negotiations.



Friday, March 22, 2013

Bayesian Versus Frequentist

In 2008 I took a course from Michael Lavine called "Introduction to Statistical Thought".  Beyond delivering the material promised in the title, it also got me hooked on Bayesian inference by making the tools so readily accessible conceptually and technically.  One thing I did not get out of the class was a good understanding of how, in general, one goes about estimating things outside the Bayesian framework.  Cosma Shalizi just fixed that for me, and maybe he could for you too.  Somehow it's much easier for me to think about the technical requirements after seeing this sketch.  I am not sure why this is not part of Michael's class, but I can't repeat the class over and over just to find out if he skipped asymptotic estimation on purpose or by accident*.

* Michael's book (linked above) does have a new section on Asymptotics which was absent when I took the course!

Friday, March 15, 2013

Rcpp in NYC

Yesterday Dirk Eddelbuettel graced NYC with his presence to give a full-day workshop on Rcpp.  It was an epic in four parts (I, II, III, and IV).  I'm in the middle of putting some MCMC algorithms for mark-recapture movement data into a C++ library and writing the R interface so I thought it would be worth the effort to get down there and drink from the source of Rcpp knowledge. The source turned out to be a fire-hose, but Jared Lander plied us with bagels and Indian food (the Indian was my suggestion, but not as good as I remembered it) which helped with the endurance aspect. 

Interestingly, the overall content of the workshop was oriented toward people who had not committed much time to trying to figure out Rcpp and its associated tools on their own. Personally I would not spend that kind of money on a workshop if I weren't prepared to get the best return out of it, but hey, it's NYC. You can find $500 on the street*. Anyway, even if you walked in knowing only that you needed to interface R to a C++ library or just some small fast C++ function, you would have walked out happy.
 Some of the materials were old news to me but I did bring a long list of questions and Dirk tolerated them quite well.

The ones I asked were:

Monday, February 25, 2013

Mediterranean Diet

This is pretty cool on the face of it.  I love it when people really go for it and demonstrating/measure an effect which is both complex and hard to define.  Also worth a look is the effort they had to go through to get a dissenting voice:
Dr. Esselstyn said those in the Mediterranean diet study still had heart attacks and strokes. So, he said, all the study showed was that “the Mediterranean diet and the horrible control diet were able to create disease in people who otherwise did not have it.”
Yes... 'cause if you followed Dr. Esselstyn's special diet you'd never get a heart attack... just buy the book!  I don't know whether Dr. Esselstyn produced any more coherent statements on but this topic, but if not then the NYT is really stretching to tell us "both sides of the story". 


Can't they just call a slam dunk a slam dunk?  Or maybe read the paper and discuss some real flaws?  Also, did the nuts drive people nuts?  They didn't die of heart disease, but maybe they got into extreme sports?

Wednesday, January 23, 2013

Read your error messages

So for a conservation-oriented project I'm working on I wrote an R package to implement arbitrary-stage and currently 1-D integral projection models (IPM's)[1].  It lets you define and simulate from an IPM using a few data frames and R's formula interface to define the projection kernels[2].  It's been a great time-saver for me since I finished up most of it... but I'm still very suspicious of the code quality so when I got an error message I immediately blamed the package code...


> pop$step()
Error in model_matrix(newdata, covariates, n) :
  Model matrix columns for some coefficients missing:
        Missing columns for:
                sizes


... of course I had just happened include a coefficient which I did not use in the R formula for the model matrix ( ~ 1, where ~ 1 + sizes would have worked)... so I get to learn, once again, the age-old dictum of writing code: _read_ the error messages.  Really.  READ them. Especially if you went through the trouble of writing good ones.

[1] Ellner and Rees 2006 being the best starting reference for those....
[2] It'll be on CRAN someday...

Monday, September 17, 2012

Wherein Reuters reports that the rainforest is wet. Also riots and a certain prophet.

Once somebody "offends Islam"[1], even in the gentlest literary way possible, the ensuing pearl-cluthcing, apologies, riots, embassy attacks, further apologies, and murders are almost entirely predicable.  It's just a question of how bad it's going to get and how long the show will go on in the international news. 

Also predictable is the slew of "news" articles which no insight as to why a mere cartoon or book would lead to murder and mayhem on an international scale.  Instead of insight we get dreck like this:
"The demonstrations were the latest across the world provoked by a short film made with private funds in the United States that depicted the Prophet Mohammad as a fool and womanizer."
Somehow I don't think that people were sitting around having tea and listening to the news and when they heard about the movie they decided it would be a great time to go storm an embassy.  Apparently with kids in front (Reuters):
Figure 1: Love Islam, will throw stones for same.
Honestly, Reuters, are we supposed to believe that this school field trip (see fig. 1) was the result of a film?  ... or did somebody maybe work a little to convince these folks that throwing rocks and burning things that were vaguely Western was a good idea?  If you can't write news, maybe you could just give up and leave it to the bloggers who are honestly just as good at reading the press release from the Pakistani embassy.

[1] People can be offended, ideas... not so much.