Tuesday, October 15, 2013

The transcriptome of BRAAAAIIINNNZZZZZ!!!11!

Molecular Ecology has a neat, "it's-obvious-once-somebody-does-it" article at the intersection of behavioural ecology and molecular biology by Rey[1] and Boltana.  They took a bunch of zebrafish and separated them into groups (P and R, not sure why).  They confirmed the classification by repeating the experiment a while later, applied a series of standard measures of behavioural type (boldness and activity, mostly), then mashed/froze the fish brains and did some standard work to extract transcribed mRNA[2] for a transcriptome-style analysis.  There are a lot of FDR-adjusted p-values thrown around about which genes were up- or down-regulated, but the figure highlighting low/high expression differences between the groups tells you (almost) everything you need to know.

Rey and Boltana's main finding is that pre-classifying individuals by behavioural type lets them predict some of the transcriptome variation (9% is their favourite number), which makes the data easier to analyze.  For anyone interested in actual animal behaviour, the key finding is that you can take pretty crude behavioural measures and expect them to relate to transcription in the brain.  This kind of work gives me real hope for bringing together neuroscience and movement/behavioural ecology.

[1] I think I got the right character there...
[2] Bonus question to any intro bio students who find this, why the liquid nitrogen?
[3] I must have blogger set to non-US spellings, it keeps complaining about "behavior" and I just comply.

Thursday, June 13, 2013

A readable Boost Graph Library (BGL) tutorial!

The BGL documentation is a little abstract for me, so I enjoyed finding this little gem.  Even if you don't know French, the examples alone are probably good enough to learn from.

Tuesday, May 7, 2013

tryCatch! Fetch! Roll over! Little R details.

One of the most effective ways of dealing with R's slowness, when completion of a script is not time-critical, is to start it and walk away.  Most of us have other things going on[1] and it's often fine to come back for your results later in the day or the week.  While it might be nice to recode a script to run faster, it's often a waste of time.

One difficulty with this approach is that if you haven't thought through all of your corner cases in a long-running script, you might come back to a script which died five minutes after you left it[2].  This is exacerbated by R's treatment of types, which is best described as "loosey goosey": even errors which you trigger on line #1 will happily propagate, as bogus data, for at least as long as it takes you to leave your desk chair.
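Here's a small illustration (mine, not from any particular script) of how a slip on line one can propagate as bogus data instead of stopping anything:

```r
# Indexing an unnamed vector by name is a classic silent failure in R:
x <- c(1, 2, 3)
y <- x["total"]   # no error, no warning -- just NA
z <- y * 100      # the NA propagates quietly through later computations
z                 # still NA; you only find out when you inspect the result
```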

That said, catching errors in R is not too hard, especially if you're willing to use awkward-looking constructs like tryCatch.  When I initially read the documentation it wasn't clear to me which signals you can set handlers for, or what you were supposed to do with the expressions, so I wrote the following code based on another example[3]:

N <- 20
K <- 50
xs <- matrix(data=rnorm(n=N*K), nrow=N, ncol=K)
ys <- matrix(data=rnorm(n=N*K), nrow=N, ncol=K)

o <- list()
for ( i in 1:K ) {
  tryCatch(
    expr = {
      if (i==10) stop("WOOT!") else
        o[[i]] <- lm(ys[,i] ~ xs[,i])
    },
    interrupt = function(ex) { cat("Interrupted!\n");      print(ex) },
    warning   = function(ex) { cat("Caught a warning!\n"); print(ex) },
    error     = function(ex) { cat("Caught an error!\n");  print(ex) },
    finally   = { cat("Ta-da!\n") }
  )
}

The expression itself is silly: it runs a series of linear models but fails when "i" is 10.  The output is placed into a list, and looking at "o" you'll notice that all the models ran except for the 10th one, which returned a NULL instead.  There should be K-1 models with results and K "Ta-da!" messages.  The code in the "finally" argument runs whether an error is caught or not, so you can use it to close file or database handles, or otherwise clean up your mess.  Don't be too tidy, though, because you might clean up the evidence for why your code failed.

The error/warning handlers are pretty self-explanatory---they can be triggered using the "stop"/"warning" commands---but the interrupt handler might not be so obvious.  Interrupts are most commonly sent on my Linux machine in response to a Ctrl-C, and on Windows in response to an Esc.  For debugging it can be nice to put code in there which summarizes the state of your program and presents it nicely.

One surprising thing is that whatever you do in a tryCatch expression is not local---in the above example, the list "o" ends up in the global environment.  I might have guessed that, but ?tryCatch says "'tryCatch' evaluates its expression argument in a context where the handlers provided in the '...' argument are available," which made me think it had its own environment like a function.
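A minimal, standalone way to check that behaviour for yourself:

```r
# tryCatch evaluates its expression in the calling environment, not a
# fresh one, so assignments made before an error stick around:
x <- 0
tryCatch({
  x <- 42          # this assignment lands in the calling environment...
  stop("boom")     # ...even though an error follows it
}, error = function(ex) NULL)
x                  # 42 -- the assignment survived the error
```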

It won't save you from silly things like forgetting to save your results, but it'll save you from some things.

[1] ...meetings to go to, classes to teach, intro sections to write, diapers to change.

[2] It's never happened to me, but I hear it's a common problem.

[3] http://www1.maths.lth.se/help/R/ExceptionHandlingInR/

Friday, May 3, 2013

Filial cannibalism, i.e., status, sacrifice and workplace relations in the academy

While waiting to see if anaphylaxis was in my near future during a routine doctor's visit, I overheard a conversation between two men.  Nominally it was a professional conversation about past jobs, present opportunities, and the difficulties of the job search.  In reality one was hoping that the other would help him acquire a job, and the conversation had gotten a little awkward before the fully-employed gentleman caught on.  Mr. Employed soon started working to gently reduce expectations, but Mr. Hopeful was in a bind because, to maintain a sense of professional decorum, he had to pretend he hadn't asked about work in the first place.  Then the conversation generated this gem:

Mr. Employed: "... but we still have to find people to teach X".

Mr. Hopeful, interrupting: "Just hire adjuncts, they're cheap."

Well played Mr. Hopeful, well played.  Good luck with that job search.

Figure 1: Mr. Hopeful prepares for his final job interview and salary negotiations.

Friday, March 22, 2013

Bayesian Versus Frequentist

In 2008 I took a course from Michael Lavine called "Introduction to Statistical Thought".  Beyond delivering the material promised in the title, it also got me hooked on Bayesian inference by making the tools so readily accessible conceptually and technically.  One thing I did not get out of the class was a good understanding of how, in general, one goes about estimating things outside the Bayesian framework.  Cosma Shalizi just fixed that for me, and maybe he could for you too.  Somehow it's much easier for me to think about the technical requirements after seeing this sketch.  I am not sure why this is not part of Michael's class, but I can't repeat the class over and over just to find out if he skipped asymptotic estimation on purpose or by accident*.

* Michael's book (linked above) does have a new section on Asymptotics which was absent when I took the course!

Friday, March 15, 2013

Rcpp in NYC

Yesterday Dirk Eddelbuettel graced NYC with his presence to give a full-day workshop on Rcpp.  It was an epic in four parts (I, II, III, and IV).  I'm in the middle of putting some MCMC algorithms for mark-recapture movement data into a C++ library and writing the R interface so I thought it would be worth the effort to get down there and drink from the source of Rcpp knowledge. The source turned out to be a fire-hose, but Jared Lander plied us with bagels and Indian food (the Indian was my suggestion, but not as good as I remembered it) which helped with the endurance aspect. 

Interestingly, the overall content of the workshop was oriented toward people who had not committed much time to figuring out Rcpp and its associated tools on their own. Personally I would not spend that kind of money on a workshop if I weren't prepared to get the best return out of it, but hey, it's NYC. You can find $500 on the street*. Anyway, even if you walked in knowing only that you needed to interface R to a C++ library, or just to some small fast C++ function, you would have walked out happy.  Some of the materials were old news to me, but I did bring a long list of questions and Dirk tolerated them quite well.

The ones I asked were:

Monday, February 25, 2013

Mediterranean Diet

This is pretty cool on the face of it.  I love it when people really go for it and demonstrate/measure an effect which is both complex and hard to define.  Also worth a look is the effort they had to go through to get a dissenting voice:
Dr. Esselstyn said those in the Mediterranean diet study still had heart attacks and strokes. So, he said, all the study showed was that “the Mediterranean diet and the horrible control diet were able to create disease in people who otherwise did not have it.”
Yes... 'cause if you followed Dr. Esselstyn's special diet you'd never get a heart attack... just buy the book!  I don't know whether Dr. Esselstyn produced any more coherent statements on this topic, but if not then the NYT is really stretching to tell us "both sides of the story".

Can't they just call a slam dunk a slam dunk?  Or maybe read the paper and discuss some real flaws?  Also, did the nuts drive people nuts?  They didn't die of heart disease, but maybe they got into extreme sports?

Wednesday, January 23, 2013

Read your error messages

So for a conservation-oriented project I'm working on, I wrote an R package to implement arbitrary-stage and (currently) 1-D integral projection models (IPMs)[1].  It lets you define and simulate from an IPM using a few data frames and R's formula interface to define the projection kernels[2].  It's been a great time-saver since I finished up most of it... but I'm still very suspicious of the code quality, so when I got an error message I immediately blamed the package code...

> pop$step()
Error in model_matrix(newdata, covariates, n) :
  Model matrix columns for some coefficients missing:
        Missing columns for:

... of course I had just happened to include a coefficient which I did not use in the R formula for the model matrix ( ~ 1, where ~ 1 + sizes would have worked)... so I get to learn, once again, the age-old dictum of writing code: _read_ the error messages.  Really.  READ them.  Especially if you went to the trouble of writing good ones.
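The package itself isn't public, but you can see the same mismatch with base R's model.matrix (the data frame here is made up for illustration):

```r
# A formula of "~ 1" only generates an intercept column, so any
# coefficient on "sizes" has no column to match up with:
d <- data.frame(sizes = c(1.2, 2.5, 3.1))
colnames(model.matrix(~ 1, d))          # "(Intercept)" only -- no "sizes"
colnames(model.matrix(~ 1 + sizes, d))  # "(Intercept)" "sizes"
```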

[1] Ellner and Rees 2006 being the best starting reference for those....
[2] It'll be on CRAN someday...
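For anyone curious what a 1-D IPM projection step looks like in practice, here's a generic midpoint-rule discretization in the spirit of Ellner and Rees.  The vital-rate functions are toy examples of mine and have nothing to do with the actual package:

```r
# Midpoint-rule discretization of a 1-D IPM kernel K(y, x) = s(x) * g(y, x),
# where s is survival and g is a growth density; vital rates are invented.
n    <- 100                        # number of mesh cells
lo   <- 0; hi <- 10                # size range
h    <- (hi - lo) / n              # cell width
mesh <- lo + h * ((1:n) - 0.5)     # cell midpoints

surv   <- function(x) plogis(-2 + 0.5 * x)                     # survival prob.
growth <- function(y, x) dnorm(y, mean = 0.8 * x + 1, sd = 1)  # growth density

# Kernel matrix: K[i, j] approximates K(mesh[i], mesh[j]) * h
K  <- h * outer(mesh, mesh, function(y, x) surv(x) * growth(y, x))
n0 <- rep(1, n)   # some initial size distribution
n1 <- K %*% n0    # one step: n(y, t+1) = integral of K(y, x) n(x, t) dx
```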