## Monday, September 17, 2012

### Wherein Reuters reports that the rainforest is wet. Also riots and a certain prophet.

Once somebody "offends Islam"[1], even in the gentlest literary way possible, the ensuing pearl-clutching, apologies, riots, embassy attacks, further apologies, and murders are almost entirely predictable.  It's just a question of how bad it's going to get and how long the show will go on in the international news.

Also predictable is the slew of "news" articles offering no insight as to why a mere cartoon or book would lead to murder and mayhem on an international scale.  Instead of insight we get dreck like this:

> The demonstrations were the latest across the world provoked by a short film made with private funds in the United States that depicted the Prophet Mohammad as a fool and womanizer.
Somehow I don't think that people were sitting around having tea and listening to the news and when they heard about the movie they decided it would be a great time to go storm an embassy.  Apparently with kids in front (Reuters):
 Figure 1: Love Islam, will throw stones for same.
Honestly, Reuters, are we supposed to believe that this school field trip (see fig. 1) was the result of a film?  ... or did somebody maybe work a little to convince these folks that throwing rocks and burning things that were vaguely Western was a good idea?  If you can't write news, maybe you could just give up and leave it to the bloggers who are honestly just as good at reading the press release from the Pakistani embassy.

[1] People can be offended, ideas... not so much.

## Sunday, August 12, 2012

### Evening Descent to Salt Lake City

The mist was thick. The salt lakes looked a little creepy, and in the dim light the rough landscape had a damaged look.  I was pretty sure we were descending into Salt Lake City, but I couldn't shake the thought that we were lost and it was really Mordor.  Maybe my mind was trying to process the idea of a presidential candidate who is willing to ignore global warming, who equivocates about Roe v. Wade, who wants to put invasive doctrinal positions on sex education ahead of the interests of the secular state and American children, and who supports opportunists like Jindal.

Is this where we're at?  And he took Paul Ryan to the altar?!  Oy, can't wait to leave Mordor.

## Saturday, May 5, 2012

### The uncanny resemblance between certain abstracts and herbal remedy claims

Dear sir,

When you state the conclusion of your study as "our results show that X depends on Y and Z", that is about as memorable and informative as the labels on those little bottles of water in the homoeopathy aisle which say "has been shown to support prostate health."  The only thing that surprises me about your vague abstract is that there isn't a star next to it that says "*These claims may be relevant to science but I don't know why."

## Friday, April 20, 2012

### He had it coming, JAGS edition

From the JAGS help forum:
> I figured it out. In case this is helpful for someone else, a way of adding a function [to JAGS] is to:
>
> 1. go to `src/modules/bugs/functions` and add a `.cc` and `.h` file for your function (copy an existing one as a template);
> 2. edit both `src/modules/bugs/functions/Makefile.in` and `src/modules/bugs/bugs.cc` to reflect the addition;
> 3. go to the top level and do an `autoreconf --force --install` to rebuild the configure file;
> 4. `configure`, `make`, `sudo make install` as usual.
>
> I apologise if this is not a good solution, but it did work for me.
The sampler for the degrees of freedom in the Student's t distribution is behaving "badly" for my particular model and I'm facing the inevitable... which reminds me that the following quote, from Troy Day's website, is also applicable to programmers:

> I love coding as much as the next guy, but not when it jumps out at you from some dark corner of a black box.

Also, Chicago is relevant:

### Student's t distribution: your tail looks fat

Student's t distribution can be approximated with a normal distribution for large degrees of freedom.  This is one of those statistics factoids which everyone recognizes, heck, it's in Wikipedia.  For some code I'm writing (part of which involves fitting a t distribution), it would be really convenient to be able to decide to use this approximation beyond a certain number of degrees of freedom.  John Cook has written a little post about this, but I was interested in quantifying it a little more. It's actually a little depressing.

```r
dat <- seq(0, 100, 0.01)
out <- matrix(data = dat, nrow = length(dat), ncol = 500)
for ( i in 1:500 ) {
    out[,i] <- dnorm(x = out[,i]) / dt(x = out[,i], df = i*5) < 0.1
}
outW <- apply(X = out, MARGIN = 2, FUN = function(x) { min(which(x > 0)) })
plot(x = 1:500*5, y = outW/100, pch = '.',
    xlab = 'Degrees of Freedom',
    ylab = '# of SD before normal tail is 1/10th of t tail')
```

Prior to carrying out the analysis it's hard to tell what the estimated degrees of freedom might be (based on the mass of data near the central mound of the distribution), but I certainly expect quite a few samples out beyond five SD units.  No approximation for me.
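For a single point of reference (my addition, not from the original analysis; df = 30 is an arbitrary "large-ish" choice for illustration), here is the same density ratio spot-checked at a few distances from the mean:

```r
# Spot check of the tail ratio: how small is the normal density
# relative to the t density (df = 30) as you move out in SD units?
x <- c(2, 3, 4, 5)
ratio <- dnorm(x) / dt(x, df = 30)
print(round(ratio, 3))
# The ratio keeps shrinking: by five SD the normal density is less
# than a tenth of the t density, even at df = 30.
```

Which is exactly the depressing part: the approximation looks fine near the mound and quietly falls apart in the tails.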

## Wednesday, March 14, 2012

### Re: XKCD 979

Life imitates XKCD.
1. Get cryptic error from LibreOffice
2. Realize this is one package I never want to understand.  It's an MS Office replacement which faithfully replicates the whole anti-UNIX philosophy of MS Office.  Down to some of the Excel statistics bugs.
3. Google for "Extension manager: exception in synchronize"
4. Find this record of a chat session in German
5. Find a reference to this bug at 8:25 pm in the chat
6. `ls /usr/lib/libreoffice/share/uno_packages/cache/uno_packages`
7. `chmod a+rx /usr/lib/libreoffice/share/uno_packages/cache/uno_packages`
8. It works again!
I'm glad Vassar at least taught me to use primary sources....

## Monday, March 12, 2012

### Math in biology manuscripts

Steps for co-authors helping out with the math on a manuscript:
1. look at equations (and text), dig up references;
2. figure out that the equations are wrong;
3. try to figure out why they're right;
4. fail;
5. start to write an email explaining your logic about the (potential) mistake (don't fill in the recipient at this stage);
6. spot the hole in your own logic and delete the draft;
7. repeat steps 1-6 until step 6 fails;
8. fill in the recipient and send;
9. rinse and repeat, try not to look like a douche.
Step 9 is the hardest.  I suppose failing at step 2 is eventually the way out.

## Tuesday, March 6, 2012

### Bane of my existence

That little "12031" is a life-saver though.

```
> source('adjustData.R'); source('runModel.R')
Compiling model graph
   Declaring variables
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 1724335
Initializing model
Deleting model
Error in jags.model(file = "unroll-wbATS-DCJS.bugs", data = final.data,  :
  Error in node alive[12031]
Unobserved node inconsistent with unobserved parents at initialization
```

If you're a biologist and there's crazy weather, your first thought is "I wonder what my study site(s) look(s) like?"... the second is inevitably whether it was sensible to go driving on dirt roads to find out.

 Figure 1: No, it was not sensible, but it was beautiful.

### Victory! (Domestic)

Sometimes you got to take the small ones.  Especially when the large ones are at the mercy of the JAGS sampler fairy.

 Figure 1: victory, small.

## Monday, January 9, 2012

### Pięć Dwa Dębiec imitates life (or vice versa?)

I was recently listening to some Polish hip-hop and was reminded of an incident I had on a subway in Berlin.  I was riding with a female friend who was studying there on a semester abroad and had learned a fair amount of German.  She is very blond and was dressing like the locals (sort of), so she could easily be mistaken for a German.

A small group gets on together and they're chatting away.  I only remember two of them: a man and a woman.  The man was talking to the group, and I remember the woman because he was talking about her and acting very condescendingly toward her.  He reached across to brush her chin a couple of times to emphasize something.  She was obviously uncomfortable with the exchange, but it didn't seem to bother him.  They were all Poles, though the man had apparently emigrated to Germany a while back and maybe he was showing the group around.  I don't really know.  After a while he started talking trash about Germans, and that's where we come in.

Neither of us had said a word, we were just looking around.  My friend must have made the mistake of acknowledging them (smiling or something like that) and the man assumed we were German.  He started in talking trash about us, in Polish of course.  Polish insults are very colorful, but I can't remember his words (we were wearing glasses, something about four-eyes and smiling like idiots, etc...).  My friend had no clue what the man said but I was offended.  I didn't really want to get into a confrontation with the guy so it took me a moment to come up with something to say.

After a leisurely delay he was still going on about us, and since we were getting off soon I leaned over to him and said "Proszę pana, przynajmniej jeden z nas jest Polakiem" ("Excuse me, sir, but at least one of us is a Pole").  I tried to keep as informative and relaxed a tone as possible.  The color drained out of his face and he started apologizing a mile a minute.  After a while the train stopped at our station and my friend and I got off.  On our way out the man was still apologizing profusely and I was enjoying myself immensely.

I left Poland quite young and ignorant of the culture, so it was only recently that I realized that calling someone a German ("Niemiec") is a fairly crude insult.  I can see how calling a Pole a German on the subway, without knowing who they were, would be an occasion for embarrassment and maybe a little fear.  The incident still makes me smile.

Figure 1: "Nie jestem Niemcem" ("I am not a German")

## Thursday, January 5, 2012

### Things that crawl out of dark caves

Really, this man is still taken seriously somewhere?  Maybe I was mistaken about his potential political trajectory... (!?)

## Monday, January 2, 2012

This blog has a small readership that seems to spike whenever I comment at Andrew Gelman's blog, Pharyngula, or some other major blog destination.  The blog helps me keep my writing muscles exercised when I'm dealing with long periods of coding... and it lets me take breaks from academic writing by commenting about politics.  If you happen to check in here periodically and would like me to follow up a little faster on any of the topics I've left hanging, especially the technical ones, please leave a comment.

### Random Regression in JAGS

Update: modified the overparameterized version to center on the mean of the random effects, rather than using the overall intercept.  That works even with multiple random effects.  It still doesn't match exactly what Andrew Gelman suggests in MLHM but I haven't had the chance to compare carefully.

Update: edited code, following comment about error.

I think the BUGS language really needs a book on best practices.  For example the obvious approach to generic random regression BUGS code fails miserably.  Before I show what I mean, let's make some data:

```r
library(rjags)
library(ggplot2)

# Set up data.
input <- within(
    data = list(),
    expr = {
        beta_0 <- 5.222
        beta_x1 <- 2.93
        nGroups <- 10
        maxGroupSize <- 20
        beta_group_sd <- 1.3
        beta_group <- rnorm(
            n = nGroups,
            mean = 0,
            sd = beta_group_sd
        )
        group_size <- rbinom(
            n = nGroups,
            size = maxGroupSize,
            prob = 0.4
        )
        group <- unlist(
            sapply(
                X = 1:nGroups,
                FUN = function(x, group_size) {
                    rep(x, times = group_size[x])
                },
                group_size = group_size,
                simplify = TRUE
            )
        )
        nRows <- sum(group_size)
        x1 <- runif(n = nRows, min = -2, max = 2)
        y <- vector(mode = 'numeric', length = nRows)
        error <- rnorm(n = nRows, mean = 0, sd = 0.4)
        for (r in 1:nRows) {
            y[r] <-
                beta_0 +
                beta_x1 * x1[r] +
                beta_group[group[r]] +
                error[r]
        }
    }
)
input_clean <- input[c('nGroups', 'group', 'nRows', 'y', 'x1')]
```
To me, the obvious way to do random regression in a situation where the model might change over time (say you do multiple rounds of model checking and development, in the approach suggested by Andrew Gelman, among others) is to write the equation for `y[r]` much as in the simulation code above, and then add further restrictions on groups of coefficients.
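Restating the simulation code above in notation (my transcription, with r indexing rows and group(r) giving the group of row r), the model being generated and fit is:

```latex
y_r = \beta_0 + \beta_{x1}\, x_{1,r} + \beta_{\text{group}[\,group(r)\,]} + \epsilon_r,
\qquad
\beta_{\text{group}[g]} \sim \mathcal{N}\!\left(0,\ \sigma_{\text{group}}^2\right),
\qquad
\epsilon_r \sim \mathcal{N}\!\left(0,\ \sigma^2\right)
```

with uniform(0, 5) priors on both SDs in the BUGS code.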

Here's the code for that model:
```r
td <- tempdir()
model_file <- tempfile(
    pattern = 'random-regression',
    tmpdir = td,
    fileext = '.bugs'
)
cat('model {
    # Linear model:
    for ( i in 1:nRows ) {
        mu[i] <- beta_0 + beta_x1 * x1[i] + beta_group[group[i]]
        y[i] ~ dnorm(mu[i], precision)
    }
    # Parameters:
    beta_0 ~ dnorm(0, 0.1)
    beta_x1 ~ dnorm(0, 0.1)
    for ( g in 1:nGroups ) {
        beta_group[g] ~ dnorm(0, beta_group_prec)
    }
    beta_group_prec <- 1/(beta_group_sd^2)
    beta_group_sd ~ dunif(0, 5)
    # Error:
    precision <- 1/error_sd^2
    error_sd ~ dunif(0, 5)
}', file = model_file)
```

If you do take this approach, you end up with high autocorrelation in posterior samples. It's (sometimes) ok with just a single random effect with few groups, and in many small data-sets this can be brute-forced by thinning. As I recall, the specific issue is that all the random effects end up negatively correlated with the overall mean. If you figure that out, and maybe run into the section on overparameterized models in MLHM (Gelman) or this set of lecture notes (Jackman), you might come up with an overparameterized solution:
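To put numbers on "high autocorrelation" without running JAGS at all, here's a self-contained toy using the coda package: two artificial AR(1) chains stand in for poorly mixing posterior draws (the 0.9 lag-1 correlation and the name `beta_0` are illustrative choices of mine, not output from the model above):

```r
library(coda)

set.seed(1)
# Build a fake MCMC chain with known lag-1 autocorrelation rho;
# innovation sd is chosen so the stationary variance is 1.
make_chain <- function(n, rho) {
    z <- numeric(n)
    for (i in 2:n) {
        z[i] <- rho * z[i - 1] + rnorm(1, sd = sqrt(1 - rho^2))
    }
    mcmc(matrix(z, ncol = 1, dimnames = list(NULL, 'beta_0')))
}

chains <- mcmc.list(make_chain(1000, 0.9), make_chain(1000, 0.9))
print(autocorr.diag(chains))   # lag-1 autocorrelation near 0.9
print(effectiveSize(chains))   # far fewer than the 2000 raw draws
```

The same diagnostics applied to real `coda.samples` output make the comparison between the two parameterizations concrete.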

```r
td <- tempdir()
model_file <- tempfile(
    pattern = 'random-regression',
    tmpdir = td,
    fileext = '.bugs'
)
cat('model {
    # Linear model:
    for ( i in 1:nRows ) {
        mu[i] <- beta_0 + beta_x1 * x1[i] + beta_group[group[i]]
        y[i] ~ dnorm(mu[i], precision)
    }
    # Parameters:
    beta_0 ~ dnorm(0, 0.1)
    beta_x1 ~ dnorm(0, 0.1)
    for ( g in 1:nGroups ) {
        beta_group_unhooked[g] ~ dnorm(0, beta_group_prec)
        beta_group[g] <- beta_group_unhooked[g] - mean(beta_group_unhooked[])
    }
    beta_group_prec <- 1/(beta_group_sd^2)
    beta_group_sd ~ dunif(0, 5)
    # Error:
    precision <- 1/error_sd^2
    error_sd ~ dunif(0, 5)
}', file = model_file)
```

Here's the code to run either model; plotting a few traces shows that the second version behaves much better.

```r
# Run the model (point model_file at either version above).
model <- jags.model(
    file = model_file,
    data = input_clean,
    n.chains = 5,
    n.adapt = 2000
)
# coda.samples returns an mcmc.list, which is what gelman.diag and
# the coda plotting functions expect (jags.samples returns mcarray
# objects, which gelman.diag can't use).
samp <- coda.samples(
    model = model,
    variable.names = c(
        'beta_0',
        'beta_x1',
        'beta_group',
        'beta_group_sd',
        'error_sd'
    ),
    n.iter = 1000,
    thin = 1
)
print(gelman.diag(samp, multivariate = FALSE))
plot(samp)  # traces and densities
```

Martyn Plummer, Andrew Gelman, and others have commented in various places that it would be nice to get JAGS/OpenBUGS to recognize this sort of thing and do it automatically, but there are many situations where you can only get good results from BUGS code by playing to the strengths of the system. It's probably not worth writing a book on best practices, but it sure is worth a few blog posts.  I also wonder whether JAGS couldn't do much better in situations like this, with minimal additional code, if it allowed the user to give the compiler hints about which nodes are correlated.