Friday, April 20, 2012

He had it coming, JAGS edition



From the JAGS help forum:
I figured it out. In case this is helpful for someone else, a way of adding a function [to JAGS] is to 1) go to src/modules/bugs/functions and add a .cc and .h file for your function (copy an existing one as a template) 2) edit both src/modules/bugs/functions/Makefile.in and src/modules/bugs/bugs.cc to reflect the addition 3) go to the top level and do a "autoreconf --force --install" to rebuild the configure file 4) configure, make, sudo make install as usual I apologise if this is not a good solution, but it did work for me.
The sampler for degrees of freedom in the Student's t distribution is behaving "badly" for my particular model and I'm facing the inevitable.... which reminds me that the following quote, from Troy Day's website is also applicable to programmers:
I love coding as much as the next guy, but not when it jumps out at you from some dark corner of a black box. Also, Chicago is relevant:


Student's t distribution: your tail looks fat

Student's t distribution can be approximated with a normal distribution for large degrees of freedom.  This is one of those statistics factoids which everyone recognizes, heck, it's in Wikipedia.  For some code I'm writing (part of which involves fitting a t distribution), it would be really convenient to be able to decide to use this approximation beyond a certain number of degrees of freedom.  John Cook has written a little post about this, but I was interested in quantifying it a little more. It's actually a little depressing.



dat <- seq(0,100,0.01)
out <- matrix(data=dat, nrow=length(dat), ncol=500)
for ( i in 1:500 ) { out[,i] <- dnorm(x=out[,i])/dt(x=out[,i],df=i*5)<0.1}
outW <- apply(X=out,MARGIN=2,FUN=function(x) {min(which(x>0))})
plot(x=1:500*5,y=outW/100, pch='.', 

   xlab='Degrees of Freedom', 
   ylab='# of SD before normal tail is 1/10th of t tail')

Prior to carrying out the analysis it's hard to tell what the estimated degrees of freedom might be (based on the mass of data near the central mound of the distribution), but I certainly expect quite a few samples out beyond five SD units.  No approximation for me.