Friday, April 20, 2012

Student's t distribution: your tail looks fat

Student's t distribution can be approximated with a normal distribution when the degrees of freedom are large.  This is one of those statistics factoids that everyone recognizes; heck, it's in Wikipedia.  For some code I'm writing (part of which involves fitting a t distribution), it would be really convenient to switch to this approximation beyond a certain number of degrees of freedom.  John Cook has written a little post about this, but I was interested in quantifying it a little more: for each number of degrees of freedom, how far out do you have to go before the normal density drops to one tenth of the t density?  The answer is actually a little depressing.

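# Grid of x values (in SD units of the standard normal) and degrees of freedom
x  <- seq(0, 100, by = 0.01)
df <- seq(5, 2500, by = 5)
# For each df, find the smallest x at which the normal density is below one
# tenth of the t density; work on the log scale so far-tail densities do not
# underflow to 0/0
first_x <- sapply(df, function(nu) {
  x[which(dnorm(x, log = TRUE) - dt(x, df = nu, log = TRUE) < log(0.1))[1]]
})
plot(x = df, y = first_x, pch = '.',
     xlab = 'Degrees of Freedom',
     ylab = '# of SD before normal tail is 1/10th of t tail')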
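
As a rough cross-check on the calculation above, the log of the density ratio can be worked out by hand. This is only a sketch, not part of the original analysis: φ is the standard normal density, f_ν the t density with ν degrees of freedom, and r_ν their ratio (all symbols introduced here for illustration). The last line drops higher-order terms in x²/ν, so treat it as an order-of-magnitude guide rather than an exact result.

```latex
\begin{align*}
r_\nu(x) &= \frac{\varphi(x)}{f_\nu(x)}, \qquad
\log r_\nu(x) = \log r_\nu(0) - \frac{x^2}{2}
  + \frac{\nu+1}{2}\log\!\left(1+\frac{x^2}{\nu}\right) \\
\frac{d}{dx}\log r_\nu(x) &= \frac{x\,(1-x^2)}{\nu+x^2} \\
\log r_\nu(x) &\approx -\frac{x^4}{4\nu}
  \quad (\nu \text{ large},\ 1 \ll x^2 \ll \nu)
\quad\Longrightarrow\quad
x_{0.1} \approx \left(4\nu\ln 10\right)^{1/4}
\end{align*}
```

The derivative is negative for x > 1, so once past one SD the ratio only falls and it crosses 1/10 exactly once. Under this approximation the crossing point grows only like the fourth root of the degrees of freedom: even ν = 2500 gives roughly 12 SD.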

Before carrying out the analysis, it's hard to tell what the estimated degrees of freedom will be (the estimate is based on the mass of data near the central mound of the distribution), but I certainly expect quite a few samples out beyond five SD units.  No approximation for me.
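
To put a number on the five-unit worry, one could compare tail probabilities rather than densities. The following is a minimal sketch, not the analysis from my own code: the cutoff of 5 and the list of degrees of freedom are illustrative assumptions, and the cutoff is on the same standardized scale as the x axis in the code above.

```r
# Two-sided probability of landing more than `cutoff` units from the centre,
# under the standard normal and under t distributions with various df.
cutoff <- 5                               # illustrative choice
df     <- c(5, 10, 30, 100, 500, 2500)    # illustrative values

p_norm <- 2 * pnorm(cutoff, lower.tail = FALSE)
p_t    <- 2 * pt(cutoff, df = df, lower.tail = FALSE)

# How many times more often the t distribution puts a sample beyond the cutoff
tail_cmp <- data.frame(df = df, p_t = p_t, ratio_t_to_normal = p_t / p_norm)
print(tail_cmp)
```

If the ratio column is still well above 1 at the degrees of freedom the fit is likely to return, the normal approximation will understate how often such far-out samples occur.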
