Saturday, August 13, 2011

Improving statistical understanding in biology: teh ANOVA, it's robust to stuff

Recently there was a blog post at Expansed about how R might improve the practice of statistics.  The basic idea is that when you have to face the code that implements your statistical analysis, you might think it through a little more than you would by pointing and clicking.  I think that logic fails pretty quickly, faster than you can say cut-and-paste.  Cutting and pasting R code for a whole analysis (or worse, BUGS code!) is common even among practitioners who have worked with those tools for years.  Sooner or later it catches up with you and you might learn something, but it's not inevitable.

For kicks, I coded up the following to express my dismay at the state of the art:



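# Fit the same regression to every CSV file in the working directory
files <- dir(pattern = "\\.csv$")
output <- list()
for (f in files) {
   dat <- read.csv(f)
   # Regress y on every other column, no questions asked
   output[[f]] <- lm(y ~ ., data = dat)
   test <- summary(output[[f]])
   # Column 4 of the coefficient table holds the p-values (intercept included)
   ps <- test$coefficients[, 4]
   # Celebrate if anything at all crosses the 0.05 line
   if (any(ps < 0.05, na.rm = TRUE)) {
      print(test)
      cat("Eureka! There's gold in ", f, ".\n", sep = "")
   }
}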
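
To make the dismay concrete, here is a minimal sketch of what that "any p < 0.05" rule does when there is nothing to find.  Everything in it is an assumption for illustration: the data are simulated pure noise, and the names and sizes (n_files, n_obs, n_pred, eureka) are made up rather than taken from any real analysis.

```r
# Hypothetical illustration: every predictor is pure noise,
# so any "gold" the rule finds is a false alarm.
set.seed(1)       # arbitrary seed, only for reproducibility
n_files <- 1000   # number of simulated "data files" (assumed)
n_obs   <- 50     # rows per file (assumed)
n_pred  <- 10     # noise predictors per file (assumed)

eureka <- logical(n_files)
for (i in seq_len(n_files)) {
   dat <- as.data.frame(matrix(rnorm(n_obs * n_pred), nrow = n_obs))
   dat$y <- rnorm(n_obs)   # response unrelated to the predictors
   ps <- coef(summary(lm(y ~ ., data = dat)))[, "Pr(>|t|)"]
   eureka[i] <- any(ps < 0.05)   # same rule as the loop above
}

mean(eureka)             # share of noise-only files that shout "Eureka!"
1 - 0.95^(n_pred + 1)    # rough benchmark if the 11 tests were independent
```

With ten noise predictors plus the intercept, the independence benchmark works out to about 0.43, so under these assumptions something like four files in ten would strike gold with no signal in them at all.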





In a few weeks I'll try to code up something that uses stepwise significance testing to do variable selection.  Now how long before this makes it into somebody’s analysis...? Drop me a line if you use it!

