Saturday, June 4, 2011

Why do ecologists still use Excel for statistics?

Thanks to John Cook for linking to an article on the statistical shortcomings of Excel (rampant and unrepented):
The random number generator has always been inadequate. With Excel 2003, Microsoft attempted to implement the Wichmann–Hill generator and failed to implement it correctly. The “fixed” version appears in Excel 2007 but this “fix” was done incorrectly. Microsoft has twice failed to implement correctly the dozen lines of code that constitute the Wichmann–Hill generator; this is something that any undergraduate computer science major should be able to do. The Excel random number generator does not fulfill the basic requirements for a random number generator to be used for scientific purposes: (1) it is not known to pass standard randomness tests, e.g., L’Ecuyer and Simard’s (2007) CRUSH tests (these supersede Marsaglia’s (1996) DIEHARD tests—see Altman et al. (2004) for a comparison); (2) it is not known to produce numbers that are approximately independent in a moderate number of dimensions; (3) it has an unknown period length; and (4) it is not reproducible. For further discussion of these points, see the accompanying article by McCullough (2008); the performance of Excel 2007 in this area is inadequate.
This makes me want to design an undergraduate course in ecolgical statistics which triggers one of these errors in each homework (you might have to do more than one to get through all of them in one course)!

No comments:

Post a Comment