Sunday, September 11, 2011

Three parameter t distribution (in JAGS, but not numpy.random)

I can't seem to find a three-parameter t-distribution in numpy/scipy and I know JAGS has one.  It turns out it comes from the "jrmath" library which is a part of R.

 *  Mathlib : A C Library of Special Functions
 *  Copyright (C) 1998 Ross Ihaka
 *  Copyright (C) 2000-2008 The R Development Core Team
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  GNU General Public License for more details.
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, a copy is available at
 *    Pseudo-random variates from a t distribution.
 *    This function calls rchisq and rnorm to do the real work.

#include "nmath.h"

double rt(double df, RNG *rng)
    if (ISNAN(df) || df <= 0.0) ML_ERR_return_NAN;

  return norm_rand(rng);
    else {
/* Some compilers (including MW6) evaluated this from right to left
  return norm_rand(rng) / sqrt(rchisq(df) / df); */
  double num = norm_rand(rng);
  return num / sqrt(rchisq(df, rng) / df);

And from modules/bugs/distributions/

double DT::r(vector const &par, RNG *rng) const
    return rt(DF(par), rng) / sqrt(TAU(par)) + MU(par);

Which gets you the three-parameter version, and suggests that the following NumPy code should work nicely:

def student_t_3(loc, df, scale, size):
    tau = 1/(scale**2)
    t_draw = np.random.standard_t(df = df, size = size)
    scaled_t_draw = t_draw / np.sqrt(tau)
    return(scaled_t_draw + loc)

...and in fact generating a large sample from this, as well as from R, followed by a quantile-quantile plot gives some confirmation that this works.

This could be considered the historical-materialist approach to coding statistical functions.  I'm sure there is a reference text for this sort of thing somewhere but it's not the sort of thing which jumps to mind for a biologist.

It helps to know that numpy.random is coded expecting that everything is parameterized by scale parameters, whereas JAGS is coded expecting that (almost?) everything is parameterized with rate parameters.

No comments:

Post a Comment