Tuesday, September 27, 2011

Public Service Announcement

You know those vials in the freezer? The ones that would really love to have a date with some propanol/hexane before getting shot into the HPLC? The fish scales sitting in envelopes begging to have their rings counted?  The bird feathers you really think you should spec just because they're there?  If you don't know what question you want to use them to ask, they're not data, they're a waste of time.

Thank you for listening to this announcement.

Had this been a real emergency, your PI would have run screaming into the room waiving those vials in the air.

Wednesday, September 14, 2011

HPV vaccine

At the recent debate of hopefuls for the Republican presidential nomination, Michelle Bachman apparently made a series of ugly false claims about the safety record of the Human Papilloma Virus (HPV) vaccine.  It was the usual dribble of fear-spasms calculated to appeal to a rabid anti-feminist sub-group of the electorate.

It was ugly because HPV kills: along with the initial, rather unpleasant* disease, HPV causes cervical, vulvar, vaginal, penile, anal, and oropharyngeal cancer.  In fact it causes most cervical cancers which mean that the HPV vaccine might be the first pharmaceutical which lets us cure a cancer.    HPV vaccine also has a shiny safety record, better than the safety record of Tylenol/aspirin/ibuprofen which I'm sure Bachman regularly throws down her gullet.  So, here's the American Academy of Pediatrics (AAP) to set the record straight:
 “The American Academy of Pediatrics would like to correct false statements made in the Republican presidential campaign that HPV vaccine is dangerous and can cause mental retardation. There is absolutely no scientific validity to this statement. Since the vaccine has been introduced, more than 35 million doses have
been administered, and it has an excellent safety record."
Furthermore, this is only an effective way to cure cancer population-wide if it happens in the right time-frame, i.e.-before sexual activity.
“The American Academy of Pediatrics, the Centers for Disease Control and Prevention, and the American Academy of Family Physicians all recommend that girls receive HPV vaccine around age 11 or 12. That’s because this is the age at which the vaccine produces the best immune response in the body, and because it’s
important to protect girls well before the onset of sexual activity. In the U.S., about 6 million people, including teens, become infected with HPV each year, and 4,000 women die from cervical cancer. This is a life-saving vaccine that can protect girls from cervical cancer.”
Dear parents: do it.  Dear everybody else: if you're considering voting for this woman, or any other Republican candidate, please convince them to engage in political grandstanding less likely to cause the death of women and the destruction of families.
---------------------
* Sorry if you clicked that.

Sunday, September 11, 2011

Three parameter t distribution (in JAGS, but not numpy.random)

I can't seem to find a three-parameter t-distribution in numpy/scipy and I know JAGS has one.  It turns out it comes from the "jrmath" library which is a part of R.


/*
 *  Mathlib : A C Library of Special Functions
 *  Copyright (C) 1998 Ross Ihaka
 *  Copyright (C) 2000-2008 The R Development Core Team
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, a copy is available at
 *  http://www.r-project.org/Licenses/
 *
 *  DESCRIPTION
 *
 *    Pseudo-random variates from a t distribution.
 *
 *  NOTES
 *
 *    This function calls rchisq and rnorm to do the real work.
 */

#include "nmath.h"

double rt(double df, RNG *rng)
{
    if (ISNAN(df) || df <= 0.0) ML_ERR_return_NAN;

    if(!R_FINITE(df))
  return norm_rand(rng);
    else {
/* Some compilers (including MW6) evaluated this from right to left
  return norm_rand(rng) / sqrt(rchisq(df) / df); */
  double num = norm_rand(rng);
  return num / sqrt(rchisq(df, rng) / df);
    }
}


And from modules/bugs/distributions/DT.cc:


double DT::r(vector const &par, RNG *rng) const
{
    return rt(DF(par), rng) / sqrt(TAU(par)) + MU(par);
}


Which gets you the three-parameter version, and suggests that the following NumPy code should work nicely:


def student_t_3(loc, df, scale, size):
    tau = 1/(scale**2)
    t_draw = np.random.standard_t(df = df, size = size)
    scaled_t_draw = t_draw / np.sqrt(tau)
    return(scaled_t_draw + loc)


...and in fact generating a large sample from this, as well as from R, followed by a quantile-quantile plot gives some confirmation that this works.

This could be considered the historical-materialist approach to coding statistical functions.  I'm sure there is a reference text for this sort of thing somewhere but it's not the sort of thing which jumps to mind for a biologist.

It helps to know that numpy.random is coded expecting that everything is parameterized by scale parameters, whereas JAGS is coded expecting that (almost?) everything is parameterized with rate parameters.

Laplace mixture, bad code

Hm.  Is this a reasonable implementation of a laplace mixture? The try/except seems to be an ugly way of dealing with the fact that int/float don't return a length, but it gets around testing for what they are...




import numpy as np

def laplace_mix(loc, scale, size, pvals = None):
    try:
        dim = len(loc)
        if dim != len(scale):
            raise NameError("Different number of location and scale parameters.")
    except:
        dim = 1
        loc = [loc]
        scale = [scale]
    if pvals is None:
        pvals = [ 1/dim for i in range(dim) ]
    args = zip(loc, scale)
    indicate = np.random.multinomial(n=1, pvals=pvals, size=size)
    where = np.where(indicate == 1)[1]
    out = np.zeros(size)
    for i in range(size):
        out[i] = np.random.laplace( loc = loc[where[i]], scale = scale[where[i]], size = 1 )
    return(out)


I'm still curious if there is a way of avoiding the for loop via numpy magic, but this is not a performance bottleneck so...

Feminism

A couple walks by and the man "play" punches the woman in the arm.  "Don't hit me in public." she says, clearly frustrated.

Tuesday, September 6, 2011

What the new country is coming to....

This was such an awesome idea the first time around, let's do it again!

What the old country is coming to...