Tuesday, July 26, 2011

Metropolis alogrithm as an R closure

I've been wanting to understand closures in R a little better for a while and MCMC algorithms provide a nice use case.  In coding an MCMC algorithm, it's easy to write a function to take you one time step forward.  In most naive implementations what you see is a pile of code that manages the data and calls that function repeatedly.  The point of a closure is that you can wrap the code for tracking the data around the function which takes the MCMC steps and have a nice package which is easy to call repeatedly.  A quick implementation looks like so:


mcmcChain <- function(logTarget,proposal,cntrl) {
    if ( !all(c('nIter','inits') %in% names(cntrl)) ) {
        stop('\'cntrl\' argument must be a list with \'nIter\', and \'inits\' elements')
    }
    out <- matrix( data = NA, nrow = length(cntrl[['inits']]), ncol = cntrl[['nIter']] )
    out[,1] <- cntrl[['inits']]
    logD <- vector( mode = 'numeric', length = cntrl[['nIter']] )
    logD[1] <- logTarget(out[,1])
    logA <- vector( mode = 'numeric', length = cntrl[['nIter']] )
    prop <- vector( mode = 'numeric', length = cntrl[['nIter']] )
    chain <- function() {
        for ( i in 2:cntrl[['nIter']] ) {
            prop[i] <<- proposal(out[,i-1])
            logD[i] <<- logTarget(prop[i])
            logA[i] <<- logD[i] - logD[i-1]
            if (logA[i] > 0) {
                out[,i] <<- prop[i]
            } else {
                out[,i] <<- out[,i-1]
            }
        }
        o <- list(
            sample = out,
            proposal = prop,
            logD = logD,
            logA = logA
        )
        return(o)
    }
    return(chain)
}

The assumptions for this code are that the log target density and the proposal density take their parameters in a vector and return a vector with the same structure.  The output is a list with the sample, all the proposed values, log density, and log acceptance ratio.  It can be run to fit a normal distribution to three concocted values (0.1,0,-0.1) like so.  The sample is actually generated with the out <- tp() call.


logTarget <- function(mu) {sum(dnorm(x = c(0.1,0,-0.1), mean = mu, sd = 1,log=TRUE))}
proposal <- function(mu) { mu + runif(1,-0.1,0.1) }
cntrl = list( nIter = 1000, inits = 3)
tp <- mcmcChain(
    logTarget = logTarget,
    proposal = proposal,
    cntrl = cntrl
)
par(mfrow=c(2,3))
out <- tp()
plot( x = 1:cntrl[['nIter']], y = out$sample, pch = '*', xlab = 'iteration', ylab = expression(mu) )
plot( x = 1:cntrl[['nIter']], y = out$logD, pch = '*', xlab = 'iteration', ylab = 'log density')
plot( x = 1:cntrl[['nIter']], y = exp(out$logA), pch = '*', xlab = 'iteration', ylab = 'a')
hist( x = out$sample, xlab = expression(mu), breaks = cntrl[['nIter']]^(1/2) )
hist( x = out$logD, xlab = 'log density', breaks = cntrl[['nIter']]^(1/2) )
hist( x = exp(out$logA), xlab = 'a', breaks = cntrl[['nIter']]^(1/2) )


This is pretty flexible as you can substitute any target density and proposal density, and in my own work, the first time I wanted to do something more complicated was in dealing with structured data (e.g.- mark-recapture data). Speed can be an issue with MCMC samplers written in R, but in any complex problem it would generally be the calculation of the target density which matters most.  That could be implemented with in C++ (using Rcpp) and used with the above code. Please let me know (to satisfy my curiosity) if you use this code.

No comments:

Post a Comment