## Tuesday, December 13, 2011

### Bayesian Balls

I've been working on broadening my understanding of Bayesian thinking using Christian Robert's book *The Bayesian Choice*, and one of the early examples in the book had me confused.  The example is 1.2.2, which goes like so:
> A billiard ball $$W$$ is rolled on a line of length one, with a uniform probability of stopping anywhere.  It stops at $$p$$.  A second ball $$O$$ is then rolled $$n$$ times under the same assumptions, and $$X$$ denotes the number of times the ball $$O$$ stopped to the left of $$W$$.  Given $$X$$, what inference can we make on $$p$$?
Robert states that the prior distribution on $$p$$ is uniform on $$[0,1]$$.  The first point of confusion for me was that the prior distribution in this example comes not from prior belief, but from the prior experiment where the ball $$W$$ is rolled.
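To see why the uniform prior here is just the physical experiment, note that given $$p$$, $$X$$ is Binomial$$(n, p)$$, and with $$p$$ uniform on $$[0,1]$$ the posterior works out to Beta$$(X+1,\, n-X+1)$$ by conjugacy. A small simulation sketch (my own, not from the book; the function name and numbers are illustrative) checks this by literally rolling the balls and conditioning on the observed count:

```python
import random

random.seed(0)

def billiard_experiment(n, trials=100_000):
    """Simulate the two-ball experiment, returning (p, X) pairs."""
    results = []
    for _ in range(trials):
        p = random.random()  # ball W stops at p ~ Uniform(0, 1)
        # ball O is rolled n times; X counts stops to the left of W
        x = sum(random.random() < p for _ in range(n))
        results.append((p, x))
    return results

# Conditioning on X = 3 out of n = 10 rolls, the empirical mean of p
# should approach the Beta(3+1, 10-3+1) mean, (3+1)/(10+2) = 1/3.
n, x_obs = 10, 3
pairs = billiard_experiment(n)
p_given_x = [p for p, x in pairs if x == x_obs]
emp_mean = sum(p_given_x) / len(p_given_x)
print(emp_mean)  # close to 1/3
```

The simulation draws $$p$$ from its prior, so keeping only the runs matching the observed $$X$$ gives samples from the posterior directly.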

When I was first exposed to Bayesian statistics, I was excited about the flexibility it offered for fitting models.  While I was aware that accepting these things called prior distributions had some epistemological implications, it did not bother me.  I viewed (and still view) the specification of prior belief as a distribution on parameters as a valuable way of making the researcher's thoughts on the problem explicit.  After all, if your analysis says what you want it to but you can't justify your prior distribution to your peers, you won't have much luck convincing them of your result.

What I missed was that a prior distribution can be a way of conditioning on belief, experience, or observed events.  This makes the prior distribution even more valuable because it can encode belief about a process rather than just a belief about the distribution.  For example, if one assumes that the interval $$[0,1]$$ is very long and the billiard ball won't make it to the other end, some sort of decaying distribution on $$p$$ would make more sense than the uniform.  Robert briefly describes the rationale for using the prior distribution as follows:
> ... the use of the prior distribution is the best way to summarize the available information (or even lack of information) about this parameter as well as the residual uncertainty.
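To make the decaying-prior idea concrete: if that belief were encoded as, say, a Beta$$(1, 4)$$ prior (my choice of numbers, purely illustrative), conjugacy with the binomial likelihood gives a Beta posterior in closed form, and the posterior mean shifts toward zero relative to the uniform-prior case:

```python
def posterior_mean(x, n, a, b):
    """Posterior mean of p under a Beta(a, b) prior and a
    Binomial(n, p) likelihood.  By conjugacy the posterior is
    Beta(a + x, b + n - x), whose mean is (a + x) / (a + b + n)."""
    return (a + x) / (a + b + n)

n, x = 10, 3
uniform = posterior_mean(x, n, 1, 1)   # uniform prior is Beta(1, 1)
decaying = posterior_mean(x, n, 1, 4)  # a decaying prior, e.g. Beta(1, 4)
print(uniform, decaying)  # 4/12 vs 4/15: the decaying prior pulls p down
```

The same data, filtered through a different prior, yields a different conclusion about $$p$$ — which is exactly the sense in which the prior conditions the inference on what you believed before rolling the ball.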
I remember this point from my mathematical statistics course, but it's not surprising that it didn't strike me as especially important when I was dazzled by the fact that I just had to come up with a (log) posterior and a (log) prior and learn a couple of (simple!) algorithms to get answers out of my data.