Saturday, January 22, 2011

Prior probability

Suppose someone flips a coin thirty times, and it comes up heads every time. Suppose, further, that you are asked to choose between several hypotheses regarding the coin. Now, one usually starts by assuming a coin is fair, or close enough, but the odds of thirty heads in a row with a fair coin are roughly one in a thousand million (one in a billion, for you short-scale users). It's not impossible, of course, but one might consider other hypotheses. Take, for example, the "two-headed" hypothesis, which predicts heads on every flip of the coin. With a fixed coin, you would expect heads at every turn with a probability of almost 1 (1 is probability talk for what's more commonly called 100%; you never assign anything a probability of exactly one).
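
For anyone who wants to check the "one in a thousand million" figure, here is a minimal sketch of the arithmetic. The function name prob_all_heads is made up for this illustration, and it assumes the flips are independent.

```python
from fractions import Fraction

def prob_all_heads(flips, p_heads=Fraction(1, 2)):
    """Probability that every one of `flips` independent flips shows heads."""
    return p_heads ** flips

# A fair coin: each flip is heads with probability 1/2.
p_fair = prob_all_heads(30)

print(p_fair.denominator)  # 1073741824, i.e. about one in a thousand million
print(float(p_fair))
```

Exact fractions are used here only so the denominator can be read off directly; plain floats would do just as well.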

While the fixed coin hypothesis is not the first one you consider, the evidence (the observed results of the coin-flips) favours it, so you might venture a guess that it's the best hypothesis so far. But imagine that soon afterwards, your friend comes up to you and says: "Aha! I have conceived a new hypothesis!" (your friend has a love for drama). "The coin is fixed so that it shows heads for thirty flips, and on flip number thirty-one it shows tails! The evidence of the coin flips supports this hypothesis just as well as your two-headed one!"

You might be tempted to simply ask the coin-flipper to flip the coin again, and yes, that would be a way to test that hypothesis. It is in general a good idea to test that which is testable, after all. But what if you no longer had access to the coin? What if your friend had said a thousand flips, instead of thirty? Would both hypotheses be equally likely, since they are equally supported by the observed coin-flips?

The answer is, rather obviously, no. There's a reason we don't go about suggesting stuff like grue and bleen. Well, those of us who aren't philosophers don't; with them, all bets are off. Anyway, that reason is called Ockham's Razor, and it says that we should prefer simple explanations over complex ones.

That is, of course, a rather vague definition of Ockham's Razor. You can look up stuff like Kolmogorov complexity and Minimum Message Length for an understanding of what complexity is and how exactly the Razor tells us to avoid it. You could also look up overfitting to see why we prefer simpler explanations to more complex ones that might even be slightly better at fitting the available data.

So, as you would have noticed if you had read all that stuff I told you to in the last paragraph, "thirty heads and then one tail" is a more complex hypothesis than "always heads" ("always heads" is in turn more complex than "fair coin", though that is less obvious). Ockham's Razor, amongst other things*, tells us that when you have the same evidence, the less complex hypothesis is more probable. That applies to preferring "always heads" to "thirty-then-one", but it also applies to preferring "fair coin" to "always heads" at the start. That is, we can use Ockham's Razor to tell us which hypothesis is more likely before we gather the evidence. This should give you an idea of what is meant by the concept of "prior probability".

I could define "prior probability" in Bayesian terms, but I'm not good enough at explaining that. I'd just tell you to read this and come back, except that if you already read that I have nothing else to say here. So, I will attempt a brief description of the idea and hope I don't fuck it up too much. Essentially, prior probability is how likely you consider something to be before weighing a relevant piece of evidence (for or against). After weighing the evidence, your new probability estimate is called "posterior probability". Your posterior probability after one piece of evidence becomes your prior probability for the next piece. Which means that prior probability is not always calculated without any evidence; it only reflects how your beliefs look at one point in the process of examining new evidence.
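
If it helps to see the "posterior becomes the next prior" loop written out, here is a rough sketch. Everything specific in it is an assumption for the sake of illustration: the function name update, the starting prior of 0.001 for "two-headed", the choice to compare only against "fair coin", and treating the two-headed coin as showing heads with probability exactly 1 (which, as noted earlier, you wouldn't really do).

```python
def update(prior, likelihood_if_true, likelihood_if_false):
    """One application of Bayes' rule for a hypothesis H versus not-H.

    prior               -- probability of H before this piece of evidence
    likelihood_if_true  -- probability of the evidence if H holds
    likelihood_if_false -- probability of the evidence if H does not hold
    """
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)

# H = "the coin is two-headed", not-H = "the coin is fair" (assumed setup).
belief = 0.001  # assumed prior before seeing any flips

for flip in range(1, 11):
    # Each observed head is one piece of evidence. The posterior from the
    # previous flip is used as the prior for this one.
    belief = update(belief, likelihood_if_true=1.0, likelihood_if_false=0.5)
    print(f"after flip {flip}: P(two-headed) = {belief:.4f}")
```

With these made-up numbers, ten heads in a row are enough to take "two-headed" from one in a thousand to roughly even odds.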

So, to combine both concepts discussed so far, Ockham's Razor affects your prior probability. In fact, if you're perfectly rational (you're not) and find yourself in a state of no evidence at all regarding something, Ockham's Razor determines your prior probability. Not that you are likely to find yourself in a state of literally no evidence, which is why I emphasise that "prior" in prior probability means before the particular evidence under consideration, and not before any and all evidence.

Before you knew the result of the coin flips, you had reason to favour fair coin over fixed coin, and not just due to Ockham's Razor, because you weren't without evidence. You know about coins, for example. You know that, most of the time, coins aren't fixed, that most coins are close enough to fair. If someone had asked you to predict the number of heads and tails, you'd have gone for 15-15, not 30-0, and that would have been the best bet you could've made given the evidence. Similarly, you also had very strong evidence against the thirty-then-one hypothesis, and not just the fact that there are more two-headed coins than thirty-then-one coins. A coin that can somehow manipulate itself to fit into such a specific pattern is very unlikely given the laws of physics. Even if both hypotheses were equally complex, you'd still have strong evidence to prefer two-headed, and this evidence went into your prior probability. The further evidence, the coin flips, doesn't favour one over the other, so your posterior probability doesn't favour one over the other any more than your prior did. Indeed, for comparing two-headed vs thirty-then-one, the coin-flips evidence changes nothing. But it doesn't have to, because your prior probability very much favoured two-headed over thirty-then-one already.
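
The "changes nothing" point is easiest to see in odds form: posterior odds are prior odds times the ratio of how well each hypothesis predicted the evidence. The sketch below is only an illustration. The function name posterior_odds and both prior-odds figures are invented, and the fixed-coin hypotheses are treated as predicting thirty heads with probability exactly 1 to keep the arithmetic simple.

```python
def posterior_odds(prior_odds, likelihood_a, likelihood_b):
    """Odds of hypothesis A over hypothesis B after seeing the evidence."""
    return prior_odds * (likelihood_a / likelihood_b)

# Probability of thirty heads in a row under each hypothesis.
p_two_headed = 1.0        # simplification of "almost 1"
p_thirty_then_one = 1.0   # it also predicts heads on flips 1 to 30
p_fair = 0.5 ** 30

# Two-headed vs thirty-then-one: assumed prior odds of a million to one.
prior_vs_thirty = 1_000_000.0
after_vs_thirty = posterior_odds(prior_vs_thirty, p_two_headed, p_thirty_then_one)
print(after_vs_thirty == prior_vs_thirty)  # True: the flips change nothing

# Two-headed vs fair: assumed prior odds of one to a thousand.
prior_vs_fair = 1 / 1000
after_vs_fair = posterior_odds(prior_vs_fair, p_two_headed, p_fair)
print(after_vs_fair > 1)  # True: here the flips do shift the balance
```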

This was going somewhere, I swear. But this post is already too long, so the story is to be continued tomorrow. I'm sure all zero of you can't wait.

*It is a common misconception that the Razor only applies with equal evidence. It is not one I wish to perpetuate, so I'll take this footnote as a chance to clarify that formalizations of the Razor also show that a hypothesis that is more accurate (fits the evidence better) can lose out against less accurate but simpler hypotheses. Again, look up overfitting to understand why.
