Friday, November 13, 2015

Bayesian divergence

Suppose I am considering two different hypotheses, and I am sure exactly one of them is true. On H, the coin I toss is chancy, with different tosses being independent, and has a chance 1/2 of landing heads and a chance 1/2 of landing tails. On N, the way the coin falls is completely brute and unexplained--it's "fundamental chaos", in the sense of my ACPA talk. Now I observe n instances of the coin being tossed, about half of which are heads and half of which are tails. Intuitively, that should support H. But if N is an option, that is, if the prior probability of N is non-zero, we actually get Bayesian divergence as n increases: we get further and further from confirmation of H.

Here's why. Let E be my total evidence--the full sequence of n observed tosses. By Bayes' Theorem we should have:

P(H|E) = P(E|H)P(H)/[P(E|H)P(H) + P(E|N)P(N)].
But there is a problem: P(E|N) is undefined. What shall we do about this? Well, it is completely undefined. Thus, we should take it to be an interval of probabilities, the full interval [0,1] from 0 to 1. The posterior probability P(H|E), then, will also be an interval between:
P(E|H)P(H)/[P(E|H)P(H) + (0)·P(N)] = 1
and
P(E|H)P(H)/[P(E|H)P(H) + (1)·P(N)] ≤ P(E|H)/P(N) = 2^(-n) / P(N).
(Remember that E is a sequence of n fair and independent tosses if H is true.) Thus, as the number of observations increases, the posterior probability for the "sensible" hypothesis H gets to be an interval [a,1], where a is very small. But something whose probability is almost the whole interval [0,1] is not rationally confirmed. So the more data we have, the further we are from confirmation.
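To see how quickly the lower endpoint shrinks, here is a minimal numerical sketch of the two bounds above. The function name posterior_interval and the prior P(H) = 99/100 are illustrative assumptions, not part of the argument; any prior with P(N) > 0 behaves the same way.

```python
from fractions import Fraction


def posterior_interval(n, prior_h=Fraction(99, 100)):
    """Bounds on P(H|E) after n tosses when P(E|N) ranges over [0, 1].

    prior_h is an assumed prior for H; P(N) = 1 - prior_h because
    exactly one of H and N is taken to be true.
    """
    prior_n = 1 - prior_h
    like_h = Fraction(1, 2) ** n  # P(E|H) for one specific sequence of n tosses
    # P(E|N) = 0 gives the upper endpoint, P(E|N) = 1 the lower endpoint.
    upper = like_h * prior_h / (like_h * prior_h + 0 * prior_n)
    lower = like_h * prior_h / (like_h * prior_h + 1 * prior_n)
    return lower, upper


for n in (1, 10, 50):
    lower, upper = posterior_interval(n)
    print(n, float(lower), float(upper))
```

Even with a prior of only 1/100 on N, the lower endpoint falls below 2^(-n)/P(N), so the interval widens toward [0,1] as n grows.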

This means that no-explanation hypotheses like N are pernicious to Bayesians: if they are not ruled out as having zero or infinitesimal probability from the outset, they undercut science in a way that is worse and worse the more data we get.

Fortunately, we have the Principle of Sufficient Reason which can rule out hypotheses like N.

8 comments:

Heath White said...

What about this. On your way of construing it, no possible evidence could rule out the “fundamental chaos” hypothesis. Therefore it is a form of Cartesian skeptical hypothesis—a scenario that can’t be ruled out on the basis of evidence. It would be promising, then, to turn the usual anti-skeptical tools on it.

There are a number of these but my preferred one appeals to explanation: just as we have a better (simpler, more predictive, etc.) explanation of our experience if there is an external world than if we are brains in vats, we also have a better explanation of why e.g. long series of coin flips are almost 50/50 heads/tails if we posit a fair coin rather than fundamental chaos. Fundamental chaos, by definition, is the worst sort of explanation there could be.

Though I am not sure about this move in this context, because one of the competing hypotheses throws the principles of explanation themselves into question. I can’t decide if that’s problematic or not.

Alexander R Pruss said...

One difference is that with other sceptical hypotheses, if we just suppose low prior probability, we're home free. But in this case, low prior probabilities get magnified by whatever evidence we get.

Walter Van den Acker said...

Dr Pruss

Is there any reason to suppose that fundamental chaos is even possible?
What if instead of the usual PSR we hold that reality is fundamentally non-chaotic?

Alexander R Pruss said...

I take the PSR to be equivalent to the claim that reality is fundamentally non-chaotic. :-)

Walter Van den Acker said...

I think lots of people can agree on such a modest variety of the PSR. I, for one, have no problem with it.

Cameron said...

Hi Dr. Pruss. Just a quick question. What is the reason for going from "P(E|N) is undefined" to "Thus, we should take it to be an interval of probabilities, the full interval [0,1] from 0 to 1"? Thanks.

Alexander R Pruss said...

Cameron:

That seems the best way to formalize the idea that P(E|N) is completely undefined. If there is a better, the argument will need to be changed.

Alexander R Pruss said...

Let me add that modeling probabilities that are not completely defined with intervals is standard. For instance, suppose you uniformly randomly shoot a dart at a round target. The probability that you hit a subset of the target equals the area of the subset (taking the target to have unit area). But not every subset has an area--some are "nonmeasurable". Suppose A is a completely nonmeasurable subset of the target (technically, what I mean by this is that every measurable subset of A has measure zero and every measurable superset of A has measure equal to the area of the target). Let A' be the intersection of A with the left half of the target. We can say something about the probability of hitting A': it is no greater than the probability of hitting the left half of the target. So we can give an upper bound for the probability. We can't give a non-trivial lower bound. So, the probability of hitting A' is represented as the interval [0,1/2]. On the other hand, the probability of hitting A is best represented as the whole interval [0,1].

Modeling probabilities with intervals is known as "imprecise probabilities". You can get a decision-theoretic characterization as follows. If the probability of E is [a,b], then you would be rationally required to pay any sum lower than $a for a chance to win $1 if E occurs, and you would be rationally forbidden from paying any sum greater than $b for it. Decision theory would then tell you nothing about the rationality or lack thereof in the case of sums of money strictly between $a and $b.
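The betting characterization can be written out as a small rule. This is only a sketch: the function name betting_verdict and its return labels are made up for illustration, and treating a price exactly equal to a or b as "silent" is an assumption, since the characterization above speaks only of sums below a, above b, and strictly between them.

```python
def betting_verdict(price, a, b):
    """Verdict on paying `price` for $1 if E occurs, when P(E) = [a, b].

    Returns "required", "forbidden", or "silent" (hypothetical labels).
    """
    if price < a:
        return "required"   # rationally required to pay
    if price > b:
        return "forbidden"  # rationally forbidden from paying
    # Assumption: the endpoints a and b are treated like the interior.
    return "silent"


# With the full interval [0, 1], no price strictly between $0 and $1 gets a verdict.
print(betting_verdict(0.5, 0, 1))
```

On this rule, the wider the interval, the less decision theory has to say, which is why a posterior close to the whole of [0,1] gives so little guidance.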

In general we can represent the probability of an event E that has no probability as [a,b], where a is the maximum (or supremum, if the probabilities are only finitely additive) of the probabilities of subsets of E that do have a probability, and b is the minimum of the probabilities of the supersets of E that have a probability.
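A finite toy model may make this recipe concrete. It is an illustrative assumption, not the dart case itself: four outcomes stand in for the target, and only the empty set, the left half {1, 2}, the right half {3, 4} and the whole space are given probabilities. The name interval_probability is hypothetical.

```python
from fractions import Fraction


def interval_probability(event, measurable):
    """Return (a, b) for `event`, given a dict of sets that have a probability.

    a: largest probability of a measurable subset of the event.
    b: smallest probability of a measurable superset of the event.
    The empty set and the whole space must be keys of `measurable`.
    """
    event = frozenset(event)
    a = max(p for s, p in measurable.items() if s <= event)
    b = min(p for s, p in measurable.items() if s >= event)
    return a, b


# Toy space {1, 2, 3, 4}; only the two halves (and the trivial sets) are measurable.
measurable = {
    frozenset(): Fraction(0),
    frozenset({1, 2}): Fraction(1, 2),
    frozenset({3, 4}): Fraction(1, 2),
    frozenset({1, 2, 3, 4}): Fraction(1),
}

print(interval_probability({1}, measurable))     # plays the role of A'
print(interval_probability({1, 3}, measurable))  # plays the role of A
```

Here {1} sits inside the left half and so gets the interval [0,1/2], while {1, 3} straddles both halves and gets the whole interval [0,1], matching the pattern of A' and A in the dart example.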