In this post I want to review an interesting result by David Freedman (Annals of Mathematical Statistics, Volume 36, Number 2 (1965), 454-456) available at projecteuclid.org.
The result gets very little attention; most researchers in statistics and machine learning seem to be unaware of it. The result says that, “almost all” Bayesian prior distributions yield inconsistent posteriors, in a sense we’ll make precise below. The math is uncontroversial but, as you might imagine, the interpretation of the result is likely to be controversial.
Actually, I had planned to avoid all “Bayesian versus frequentist” stuff on this blog because it has been argued to death. But this particular result is so neat and clean (and under-appreciated) that I couldn’t resist. I will, however, resist drawing any philosophical conclusions from the result. I will merely tell you what the result is. Don’t shoot the messenger!
The paper is very short, barely more than two pages. My summary will be even shorter. (I’ll use slightly different notation.)
Let $X_1, X_2, \ldots$ be an iid sample from a distribution $P$ on the natural numbers $I = \{1, 2, 3, \ldots\}$. Let $\mathcal{P}$ be the set of all such distributions. We endow $\mathcal{P}$ with the weak topology. Hence, $P_n \rightsquigarrow P$ iff $P_n(j) \to P(j)$ for all $j$.
Let $\mu$ denote a prior distribution on $\mathcal{P}$. (More precisely, a prior on an appropriate $\sigma$-field, namely the Borel sets generated by the discrete topology.) Let $\Pi$ be the set of all priors. We endow the set of priors with the weak topology. Thus $\mu_n \rightsquigarrow \mu$ iff $\int f \, d\mu_n \to \int f \, d\mu$ for all bounded, continuous, real functions $f$.
Let $\mu_n$ be the posterior corresponding to the prior $\mu$ after $n$ observations. We will say that the pair $(P, \mu)$ is consistent if

$$P^\infty\bigl( \mu_n \rightsquigarrow \delta_P \bigr) = 1,$$

where $P^\infty$ is the product measure corresponding to $P$, and $\delta_P$ is a point mass at $P$.
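To make the definition concrete, here is a small simulation sketch of my own (not from Freedman’s paper): for a deliberately well-behaved setup — a true distribution on the finite support $\{1,2,3\}$ and a uniform Dirichlet prior, both arbitrary choices for illustration — the posterior is again Dirichlet, and it visibly piles up around the true $P$ as $n$ grows, which is exactly what consistency asks for.

```python
import numpy as np

rng = np.random.default_rng(0)

# True distribution P on {1, 2, 3} (an arbitrary choice for illustration).
p_true = np.array([0.5, 0.3, 0.2])

# Dirichlet(1, 1, 1) prior over distributions on {1, 2, 3}.
alpha_prior = np.ones(3)

for n in [10, 100, 10_000]:
    x = rng.choice(3, size=n, p=p_true)       # iid sample X_1, ..., X_n
    counts = np.bincount(x, minlength=3)
    alpha_post = alpha_prior + counts         # conjugate posterior: Dirichlet
    a0 = alpha_post.sum()
    post_mean = alpha_post / a0
    # Coordinatewise posterior standard deviations shrink like 1/sqrt(n),
    # so the posterior concentrates near p_true: this (P, mu) pair is consistent.
    post_sd = np.sqrt(alpha_post * (a0 - alpha_post) / (a0**2 * (a0 + 1)))
    print(n, np.round(post_mean, 3), np.round(post_sd, 3))
```

Of course, this finite-support conjugate example is precisely the tame situation. Freedman’s theorem concerns the space of all priors on distributions over the naturals, where, as we are about to see, such well-behaved pairs are topologically exceptional.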
Now we need to recall some topology. A set is nowhere dense if its closure has an empty interior. A set is meager (or first category) if it is a countable union of nowhere dense sets. Meager sets are small; think of a meager set as the topological version of a null set in measure theory. (For example, the rationals form a meager subset of the real line.)
Freedman’s theorem is: the set of consistent pairs is meager.
This means that, in a topological sense, consistency is rare for Bayesian procedures. From this result, it can also be shown that most pairs of priors lead to inferences that disagree. (The agreeing pairs are meager.) Or as Freedman says in his paper:
“ … it is easy to prove that for essentially any pair of Bayesians, each thinks the other is crazy.”
On the frequentist side, convergence is straightforward here.
Indeed, if $p$ denotes the mass function of $P$ and $\hat{p}_n$ the empirical mass function, then

$$\sup_j |\hat{p}_n(j) - p(j)| \stackrel{a.s.}{\to} 0.$$
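As a quick numerical sanity check (again my own sketch, with a geometric true distribution chosen arbitrarily and truncated for computation), one can watch the sup-norm distance between the empirical and true mass functions shrink as $n$ grows, with no prior anywhere in sight:

```python
import numpy as np

rng = np.random.default_rng(1)

# A true mass function on the naturals: geometric with parameter 1/2
# (an arbitrary choice for illustration), truncated at J and renormalized.
J = 60
j = np.arange(1, J + 1)
p = 0.5 ** j
p = p / p.sum()

errors = {}
for n in [100, 10_000, 1_000_000]:
    x = rng.choice(j, size=n, p=p)
    p_hat = np.bincount(x, minlength=J + 1)[1:] / n  # empirical mass function
    errors[n] = np.max(np.abs(p_hat - p))            # sup-norm distance
    print(n, errors[n])
```

The error decreases at roughly the $1/\sqrt{n}$ rate one would expect.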
In fact, even stronger statements can be made; see the recent paper by Daniel Berend and Leo (Aryeh) Kontorovich (paper is here).
As a postscript, let me add that David Freedman, who died in 2008, was a statistician at Berkeley. He was an impressive person whose work spanned from the very theoretical to the very applied. He was a bit of a curmudgeon, which perhaps lessened his influence a little. But he was a deep thinker with a healthy skepticism about the limits of statistical models, and I encourage any students reading this blog to seek out his work.