Nice to see Freedman’s result again. I do not think “meager” is of much concern. Most of the mathematical objects that we work with, such as continuous functions and differentiable functions, are meager in the larger space of all functions. I have not checked, but I guess that many sets of regular experiments considered in statistical theory, usually with some differentiability assumptions, would be meager in the set of all experiments (under Le Cam’s topology). It might be interesting to see if E, the set of consistent pairs (P, mu), is of the second category in itself, i.e., in the topology restricted to E.

RV

]]>Perhaps a simpler example, and one that is easier to show to be practically unimportant (it cannot happen in applications), is here:

http://radfordneal.wordpress.com/2008/08/09/inconsistent-maximum-likelihood-estimation-an-ordinary-example/

(see comment 19 and what seems to be agreement afterwards)

Larry, the challenge for people with weaker math skills than yours is that they are worried they are going to be fooled into worrying about things that can’t happen. That people are unintentionally misled by people with very good math skills is perhaps demonstrated by Peter McCullagh publishing essentially the same error of conditioning on a continuous observation in an example involving ancillarity that Barnard pointed out.

]]>Fair enough. Thanks for your comments.

]]>Thanks for all those replies – I’ve learned a lot. I found Freedman’s relatively simple one-dimensional example in the 1963 paper (that I have attempted to describe above) to be quite useful and educational. The setup is simple enough to be amenable to some sort of intuition – the prior is one-dimensional. But it’s complex enough to lead to the unintuitive behaviour.

It’s easy (and maybe justifiable, in my opinion) to dismiss the general theorem on the grounds that it might have no relevance to practical research. “Weird things happen when weird distributions are pushed to the limit at infinity”. If the search space of allowable distributions is too big, then I would think that all methods, Bayesian or otherwise, will behave quite badly.

> The result gets very little attention.

I don’t think that’s a bad thing. I’m sure everybody has their list of papers that they feel deserve more attention. For me, it’s “Testing a point null hypothesis: the irreconcilability of P values and evidence” http://www.jstor.org/stable/10.2307/2289131.

I’m glad I know about Freedman’s result, and it should be known by the hard-core theorists. But if this result is to be promoted and appreciated more widely, then I think the simple examples are better. This doesn’t deserve popularity just because it’s correct; it’s going to need good communication.

]]>Yes. Sorry. I just edited my reply above as you were posting. I meant that, in the general case (prior and posterior over the set of all distributions) there is no posterior density or mode.

]]>> Each value of x describes one truncated-Geometric distribution and vice versa.

I should have said:

> Each value of x describes one truncated-Geometric distribution.

Otherwise, I’m happy with my comment. Although there might be issues with the tie-breaking among the set of xs. Perhaps we should tie break with the *mean*, not the minimum of those xs. I think that works.

]]>Sorry to be persistent, but I’d like to precisely restate my understanding of Freedman’s setup in section 5 of the 1963 paper. At which point below am I incorrect for the first time?

– We generate large finite samples iid from a Geometric(1/4). We call Geometric(1/4) the ‘true distribution’.

– The prior describes a family of distributions, which includes the true distribution but also includes other distributions which are Geometric or truncated-Geometric.

– Regardless of the density or mass that may or may not be associated with each distribution in the prior, I am simply talking about the set of distributions which are supported by the prior.

– The family of distributions which have support in the prior is not so complicated. The family is indexed by a real-valued variable x which ranges from 1/8 to 7/8. Each value of x describes one truncated-Geometric distribution and vice versa.

– For any value of x, we can take the corresponding distribution, which we’ll call Distribution[x], and calculate P(the sample | Distribution[x]).

– This final expression, P(the sample | Distribution[x]), is a real-valued function of x. Distribution[x] is a straightforward distribution with a probability mass function, and the sample is finite, so this function is well defined and easy to understand.

– There will be a value of x (or maybe a set of values of x), which maximizes P(the sample | Distribution[x]). We’ll refer to this value of x as x_hat. If there are multiple such values, we can tie break by selecting the smallest such value.

– This x_hat, as just defined, is the MLE. More precisely, the MLE is Distribution[x_hat].
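To make the steps above concrete, here is a minimal Python sketch. It is not Freedman’s construction: I am assuming a stand-in family in which Distribution[x] is just a plain Geometric(x) on {1, 2, ...}, with x restricted to a grid on [1/8, 7/8]. The names `sample_geometric`, `log_likelihood` and `mle_on_grid` are made up for illustration. It only shows the mechanics of computing P(the sample | Distribution[x]) for each x and picking x_hat with the smallest-value tie break.

```python
import math
import random


def sample_geometric(p, n, rng):
    """Draw n iid values from Geometric(p) on {1, 2, ...} by inverting the CDF."""
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        k = math.ceil(math.log(u) / math.log(1.0 - p))
        out.append(max(1, k))  # u == 1 would give 0; map it to 1
    return out


def log_likelihood(x, sample):
    """log P(sample | Distribution[x]).

    ASSUMPTION: Distribution[x] is a plain Geometric(x), a stand-in for the
    (truncated-Geometric) family in the paper, which is not reproduced here.
    """
    n = len(sample)
    return n * math.log(x) + (sum(sample) - n) * math.log(1.0 - x)


def mle_on_grid(sample, grid, tie_break=min):
    """Return x_hat: the grid point maximizing the likelihood.

    If several grid points attain the maximum, tie_break picks one
    (the smallest by default, as in the list above).
    """
    scores = [(log_likelihood(x, sample), x) for x in grid]
    best = max(score for score, _ in scores)
    maximizers = [x for score, x in scores if score == best]
    return tie_break(maximizers)


# Grid of 601 values of x from 1/8 to 7/8 (the grid size is arbitrary).
grid = [1 / 8 + i * (6 / 8) / 600 for i in range(601)]

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = sample_geometric(0.25, 5000, rng)  # the 'true distribution'
x_hat = mle_on_grid(sample, grid)
print("x_hat =", x_hat)
```

In this well-behaved stand-in family x_hat lands close to 1/4, as you would expect. Passing `tie_break=statistics.mean` (after `import statistics`) would switch to the mean-based tie break instead of the minimum.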

]]>