Interesting, though you might be interested in looking at the Neyman-Scott discussion in Barndorff-Nielsen, O., and Cox, D. R., Inference and Asymptotics. Chapman and Hall, London, 1994.

There they argue that, essentially, the approaches to salvaging the likelihood separate into two: one is to …, and the second replaces the specification of arbitrary non-commonness of the non-common parameter with a common distribution for that parameter [i.e., a latent generative model].

It is still unclear to me what compels one to take certain summaries of the posterior over others…

Will take a look and get back to you.

LW

In 2007, Marc Toussaint and I wrote a Technical Report about “Bayesian

estimators for Robins-Ritov’s problem” [http://eprints.pascal-network.org/archive/00003871/01/harmeling-toussaint-07-ritov.pdf] which includes simulations

and which also concludes that the critical point is the dependence

or independence of theta and pi. In Sections 3 and 4, we considered the

setting X.i ~ uniform(1…C), R.i ~ bernoulli(pi.i), Y.i ~ N(R.i *

theta.i, 1).
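For concreteness, here is a minimal sketch (not the report's code; all names and constants are illustrative assumptions) of that setting, with theta and pi drawn independently per value of X as in Section 3, together with the Horvitz-Thompson estimator discussed below:

```python
import random

# Sketch of the setting: X_i ~ uniform{1..C}, R_i ~ Bernoulli(pi_i),
# Y_i ~ N(R_i * theta_i, 1), with theta and pi independent.
random.seed(0)

C = 50       # number of values X can take (illustrative)
n = 20000    # sample size (illustrative)

theta = [random.gauss(0.0, 1.0) for _ in range(C)]  # per-value means
pi = [random.uniform(0.1, 0.9) for _ in range(C)]   # known selection probs

psi_true = sum(theta) / C  # target: psi = E[theta(X)] under uniform X

ht_terms = []
for _ in range(n):
    x = random.randrange(C)                   # X_i ~ uniform{1..C}
    r = 1 if random.random() < pi[x] else 0   # R_i ~ Bernoulli(pi_x)
    y = random.gauss(r * theta[x], 1.0)       # Y_i ~ N(R_i * theta_x, 1)
    ht_terms.append(r * y / pi[x])            # Horvitz-Thompson term R*Y/pi

psi_ht = sum(ht_terms) / n  # HT estimate of psi
print(psi_ht, psi_true)
```

The HT estimate averages the reweighted terms R.i * Y.i / pi.i, which are unbiased for theta regardless of any dependence between theta and pi.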

(i) In Section 3 we derive a Bayesian estimator that does not assume

dependence between theta and pi (in our report xi). A simulation

shows that this Bayesian estimator has a smaller variance than the

Horvitz-Thompson (HT) estimator on data that has been generated with

independent theta and pi. Thus for such data the Bayesian estimator

has no problems (even has lower variance).

(ii) In Section 4 we assume that theta and pi are dependent. To

derive a Bayesian estimator we have to model this dependence. Of

course there are many possibilities: we choose a dependency that

relates to the dependency used in Robins and Ritov (1997) in their

proof that a likelihood-based estimator cannot be uniformly unbiased.

The Bayesian estimator looks quite similar to the HT estimator.

However, on simulated data that follows this model the Bayesian

estimator again has lower variance. So in the dependent case as well,

the Bayesian estimator derived from the model assumptions (dependence

of theta and pi) works. Curiously, it also weights the samples

similarly to the HT estimator.

(iii) Section 5 shows that similar arguments hold for continuous X.

My conclusion has three points (which might be obvious by now):

(1) The HT estimator works well only on data where theta and pi are

dependent. The advantage of the HT estimator might be that this

dependence does not have to be made explicit.

(2) If the dependence between theta and pi can be made explicit, we

can derive a Bayesian estimator which works as well as the HT

estimator (possibly with lower variance). The disadvantage of the

Bayesian approach might be that the dependence has to be made

explicit.

(3) The third point is a question: can we exploit a possible

dependence between theta and pi in a Bayesian estimator without making

it explicit?

I’d be curious to hear the experts’ opinion on these thoughts! Thanks!

For the problem to arise, it seems theta(x.i) must not be smooth enough that theta(x.i) ~ theta(x.j), i != j, for any (or at least most) i, j where R = 1 (where Y is observed); the interest must be in psi, the E[Y] as a _uniform_ expectation over [0,1]^d; and pi(x) must be both non-informative (given x) and non-uniform over [0,1]^d when R = 1.

There does not seem to be a problem with the posterior of theta(x.i), which is simply a mixture of the posterior of theta(x.i | x.i.obs, R=1, Y) and the prior of theta(x.i | x.i.obs, R=0) ~ prior of theta(x.i | x.i.obs) (i.e., a mixture of posterior and prior, with most of it being prior). The problem arises in integrating this posterior over [0,1]^d with psi as the target, and it is not clear in the blog post how this is done. Simply collapsing the posterior over x.i.obs [just where R=1] (non-uniform) would seem very wrong.

This would explain why pi(x) = 1/2 for all x does not cause any problem to arise (x.i.obs will be uniform), and why a prior with all its mass on a linear function for theta(x.i) (or any combination of polynomials that is linear in x.i) will not cause a problem: the parameter(s) are the same (e.g., alpha) anywhere in [0,1]^d, and a non-uniform sample from [0,1]^d does not create a problem if it is large enough (non-singular).
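A hypothetical illustration (the names and the particular pi(x) are mine, not from the discussion) of the first point: with pi(x) = 1/2 the observed x's (those with R = 1) stay uniform on [0,1], while a non-constant pi(x) concentrates them where pi is large, so a summary computed only over the observed x's misrepresents the uniform target.

```python
import random

random.seed(1)
n = 100_000  # illustrative sample size

def observed_mean(pi):
    """Mean of x among units with R = 1, for selection probability pi(x)."""
    xs = []
    for _ in range(n):
        x = random.random()              # x ~ uniform[0,1]
        if random.random() < pi(x):      # R ~ Bernoulli(pi(x))
            xs.append(x)
    return sum(xs) / len(xs)

m_flat = observed_mean(lambda x: 0.5)            # constant pi: mean stays near 0.5
m_skew = observed_mean(lambda x: 0.1 + 0.8 * x)  # oversamples large x: mean shifts up
print(round(m_flat, 2), round(m_skew, 2))
```

For the skewed case the observed x's have density proportional to 0.1 + 0.8x, so their mean is about 0.63 rather than 0.5.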

As for the comment “If you want to be a Bayesian, be a Bayesian but accept the fact that, in this example, your posterior will fail to concentrate around the true value”: given the study-design analogy of purposely choosing to sample with pi(x.i), for the objective of estimating psi given non-smooth theta(x.i), the design is flawed for the usual Bayesian analysis, and it perhaps should not be surprising that a fix is not easy to come up with. HT is designed to fix just this problem, and the Bayesian should perhaps not feel bad pronouncing that they can do nothing Bayesian for the patient except to pronounce them dead.

So this goes in my bin of stuff that is ignorable for the practicing statistician (unlike Neyman-Scott).

Or I may still not understand their example well enough…

Yes, in the original post we showed how to get a confidence interval that shrinks

at rate . Again, this is possible because of uniform consistency.

—LW

This sounds very interesting.

I would like to see a proof that in this example

it yields an estimator that is uniformly

consistent.

—LW

If one takes the view that one’s prior ought to be a computable approximation to the Solomonoff prior, these kinds of Freedmanesque inconsistency arguments against Bayes don’t actually militate against Bayes. They are in fact *incredibly useful* — they show that vast swaths of the space of prior probability distributions can be disregarded, since they do not contain computable approximations to the Solomonoff prior.

You’re right to point out that Bayesians will see probabilities and

frequencies as different (although linked via de Finetti’s theorem).

Nonetheless, we consider it reasonable to ask about the frequency

behavior of posterior probability distributions. Perhaps it would be

clearer if we said: one’s posterior beliefs will fail to concentrate

around the truth, in the frequency sense.

I didn’t find your argument that W(theta) should be a function of pi to

be convincing. But anyway, dependence on pi is not enough.

It is necessary but not sufficient.

—LW