There are some so-called principles of statistical inference with names like the Sufficiency Principle (SP), the Conditionality Principle (CP), and the Likelihood Principle (LP).
Birnbaum (1962) proved that CP and SP imply LP. (But see Mayo 2010.) Later, Evans, Fraser, and Monette (1986) proved that CP alone implies LP (so SP is not even needed).
All of this generates controversy because CP (and SP) seem sensible. But LP is not acceptable to most statisticians. Indeed, all of frequentist inference violates LP, so if we adhered to LP we would have to abandon frequentist inference. In fact, as I’ll explain below, LP pretty much rules out Bayesian inference, contrary to the claims of Bayesians.
How can CP be acceptable and LP not be acceptable when CP logically implies LP?
The reason is that the principles are bogus. What I mean is that CP might seem compelling in a few toy examples. That doesn’t mean it should be elevated to the status of a principle.
1. The Principles
SP says: if two experiments yield the same value of a sufficient statistic, then the two experiments should yield the same inferences.
CP says: if I flip a coin to choose which of two experiments to conduct, then inferences should depend only on the experiment actually performed. The fact that I could have chosen the other experiment should not affect inferences. In more technical language, the coin flip is ancillary (its distribution is completely known), and inferences should be conditional on the ancillary.
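A minimal numerical sketch of this mixture setup (the instrument variances and the parameter value below are invented for illustration): a coin flip picks one of two measuring instruments, and CP says the variance used for inference should be that of the instrument actually used, not the marginal variance averaged over the coin flip.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical mixture experiment: a fair coin picks one of two
# instruments with very different precisions, then we measure theta once.
sigma = np.array([1.0, 10.0])  # standard deviations of the two instruments
theta = 5.0                    # true parameter (unknown in practice)

coin = rng.integers(0, 2)                       # the ancillary coin flip
y = theta + sigma[coin] * rng.standard_normal() # one measurement

# Unconditional analysis: average the variance over the coin flip.
var_marginal = 0.5 * sigma[0] ** 2 + 0.5 * sigma[1] ** 2

# Conditional analysis (what CP prescribes): use the variance of the
# instrument that was actually chosen.
var_conditional = sigma[coin] ** 2

print(f"instrument {coin}: conditional var = {var_conditional}, "
      f"marginal var = {var_marginal}")
```

When the precise instrument is chosen, conditioning gives a far tighter assessment (variance 1 versus the marginal 50.5), which is why CP looks so compelling in this toy case.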
LP says: two experiments that yield proportional likelihood functions should yield identical inferences.
Frequentist inference violates LP because things like confidence intervals and p-values depend on the sampling distributions of estimators and so on, which involve more than just the observed likelihood function.
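The textbook illustration of this (not specific to this post, but standard) is binomial versus negative-binomial sampling: 9 heads and 3 tails give a likelihood proportional to $\theta^9 (1-\theta)^3$ under both designs, yet the one-sided p-values for $H_0: \theta = 1/2$ differ because the sample spaces differ.

```python
from math import comb

# Same data: 9 heads, 3 tails; theta = P(heads); test H0: theta = 1/2
# against theta > 1/2. Both designs give likelihoods proportional to
# theta^9 (1 - theta)^3, but the p-values disagree.

# Design A: toss exactly n = 12 times; p-value = P(X >= 9), X ~ Bin(12, 1/2).
p_binom = sum(comb(12, k) for k in range(9, 13)) / 2**12

# Design B: toss until the 3rd tail; p-value = P(Y >= 9 heads),
# Y ~ NegBin(r = 3, 1/2), with P(Y = y) = C(y + 2, 2) (1/2)^(y + 3).
p_negbin = 1 - sum(comb(y + 2, 2) * 0.5 ** (y + 3) for y in range(9))

print(f"binomial design p-value:          {p_binom:.4f}")   # 0.0730
print(f"negative-binomial design p-value: {p_negbin:.4f}")  # 0.0327
```

At level 0.05 the two designs reach opposite conclusions from the same likelihood, which is exactly the behavior LP forbids.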
Bayesians seem to embrace LP and indeed use it as an argument for Bayesian inference. But two Bayesians with the same likelihood can get different inferences because they might have different priors (and hence different posterior distributions). This violates LP. Whenever I say this to people, the usual reply is: but Birnbaum’s theorem only applies to one person at a time. But this is not true. There is no hidden label in Birnbaum’s theorem that says: Hey, this theorem only applies to one person at a time.
2. CP Is Bogus
Anyway, it doesn’t matter. The main point is that CP (and hence LP) is bogus. Just because it seems compelling that we should condition on the coin flip in the simple mixture example above, it does not follow that conditioning is always good. Leaping from a simple toy example to a general principle of inference is not justified.
Here is a simple example. I think I got it from Jamie Robins. You observe $(X_1, Y_1), \ldots, (X_n, Y_n)$ where $Y_i = \beta^T X_i + \epsilon_i$ and $\epsilon_i \sim N(0,1)$. To be concrete, let’s say that each $X_i$ is a vector of length $d$ and $d$ is huge; much larger than $n$, for example. We want to estimate $\beta_1$. This is just linear regression with a large number of covariates.
Suppose we have some extra information: we are told that the covariates are independent. The “best” estimator (the maximum likelihood estimator) is obtained by conditioning on all the data.
This means we should estimate the vector $\beta$ by least squares. But the least squares estimator is useless when $d > n$. We could regularize by putting a penalty or a prior on $\beta$. But the resulting estimators will have terrible behavior compared to the following “anti-conditioning” estimator. Just throw away most of the data. In particular, throw away all the covariates except the first one. Now do linear regression using only the $Y_i$’s and the first covariate. The resulting estimator is then tightly concentrated around $\beta_1$ with high probability. (This is because of the independence of the covariates.)
In this example, throwing away data is much better than conditioning on the data. We are heavily violating LP.
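A quick simulation makes the point concrete (this is my own sketch; the values of $n$, $d$, and the coefficients are made up, and I use a sparse $\beta$ so the concentration is easy to see). Because the covariates are independent, the simple regression of $Y$ on $X_1$ alone has slope converging to $\beta_1$, even though full least squares is infeasible with $d \gg n$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n observations, d >> n independent N(0,1) covariates.
n, d = 200, 5000
beta = np.zeros(d)
beta[0] = 2.0                       # the coefficient we care about
beta[1:5] = [0.5, -0.3, 0.2, 0.1]   # a few other nonzero coefficients

X = rng.standard_normal((n, d))
Y = X @ beta + rng.standard_normal(n)

# Full least squares is hopeless: the design matrix has rank at most
# n < d, so the normal equations have infinitely many solutions.

# "Anti-conditioning": throw away every covariate but the first and run
# simple linear regression of Y on X[:, 0]. By independence of the
# covariates, the omitted terms act like extra noise and the slope
# still concentrates around beta_1.
x1 = X[:, 0]
beta1_hat = np.cov(x1, Y)[0, 1] / np.var(x1, ddof=1)

print(f"true beta_1 = {beta[0]}, estimate from X_1 alone = {beta1_hat:.3f}")
```

The estimate lands close to 2 even though we discarded 4,999 of the 5,000 covariates; any method that insists on conditioning on all the data cannot do this.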
There are lots of other examples of great procedures that violate LP. Randomization is a good example. Methods based on randomization (such as permutation tests) are wonderful things but adherents to CP (and hence LP) are precluded from using them. The same applies to data-splitting techniques.
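For concreteness, here is a minimal two-sample permutation test (the data are invented for illustration). Its p-value is computed from the randomization distribution of the group labels, not from the likelihood, so it is exactly the kind of procedure LP rules out.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: two small samples; H0: same distribution.
x = np.array([4.2, 5.1, 6.3, 5.8, 4.9])
y = np.array([3.1, 3.9, 4.4, 3.5, 4.0])

observed = x.mean() - y.mean()
pooled = np.concatenate([x, y])

# Permutation p-value: shuffle the group labels many times and see how
# often the difference in means is at least as large as observed. The
# reference distribution comes from the randomization itself.
n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    count += (perm[: len(x)].mean() - perm[len(x):].mean()) >= observed
p_value = (count + 1) / (n_perm + 1)

print(f"observed difference: {observed:.2f}, permutation p-value: {p_value:.4f}")
```

The test is exact under the null of exchangeability with essentially no modeling assumptions, yet nothing in the likelihood function alone could produce it.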
The bottom line is this: if we elevate lessons from toy examples into grand principles we will be led astray.
Postscript: Since I mentioned Jamie, I should alert you that in the near future, I’ll be cross-posting on this blog and Cosma’s blog about a debate between me and Jamie versus a Nobel prize winner. Stay tuned.
Birnbaum, A. (1962). On the foundations of statistical inference. Journal of the American Statistical Association, 57, 269-326.

Evans, M.J., Fraser, D.A.S. and Monette, G. (1986). On principles and arguments to likelihood. Canadian Journal of Statistics, 14, 181-194.

Mayo, D. (2010). An error in the argument from conditionality and sufficiency to the likelihood principle. In Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D. Mayo and A. Spanos, eds.), 305-314.