Confidence Intervals for Misbehaved Functionals
Suppose you want to estimate some quantity $\theta$ which is a function of some unknown distribution $P$. In other words, $\theta = T(P)$ where $T$ maps distributions to real numbers. For example, $T(P)$ could denote the median of $P$.
Ideally, a $1-\alpha$ confidence interval $C_n$ based on an iid sample $X_1,\ldots,X_n \sim P$ should satisfy
$$P^n\big(T(P)\in C_n\big) \geq 1-\alpha$$
for every distribution $P$. The hard part is finding a non-trivial confidence interval that actually satisfies this condition for every $P$. You can always set $C_n = (-\infty,\infty)$, but that’s an example of a trivial confidence interval.
In some cases (such as the median) it is possible to find a non-trivial confidence interval. In others (such as the mean, when the sample space is the whole real line) it is impossible. Today I want to discuss a paper by David Donoho (Donoho 1988) that deals with some in-between cases. In these problems, non-trivial one-sided confidence intervals exist.
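To make the median case concrete, here is a minimal sketch of the classical distribution-free interval built from order statistics (a standard construction, not something taken from Donoho’s paper). If $B \sim \text{Binomial}(n, 1/2)$, the interval $[X_{(j)}, X_{(n-j+1)}]$ covers the median with probability at least $1 - 2P(B \le j-1)$ for every $P$, so the code picks the largest $j$ that keeps this bound at or above $1-\alpha$. The function names `binom_half_cdf` and `median_ci` are hypothetical helpers chosen for this sketch.

```python
import math
from typing import Sequence, Tuple


def binom_half_cdf(k: int, n: int) -> float:
    """P(B <= k) for B ~ Binomial(n, 1/2), computed exactly with integers."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, i) for i in range(min(k, n) + 1)) / 2 ** n


def median_ci(xs: Sequence[float], alpha: float = 0.05) -> Tuple[float, float]:
    """Distribution-free interval [X_(j), X_(n-j+1)] for the median.

    Its coverage is at least 1 - 2 * P(B <= j - 1) for every P, so we take
    the largest j that keeps this lower bound >= 1 - alpha.
    """
    n = len(xs)
    s = sorted(xs)
    j = 0
    for cand in range(1, n // 2 + 1):
        if 2 * binom_half_cdf(cand - 1, n) <= alpha:
            j = cand
        else:
            break  # the bound only gets worse as cand grows
    if j == 0:
        # No interval of this order-statistic form reaches 1 - alpha,
        # so fall back to the trivial interval.
        return (-math.inf, math.inf)
    return (s[j - 1], s[n - j])


print(median_ci(list(range(1, 11))))
print(median_ci([1, 2, 3, 4, 5]))
```

With $n = 10$ and $\alpha = 0.05$ the sketch returns $[X_{(2)}, X_{(9)}]$; with $n = 5$ no order-statistic interval reaches 95 percent, so it falls back to the trivial one.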
1. Difficult Functionals
Let ${\cal P}$ denote all distributions on the real line. A functional is a map $T: {\cal P}\to\mathbb{R}$. Consider the following functional: $T(P)$ is the number of modes of the density $p$ of $P$. (If $P$ does not have a density we can define $T(P) = \lim_{h\to 0} T(P\star\Phi_h)$, where $P\star\Phi_h$ denotes $P$ convolved with a Normal with mean 0 and standard deviation $h$.)
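The smoothing definition $T(P) = \lim_{h\to 0} T(P\star\Phi_h)$ can be illustrated numerically. The sketch below assumes a hypothetical discrete $P$ with equal mass on three atoms, evaluates the density of $P\star\Phi_h$ on a grid, and counts grid-level local maxima as a stand-in for the true number of modes. For large $h$ the atoms blur into one bump; as $h$ shrinks, one mode appears per atom.

```python
import math
from typing import Sequence


def smoothed_density(atoms: Sequence[float], h: float, x: float) -> float:
    """Density at x of P convolved with N(0, h^2), where P puts equal mass on each atom."""
    c = 1.0 / (len(atoms) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - a) / h) ** 2) for a in atoms)


def count_modes(values: Sequence[float]) -> int:
    """Count local maxima of a sampled curve (strict rise on the left, no rise on the right)."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    )


def modes_of_smoothed(atoms: Sequence[float], h: float, num: int = 4001) -> int:
    """Approximate T(P * Phi_h) by counting modes on an evenly spaced grid."""
    lo, hi = min(atoms) - 5 * h, max(atoms) + 5 * h
    xs = [lo + (hi - lo) * i / (num - 1) for i in range(num)]
    return count_modes([smoothed_density(atoms, h, x) for x in xs])


atoms = [0.0, 1.0, 3.0]  # hypothetical discrete P with three atoms
print(modes_of_smoothed(atoms, 2.0))  # heavy smoothing: the atoms merge into one bump
print(modes_of_smoothed(atoms, 0.1))  # light smoothing: one mode per atom
```

For $h = 2$ the atoms lie within an interval shorter than $2h$, which makes the mixture log-concave and hence unimodal; for $h = 0.1$ the three bumps are well separated.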
Now look at these two plots:

[Figure: left, a density with 2 modes; right, a nearly identical density with 1,000 tiny modes.]
The density on the left has 2 modes, so $T(P) = 2$. The density on the right has 1,000 modes, so $T(Q) = 1000$. Wait! You don’t see 1,000 modes on the second density? The reason is that they are very, very tiny. It’s possible to increase the number of modes drastically without changing the distribution very much. That is, we can find $Q$ arbitrarily close to $P$ but with $T(Q)$ as large as we like. Functionals with this property are very difficult to estimate. In fact, as Donoho proves, no non-trivial two-sided confidence interval exists for such functionals.
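Here is a minimal numerical sketch of that phenomenon, with all specific choices hypothetical. $P$ is an equal mixture of $N(-2,1)$ and $N(2,1)$ (2 modes). $Q$ has density proportional to $p(x)\,(1 + \epsilon\sin(kx))$ with $\epsilon = 0.01$ and $k = 500$. The multiplier changes $p$ by at most 1 percent pointwise, so the total variation distance stays below about $\epsilon/2$, yet the fast oscillation creates many hundreds of tiny modes. Modes are counted as local maxima on a fine grid.

```python
import math
from typing import Sequence, Tuple


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def p_density(x: float) -> float:
    """Hypothetical P: equal mixture of N(-2, 1) and N(2, 1), which has 2 modes."""
    return 0.5 * normal_pdf(x + 2.0) + 0.5 * normal_pdf(x - 2.0)


def count_modes(values: Sequence[float]) -> int:
    """Count local maxima of a sampled curve."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    )


def wiggle_example(eps: float = 0.01, k: float = 500.0,
                   step: float = 0.001, half_width: float = 8.0) -> Tuple[int, int, float]:
    """Compare p with q proportional to p(x) * (1 + eps * sin(k x)) on a grid.

    Returns (modes of p, modes of q, approximate total variation distance).
    """
    m = int(round(half_width / step))
    xs = [i * step for i in range(-m, m + 1)]
    p = [p_density(x) for x in xs]
    q_raw = [px * (1.0 + eps * math.sin(k * x)) for px, x in zip(p, xs)]
    z = sum(q_raw) * step  # numerical normalising constant, very close to 1
    q = [v / z for v in q_raw]
    tv = 0.5 * step * sum(abs(a - b) for a, b in zip(p, q))
    return count_modes(p), count_modes(q), tv


modes_p, modes_q, tv = wiggle_example()
print(modes_p, modes_q, round(tv, 4))
```

Making $k$ larger adds more modes while leaving the total variation bound unchanged, which is the sense in which $T(Q)$ can be made as large as we like.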
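More formally, suppose that $T$ takes values in $[0,\infty]$. Then we’ll say that $T$ is difficult if its graph is dense in its epigraph:
$$\{(P, T(P)) : P\in{\cal P}\} \ \text{is dense in} \ \{(P, t) : P\in{\cal P},\ t \geq T(P)\}.$$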
Donoho calls this the dense graph condition. Figure 2 of Donoho’s paper explains everything in a nice picture. Basically, if $T$ is difficult, you can increase $T(P)$ by changing $P$ slightly, but you can’t decrease it. (Think of the mode example.)
He then proves the following theorem:
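Theorem: [Donoho, 1988] Let $T$ be a difficult functional. Let $C_n = [L_n, U_n]$ be an interval based on $X_1,\ldots,X_n$. If $P^n(U_n < \infty) = 1$ for some $P\in{\cal P}$, then
$$\inf_{Q\in{\cal P}} Q^n\big(T(Q)\in C_n\big) = 0.$$
In words, if $C_n$ has a non-trivial (finite) upper bound for some $P$, then it has coverage probability 0.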
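A short sketch of why this holds, for an interval $C_n = [L_n, U_n]$. We add one assumption for concreteness: "close" in the dense graph condition is measured in total variation distance $d_{TV}$ (Donoho’s exact setup may differ). The only other fact used is $d_{TV}(P^n, Q^n) \le n\, d_{TV}(P, Q)$ for product measures. Fix $P$ with $P^n(U_n < \infty) = 1$ and $T(P) < \infty$ (otherwise coverage at $P$ itself is already 0), and fix $\epsilon > 0$. Choose $t \ge T(P)$ with $P^n(U_n \ge t) < \epsilon$. Since $(P, t+1)$ lies in the epigraph, the dense graph condition gives a $Q$ with $d_{TV}(P, Q) < \epsilon/n$ and $T(Q) > t$. Then

$$
\begin{aligned}
Q^n\big(T(Q)\in C_n\big)
  &\le Q^n\big(U_n \ge T(Q)\big) \le Q^n\big(U_n \ge t\big) \\
  &\le P^n\big(U_n \ge t\big) + d_{TV}\big(P^n, Q^n\big) \\
  &< \epsilon + n\, d_{TV}(P, Q) < 2\epsilon .
\end{aligned}
$$

Since $\epsilon$ was arbitrary, the infimum of the coverage over ${\cal P}$ is 0.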
Digression: Rob Tibshirani and I independently proved a similar result (Tibshirani and Wasserman 1988). We called these bad functionals “sensitive parameters.” At the time I was a graduate student at the University of Toronto and Rob was a brand new faculty member there. Just before our paper went to press, David’s paper came out. We managed to add references to his paper when we got the galley proofs. For some reason, the typesetter decided to take every mention of “Donoho” and change it to “Donohue” without consulting us. Thus, our paper has several references to a mysterious person named Donohue. End Digression.
Other examples of difficult functionals are the norms $\|p\|_k = \left(\int |p(x)|^k\,dx\right)^{1/k}$, the entropy of $p$, and the Fisher information $\int (p'(x))^2/p(x)\,dx$.
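As a small worked example for the norms, take the squared $L_2$ norm $\int p^2$ and a hypothetical perturbation $Q = (1-\delta)\,N(0,1) + \delta\,\text{Uniform}(0, w)$. Since $Q$ is a mixture, $d_{TV}(Q, N(0,1)) \le \delta$ no matter how small $w$ is, but the spike contributes $\delta^2/w$ to $\int q^2$, which blows up as $w \to 0$. The sketch computes $\int q^2$ in closed form.

```python
import math


def l2_squared_with_spike(delta: float, w: float) -> float:
    """Exact integral of q(x)^2 for q = (1 - delta) * N(0, 1) + delta * Uniform(0, w)."""
    normal_sq = 1.0 / (2.0 * math.sqrt(math.pi))          # integral of phi(x)^2
    normal_on_spike = 0.5 * math.erf(w / math.sqrt(2.0))  # integral of phi over [0, w]
    return ((1.0 - delta) ** 2 * normal_sq
            + 2.0 * (1.0 - delta) * (delta / w) * normal_on_spike
            + delta ** 2 / w)


# The spike moves at most delta = 0.01 of probability mass (total variation <= 0.01),
# yet the squared L2 norm grows without bound as the width w shrinks.
for w in (1e-1, 1e-3, 1e-6):
    print(w, l2_squared_with_spike(0.01, w))
```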
2. One Sided Intervals
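The good news is that we can still say something about the functional $T(P)$. Construct a $1-\alpha$ confidence set ${\cal C}_n$ for the distribution function. Let $L_n$ be the smallest value of $T(F)$ as $F$ varies in ${\cal C}_n$. Since $L_n \le T(P)$ whenever $P \in {\cal C}_n$, we then have that
$$P^n\big(L_n \le T(P)\big) \ge P^n\big(P \in {\cal C}_n\big) \ge 1-\alpha \quad \text{for every } P.$$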
That is, $[L_n, \infty)$ is a non-trivial one-sided confidence interval. So we can’t upper bound the number of modes, but we can say things like: the 95 percent confidence interval rules out 4 or fewer modes.
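The post does not say which confidence set for the distribution function to use; one standard choice, assumed here, is the Dvoretzky-Kiefer-Wolfowitz (DKW) band $\{F : \sup_x |F(x) - \hat F_n(x)| \le \epsilon_n\}$ with $\epsilon_n = \sqrt{\log(2/\alpha)/(2n)}$, where $\hat F_n$ is the empirical CDF. The sketch below builds that band and checks, on a finite grid only, whether a candidate CDF stays inside it. The step of minimizing $T(F)$ over the band depends on the functional (for the number of modes it is a real optimization problem) and is not attempted here; the helper names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Callable, Sequence, Tuple

Fn = Callable[[float], float]


def dkw_band(sample: Sequence[float], alpha: float = 0.05) -> Tuple[float, Fn, Fn]:
    """1 - alpha confidence band for the CDF from the DKW inequality (Massart's constant).

    Returns (eps, lower, upper) with lower(x) = max(F_n(x) - eps, 0) and
    upper(x) = min(F_n(x) + eps, 1), where F_n is the empirical CDF.
    """
    n = len(sample)
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    xs = sorted(sample)

    def ecdf(x: float) -> float:
        return bisect_right(xs, x) / n  # fraction of sample points <= x

    def lower(x: float) -> float:
        return max(ecdf(x) - eps, 0.0)

    def upper(x: float) -> float:
        return min(ecdf(x) + eps, 1.0)

    return eps, lower, upper


def in_band_on_grid(cdf: Fn, lower: Fn, upper: Fn, grid: Sequence[float]) -> bool:
    """Necessary (not sufficient) membership check: only the grid points are tested."""
    return all(lower(x) <= cdf(x) <= upper(x) for x in grid)


sample = [i / 100 for i in range(100)]  # hypothetical data, spread evenly over [0, 1)
eps, lower, upper = dkw_band(sample)
grid = [i / 1000 for i in range(-100, 1101)]
print(round(eps, 4))
print(in_band_on_grid(lambda x: min(max(x, 0.0), 1.0), lower, upper, grid))  # Uniform(0, 1) CDF
print(in_band_on_grid(lambda x: 1.0 if x >= 0 else 0.0, lower, upper, grid))  # point mass at 0
```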
What makes this work is that you can’t decrease $T(P)$ without changing $P$ so much that it becomes statistically distinguishable from the original distribution.
Pretty, pretty cool.
3. Higher Dimensions
Things get rougher in higher dimensions. Even in two dimensions there are problems. Think of a two-dimensional distribution $P$ with two well-separated modes, so $T(P) = 2$. Now let $Q$ be identical to $P$ except that we add a very thin ridge connecting the two modes. This turns 2 modes into 1 mode. Then $Q$ is very close to $P$, but $T(Q) = 1 < T(P)$: we have decreased $T$. So in this case, even one-sided inference is not possible.
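A numerical sketch of the ridge construction, with every choice hypothetical: $P$ is an equal mixture of $N((-2,0), I)$ and $N((2,0), I)$, and the ridge is an elongated Gaussian bump along the x-axis with height `B = 1`, length scale `TAU = 2.5` and thickness `W = 1e-4`. The ridge is tall but so thin that its total mass is $2\pi \cdot B \cdot TAU \cdot W \approx 0.0016$, which bounds $d_{TV}(P, Q)$. Modes are counted as interior grid points strictly larger than all 8 neighbours, on a grid whose y-spacing is refined near $y = 0$ so the ridge is actually resolved; the counts should come out as 2 for $P$ and 1 for $Q$.

```python
import math
from typing import Callable, List, Sequence


def phi(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def p2(x: float, y: float) -> float:
    """Hypothetical P: equal mixture of N((-2, 0), I) and N((2, 0), I), so T(P) = 2."""
    return phi(y) * 0.5 * (phi(x + 2.0) + phi(x - 2.0))


# Hypothetical ridge parameters: height, length scale along x, thickness across y.
B, TAU, W = 1.0, 2.5, 1e-4
RIDGE_MASS = 2.0 * math.pi * B * TAU * W  # exact integral of ridge(x, y)


def ridge(x: float, y: float) -> float:
    """Tall but extremely thin bump lying along the x-axis, joining the two modes."""
    return B * math.exp(-0.5 * (x / TAU) ** 2) * math.exp(-0.5 * (y / W) ** 2)


def q2(x: float, y: float) -> float:
    """Q: P plus the ridge, renormalised; d_TV(P, Q) <= RIDGE_MASS."""
    return (p2(x, y) + ridge(x, y)) / (1.0 + RIDGE_MASS)


def count_modes_2d(f: Callable[[float, float], float],
                   xs: Sequence[float], ys: Sequence[float]) -> int:
    """Count interior grid points strictly larger than all 8 neighbours."""
    grid: List[List[float]] = [[f(x, y) for y in ys] for x in xs]
    count = 0
    for i in range(1, len(xs) - 1):
        for j in range(1, len(ys) - 1):
            v = grid[i][j]
            if all(v > grid[i + di][j + dj]
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0)):
                count += 1
    return count


XS = [i * 0.01 for i in range(-600, 601)]
# The y-grid is refined near y = 0 so that the ridge (thickness W) is resolved.
YS = [-3.0, -2.0, -1.0, -0.5, -0.1, -W, 0.0, W, 0.1, 0.5, 1.0, 2.0, 3.0]

print(count_modes_2d(p2, XS, YS), count_modes_2d(q2, XS, YS), RIDGE_MASS)
```

The ridge makes the profile along $y = 0$ increase all the way to its centre, so neither original mode survives as a local maximum, while a sample from $Q$ would almost never land on the ridge.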
It might be possible to modify $T$ so that we can get non-trivial confidence intervals. For example, perhaps we can define $T$ in such a way that, in this last example, $Q$ is still considered to have 2 modes. This would be a nice project for a graduate student.
Donoho, D.L. (1988). One-sided inference about functionals of a density. The Annals of Statistics, 16, 1390-1420.
Tibshirani, R. and Wasserman, L.A. (1988). Sensitive parameters. Canadian Journal of Statistics, 16, 185-192.