## LOST CAUSES IN STATISTICS I: Finite Additivity

I decided that I’ll write an occasional post about lost causes in statistics. (The title is motivated by Streater (2007).) Today’s post is about finitely additive probability (FAP).

Recall how we usually define probability. We start with a sample space ${S}$ and a ${\sigma}$-algebra of events ${{\cal A}}$. A real-valued function ${P}$ on ${{\cal A}}$ is a probability measure if it satisfies three axioms:

(A1) ${P(A) \geq 0}$ for each ${A\in {\cal A}}$.

(A2) ${P(S)=1}$.

(A3) If ${A_1,A_2,\ldots}$ is a sequence of disjoint events, then $\displaystyle P\Bigl(A_1 \cup A_2 \cup \cdots \Bigr) = \sum_{i=1}^\infty P(A_i).$

The third axiom, countable additivity, is rejected by some extremists. In particular, Bruno de Finetti was a vocal opponent of (A3). He insisted that probability should only be required to satisfy the additivity rule for finite unions. If ${P}$ is only required to satisfy the additivity rule for finite unions, we say it is a finitely additive probability measure.

Axioms cannot be right or wrong; they are working assumptions. But some assumptions are more useful than others. Countable additivity is undoubtedly useful. The entire edifice of modern probability theory is built on countable additivity. Denying (A3) is like saying we should only use rational numbers rather than real numbers.

Proponents of FAP argue that it can express concepts that cannot be expressed with countably additive probability. Consider the natural numbers $\displaystyle S = \{1,2,3,\ldots\}.$

There is no countably additive probability measure that puts equal probability on every element of ${S}$. If every point had the same mass ${c}$, then (A3) would force ${P(S)=\sum_{i=1}^\infty c}$, which is 0 when ${c=0}$ and infinite when ${c>0}$, so it can never equal 1. That is, there is no uniform probability on ${S}$. But there are finitely additive probabilities that are uniform on ${S}$. For example, you can construct a finitely additive probability ${P}$ for which ${P(\{i\})=0}$ for each ${i}$. This does not contradict the fact that ${P(S)=1}$ unless you invoke (A3).

You can also decide to assign probability 1/2 to the even numbers and 1/2 to the odd numbers. Again this does not conflict with each integer having probability 0 as long as you do not insist on countable additivity.
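One way to see where such assignments come from is to look at finite truncations: the proportion of ${\{1,\ldots,n\}}$ that falls in a given set. The Python sketch below computes that proportion for a single integer and for the even numbers. The function names are mine, chosen only for illustration. The limit of these proportions, when it exists, is the natural density; the code shows only the limiting behaviour, since extending the density to every subset of ${S}$ requires a non-constructive argument.

```python
from fractions import Fraction

def truncated_proportion(indicator, n):
    """Proportion of {1, ..., n} lying in the set described by `indicator`."""
    count = sum(1 for k in range(1, n + 1) if indicator(k))
    return Fraction(count, n)

def is_even(k):
    return k % 2 == 0

def is_seven(k):
    # A single fixed integer, standing in for any singleton {i}.
    return k == 7

for n in (10, 1000, 100000):
    # The singleton's share shrinks toward 0; the evens' share stays at 1/2.
    print(n, truncated_proportion(is_seven, n), truncated_proportion(is_even, n))
```

Every singleton gets proportion ${1/n}$ once ${n}$ is large enough, which tends to 0, while the even numbers keep proportion 1/2 along even ${n}$.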

These features are considered to be good things by fans of FAP. To me, these properties make it clear why finitely additive probability is a lost cause. You have to give up the ability to compute probabilities. With countably additive probability, we can assign mass ${p(s)}$ to each element ${s\in S}$ and then we can derive the probability of any event ${A}$ by addition: $\displaystyle P(A) = \sum_{s\in A}p(s).$

This simple fact is what makes probability so useful. But for FAP, you cannot do this. You have to assign probabilities rather than calculate them.
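To make the contrast concrete, here is a small Python sketch of the countably additive calculation. The mass function ${p(s)=2^{-s}}$ is a hypothetical choice of mine, picked only because its sums are easy to check by hand, and the infinite series is truncated at a fixed number of terms.

```python
def mass(s):
    """Hypothetical mass function p(s) = 2**(-s) on S = {1, 2, 3, ...}."""
    return 0.5 ** s

def probability(event, terms=200):
    """Approximate P(A) = sum of p(s) over s in A by truncating the series."""
    return sum(mass(s) for s in range(1, terms + 1) if event(s))

# Probabilities of the even and odd numbers, derived purely by addition.
p_even = probability(lambda s: s % 2 == 0)
p_odd = probability(lambda s: s % 2 == 1)
print(p_even, p_odd, p_even + p_odd)
```

Here the probability of the even numbers is derived from the point masses (the exact value is ${\sum_{k\geq 1} 4^{-k} = 1/3}$). With a finitely additive uniform ${P}$ every point mass is 0, so no such sum can recover ${P(A)}$.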

In FAP, you also give up basic calculation devices such as the law of total probability: if ${B_1,B_2,\ldots}$ is a partition of ${S}$ into disjoint events, then $\displaystyle P(A) = \sum_j P(A|B_j) P(B_j).$

This formula is, in general, not true for FAP. Indeed, as my colleagues Mark Schervish, Teddy Seidenfeld and Jay Kadane showed (see Schervish et al. (1984)), every probability measure that is finitely but not countably additive exhibits non-conglomerability. This means that there is an event ${E}$ and a countable partition ${B_1,B_2,\ldots}$ such that ${P(E)}$ is not contained in the interval $\displaystyle \Bigl[\inf_j P(E|B_j),\ \sup_j P(E|B_j)\Bigr].$
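A standard illustration, sketched here under assumptions that are mine rather than taken from Schervish et al (1984): let ${X}$ and ${Y}$ be positive integers, and suppose ${P}$ is a finitely additive probability under which each coordinate stays uniform given any value of the other, so that finite sets have conditional probability 0. Take ${E=\{X<Y\}}$. Given ${X=i}$, the event ${E}$ fails only for the finitely many values ${Y\in\{1,\ldots,i\}}$, and given ${Y=j}$ it holds only for the finitely many values ${X\in\{1,\ldots,j-1\}}$. Hence $\displaystyle P(E|X=i)=1 \ \mbox{for all } i \qquad \mbox{and} \qquad P(E|Y=j)=0 \ \mbox{for all } j.$

Conglomerability in the partition ${\{X=i\}}$ would force ${P(E)=1}$, while conglomerability in the partition ${\{Y=j\}}$ would force ${P(E)=0}$. Whatever value ${P(E)}$ takes, it falls outside the interval for at least one of the two partitions.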

To me, all of this suggests that giving up countable additivity is a mistake. We lose some of the most useful and intuitive properties of probability.

For these reasons, I declare finitely additive probability theory to be a lost cause.

Other lost causes I will discuss in the future include fiducial inference and pure likelihood inference. I am tempted to put neural nets into the lost cause category although the recent work on deep learning suggests that may be hasty. Any suggestions?

References

Schervish, Mark, Seidenfeld, Teddy and Kadane, Joseph. (1984). The extent of non-conglomerability of finitely additive probabilities. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, Volume 66, pp 205-226.

Streater, R. (2007). Lost Causes in and Beyond Physics. Springer.

1. jimmy
Posted July 1, 2013 at 1:54 am | Permalink

regarding neural nets, what about still writing about how and why you would have placed neural nets in the lost causes category before? i would like to read some on what you and others have found problematic about them. and then follow up (in either the same post or a separate one) with how they may yet be useful with regard to deep learning?

• normaldeviate
Posted July 1, 2013 at 9:01 am | Permalink

ok good idea

• rj444
Posted July 2, 2013 at 10:38 am | Permalink

I would be interested in this too.

2. q
Posted July 1, 2013 at 10:16 am | Permalink

Interesting but it’s not clear what the benefits of FAP are besides being able to define a new kind of distributions without purpose.

A typo here “probability 1/2 to the even numbers and 1/2 to the odd”?

3. Keal
Posted July 1, 2013 at 4:27 pm | Permalink

Edwin T. Jaynes also has a good discussion about finite and countable additivity in his book.

A draft version of the chapter “Paradoxes in Probability” can be found here: http://omega.albany.edu:8008/ETJ-PDF/cc15b.pdf (page 1512 to 1514)

4. Keal
Posted July 1, 2013 at 4:31 pm | Permalink

PS: Jaynes is clearly on the side of Kolmogorov and countable additivity. Quote: “Those who commit the sin of doing reckless, irresponsible things with infinity often invoke the term ‘finite additivity’ to make it sound as if they are being more careful than others with their mathematics”.

5. David Rohde
Posted July 2, 2013 at 2:06 am | Permalink

This 2001 paper makes the case for finite additivity and takes up arguments made by Kadane. Not everyone thinks it’s a lost cause.

Michael Goldstein, Avoiding foregone conclusions: geometric and foundational analysis of paradoxes of finite additivity. Journal of Statistical Planning and Inference, Volume 94, Issue 1, 1 March 2001, Pages 73–87

I don’t claim to understand this debate at all….

Goldstein’s argument distinguishes between posterior probabilities and conditional probabilities.

I do however think that the core criticism made of Bayesian statistics applies specifically to fully specified Bayesian statistics focused upon conditioning. While I think (some of) this criticism is valid, I think that a partially specified version which obviously must sacrifice conditioning (most of the time) will weather these criticisms much better….

6. Keal
Posted July 2, 2013 at 5:41 am | Permalink

I don’t think Kadane is siding in favor of finite additivity. In his recent book “Principles of Uncertainty” he devotes the whole of chapter 3 to this topic (see

). His conclusion (p104) is that “from a foundational point of view .. both finite and countable additivity are worth exploring”. He also thinks that Goldstein actually supports countable additivity (p105).

7. Pietro Rigo
Posted July 2, 2013 at 6:11 am | Permalink

RECOVERED CAUSES

Certainly, axioms are not right or wrong but are just working assumptions. As a consequence, when evaluating an axiom,
both its practical utility and its conceptual meaning should be taken into account. No one would accept a theory based
on a very useful but untenable axiom.

Now, we do not think that countable additivity is untenable, but (as far as we know) it lacks any conceptual motivation
other than its practical utility. A.N. Kolmogorov, among others, agrees with this view.

We do not try here to make a comprehensive list of the merits and drawbacks of finitely additive probabilities (f.a.p.’s).
We would be terribly boring. We just mention, about the merits:

(i) Unlike $\sigma$-additive probabilities, f.a.p.’s have a solid conceptual motivation: de Finetti’s coherence principle.

(ii) Unlike $\sigma$-additive probabilities, f.a.p.’s can always be extended to the power set. This avoids several (inessential)
measurability issues.

(iii) There are a number of problems which cannot be solved in a countably additive framework but admit a finitely additive
solution. Well known examples are improper priors and the first digit problem, but they are not the only ones. Plenty of other examples
occur in conditional probability, decision theory, mathematical finance, stochastic integration, number theory, and so on.

Set against (i)-(ii)-(iii), f.a.p.’s have essentially only one (even if big) drawback: a number of familiar results are no longer
true for f.a.p.’s and uniqueness of certain procedures fails. From this point of view, Larry Wasserman is right: life is certainly
harder (even if intriguing) in a finitely additive setting.

We finally make a remark and an example. Both are well known but {\em repetita iuvant}.

The remark is that, on accepting finite additivity as an axiom, one is free to use $\sigma$-additive probabilities.
Merely, one is not obliged to do so. $\sigma$-additivity becomes an assumption rather than an axiom. For instance, we believe that
finite additivity is a sound axiom, but we actually assumed $\sigma$-additivity in most of our papers.

The example is the following. An urn contains $w$ white balls and $b$ black balls, but both $w$ and $b$ are unknown. We are interested
in the proportion $p=w/(w+b)$ and we feel that $p$ should have a uniform distribution (in some sense). Let $S=Q \cap [0,1]$ be
the rationals of $[0,1]$ and let $U$ be the usual uniform distribution on the Borel sets of $[0,1]$. If we choose $U$ as a (prior)
distribution for $p$, we obtain $U(S)=0$ even if $p\in S$. This looks like a paradox, there is little to say, even if $S$ is dense in $[0,1]$.
Instead, if one allows for f.a.p.’s, one can assess
$P([0,x] \cap S) = x, 0 \leq x \leq 1,$
for some f.a.p. $P$. In particular, $P(S)=1$. Now, for this particular problem, what is the best solution?
And if the best solution is $P$, why discard it for axiomatic reasons only ($P$ fails to be $\sigma$-additive)?
Who is the {\em extremist}, the one using $P$ or the one using $U$ (aware that $U$ is the wrong solution) for axiomatic reasons only?

Patrizia Berti, Eugenio Regazzini, Pietro Rigo

• Christian Hennig
Posted July 4, 2013 at 11:42 am | Permalink

Good call! (I’m not very passionate about this particular issue but I like to see a quality rejoinder to Larry’s well argued posting I felt a bit sceptical about.)

• Jorgen Harmse
Posted August 15, 2013 at 4:36 pm | Permalink

The example of the urn is interesting, but assumes that w+b is unbounded (moreover, that P(w+b > N) = 1 for all N), which is possible under finite additivity, but goes against the apparent practical focus.

A more interesting application (to me) is the long-run behaviour of primes. (Certainly any particular prime is negligible, so sigma-additivity is out of the question.) It would be nice to say that if p is a randomly selected prime then P(p congruent to 1 modulo 6) = 1/2. Number theorists use various definitions of density which are finitely additive on subalgebras of the power set of the set of primes. (I think this includes all sets determined by solvability of polynomial equations modulo the prime.) Does the theory of finitely additive probabilities have something interesting to say about this?

• pietrorigo
Posted September 11, 2013 at 3:43 pm | Permalink

Dear prof. Harmse,
we apologize for answering after such a long delay, but we are not familiar with normal deviate and we learned of your remarks just now.

You are looking for a finitely additive probability P on the power set of the
primes such that

(*) P(p: p is congruent to 1 modulo 6) = 1/2.

But, in addition to (*), what other properties of P are desirable for you ?
And also, you say that such a P already
exists on certain subalgebras. So, clearly, it can be extended to the
power set. What’s the problem ? Perhaps, such
subalgebras do not include certain sets of interest for you ?
Or what else ?

Patrizia Berti Eugenio Regazzini Pietro Rigo

• Jorgen Harmse
Posted September 14, 2013 at 3:35 pm | Permalink

Thank you for looking into this. I haven’t followed the latest scholarship (so it’s even possible that the question has already been answered), but here is my understanding. I apologise for making at least one statement that I’m not sure is true.

There are already several notions of the density of a set of primes. The most obvious is natural density: $d(S) = \lim_{x\to\infty} \frac{\#\{p \in S : p \le x\}}{\#\{p \text{ prime} : p \le x\}}$. The limit exists for many interesting sets (including the example I gave) and the behaviour under finite disjoint union is obvious, but the collection of sets for which the limit exists is not an algebra. (It is easy to construct $A$ and $B$ with $d(A) = d(B) = 1/2$, $\liminf_{x\to\infty} \frac{\#\{p \in A \cap B : p \le x\}}{\#\{p \text{ prime} : p \le x\}} = 0$, and $\limsup_{x\to\infty} \frac{\#\{p \in A \cap B : p \le x\}}{\#\{p \text{ prime} : p \le x\}} = 1/2$.) Is there an extension of $d$ to a finitely additive function on an algebra?

Number theorists are particularly interested in sets of primes determined by algebraic equations, for example the set of primes p such that x^3+x+1=0 has a solution modulo p. There is a notion of density which applies to these sets, and I assumed in my previous note that they formed an algebra. The collection is obviously closed under finite union (disjoint or not), but I'm not sure about complements. Contrary to what I said before, it is not obvious to me that any of the usual notions of density is defined on a reasonably interesting algebra.

I'm puzzled by your statement that a measure defined on an algebra automatically extends to the power set. I don't know the extension theorems for finitely additive probabilities, but the countably additive extension of Lebesgue measure to the power set of the real line is non-obvious. A reasonably large subalgebra of the power set of the set of primes would be interesting.

Finally, once such a finitely additive measure is constructed, does it help number theorists in some way? (My experience is limited to groupings of the prime factorisation of n! and other highly divisible numbers. For a Diophantine equation like n! = P(x)^k Q(x), where P & Q are polynomials, Daniel Berend & I considered those primes which could contribute to the factor P(x)^k, and were sometimes able to show that there are at most finitely many solutions.) I know the question is vague, but I am new to this: we all know that set functions can be finitely additive without being countably additive, but it is not clear to me how much study such functions deserve.

• pietrorigo
Posted September 18, 2013 at 4:32 pm | Permalink

Dear Jorgen,
let X be a set, D any class of subsets of X, and P a real function on D. If P is coherent (i.e., P satisfies de Finetti’s coherence principle) then P admits a coherent extension to the power set of X. This follows easily from the Hahn-Banach theorem. Also, if D is an algebra, P is coherent if and only if it is a finitely additive probability. Finally, if X = {1,2,…}, P is the usual density, and D is the collection of those subsets of X such that the limit exists, then P is coherent. In fact, coherence is preserved under pointwise limits, and P is actually the pointwise limit of the restrictions to D of the coherent mappings

P_n(A) = (1/n) card( A intersection {1,…,n} )

• Jorgen Harmse
Posted September 19, 2013 at 6:41 pm | Permalink

Thank you: I now understand your previous remarks. Applying the Banach-Alaoglu theorem to P_n, we obtain a finitely additive P defined on the whole power set with liminf_{n\to\infty} P_n <= P <= limsup_{n\to\infty} P_n . In particular, P(A) = lim_{n\to\infty} P_n(A) if the limit exists. I think similar arguments (or at least your argument using the Hahn-Banach theorem) apply to the more sophisticated definitions of density.

I'm still not sure what this tells us about the distribution of primes satisfying various algebraic conditions. Perhaps my original question was a red herring.

8. alex
Posted July 2, 2013 at 5:55 pm | Permalink

I’d be really interested in reading something on purposeful selection of treatment and controls, rather than randomisation. It is pretty much a dead idea now, but people like Savage and Student championed it.

• normaldeviate
Posted July 2, 2013 at 6:02 pm | Permalink

can you point me to a reference?

• RP
Posted July 8, 2013 at 7:17 am | Permalink

“This concept [randomization] which stood at the center of Fisher’s approach to experimental design
did not find favor with everyone. In particular, Fisher’s friend Gosset argued that
balanced systematic designs were preferable.”

Erich L. Lehmann
Fisher, Neyman, and the Creation of Classical Statistics
Section 5.7.1 Randomization

“Fisher could be polemical and arrogant. He quarrelled with… Gosset and others on
random versus systematic arrangements of experiments…”

Anders Hald
A History of Parametric Statistical Inference from Bernoulli to Fisher, 1713 to 1935
page 144, paragraph 7

9. Mayo
Posted July 2, 2013 at 9:40 pm | Permalink

Uninformative priors perhaps?

10. Physicist
Posted July 12, 2013 at 5:24 am | Permalink

Question: are Dirac delta functions compatible with countable additivity? It seems to me that, like improper priors, they are not. Can anybody comment on this?

• normaldeviate
Posted July 12, 2013 at 8:46 am | Permalink

Yes.
Point mass distributions are countably additive.

11. Teddy Seidenfeld
Posted July 15, 2013 at 5:19 am | Permalink

Dear Friends,

Regarding Larry’s two lost causes (FAP and Improper Priors), as is frequently reported in the Stats. literature, they are linked in that “improper priors,” though sigma-finite measures, normalize to merely finitely (and not countably) additive probabilities. Hence, one anticipates that “proper posteriors” derived from “improper priors” may nonetheless display some of the anomalous features (e.g., non-conglomerability) associated with the merely finitely additive joint distributions that they share.

The comment I’d like to add to this discussion is that non-conglomerability is not so much associated with (merely) finitely additive probabilities, but rather with the theory of conditional probability that one adopts. If YOUR conditional probabilities are of the deFinetti/Krauss/Dubins kind, then if YOUR unconditional probability function is countably additive, conglomerability is assured in countable partitions, but not so in partitions of larger cardinality.

The point is that if you use a deF/K/D styled conditional probability, even if YOUR unconditional probability is countably additive, unless it is perfectly additive (as explained in the next sentence), YOUR conditional probabilities will suffer non-conglomerability in some uncountable partition that matches the non-additivity of YOUR unconditional probability.

A probability P(dot) is perfectly additive just in case each union of null-events is null.

Ordinary (continuous) countably additive probability, e.g. Uniform on [0,1] — with the algebra being Lebesgue measurable sets — is continuum non-additive — since a continuum union of null events (the points) is not null — it is the sample space with prob 1, in fact.

Then, it turns out, there is a partition of size the continuum where YOUR deF/K/D conditional probs fail conglomerability!

In the case where YOUR (unconditional) prob is merely finitely additive and not countably additive, the failure of conglomerability occurs in some countable partition, as Jay (Kadane), Mark (Schervish) and I showed 30 years ago.

For more on this please see our recent (ISIPTA-2013) paper, Two theories of conditional probability and non-conglomerability, linked at http://www.sipta.org. This paper covers the mathematically ‘easy’ case that applies with a commonsense interpretation of conditional probability from a continuous countably additive prob. It is the relevant case for Stats., in my opinion.

The upshot of all this mathematical fantasy is that the trouble over non-conglomerability is in the theory of conditional probability that YOU adopt, rather than solely whether YOUR (unconditional) probability is countably additive or merely finitely additive. Use deF/K/D conditional probabilities and YOU are stuck with non-conglomerability unless YOUR P(dot) is perfectly additive. Use the familiar theory of regular conditional distributions and you can have conglomerability, provided that the rcd’s exist, that you don’t mind the “Borel Paradox,” etc., etc.

Regards to all,
Teddy (Seidenfeld)

12. Keal
Posted July 15, 2013 at 12:11 pm | Permalink

another suggestion for a “lost cause in statistics”: how about “imprecise probabilities”?

at least if you view probabilities as states of knowledge (i.e. as epistemological rather than ontological) then imprecise probabilities do not make much sense …

13. THE BEAST OF MARS
Posted July 24, 2013 at 8:21 pm | Permalink

Dear stats master,

Here’s another funny one. The dual of $L^\infty(\mu)$ is the Banach space of finitely additive finite signed measures absolutely continuous wrt $\mu$ with total variation norm. On the one hand, this makes it useful to have an understanding of finitely additive measures; on the other hand, this gives yet another reason to avoid the space $L^\infty(\mu)$.

Also, with all due respect, I am not sure I find your negative example very convincing, since it works with a countable partition (i.e., the problem seems adapted to countably additive measures).

Best wishes

• Jorgen Harmse
Posted August 15, 2013 at 4:57 pm | Permalink

Good example. I’m reminded of substitute spaces in Harmonic Analysis. For example, the $L^p$ bound for the Hardy-Littlewood maximal function fails for $p=1$, but this is the place to focus. (A weak-type estimate at $p=1$ combined with the obvious result for $p=\infty$ and an interpolation theorem yields the general result for $1<p<\infty$.) For some other purposes we might replace $L^\infty$ with the space of bounded continuous functions, whose dual is much nicer.