It was re-posted at R-bloggers: http://www.r-bloggers.com/on-p-value/

Have to agree with these

“high cost of refusing to think about the operating characteristics of these posterior probabilities.”

“The conceptual simplicity of Bayes is an artefact of sweeping difficult problems under the rug.”

(Both in the choice of prior and how the posterior is _used_)

I am very glad to see you make this point: I think it /is/ important that we preserve the distinction between ‘;’ and ‘|’. A related point is that confusion arises when people try to make the probability measure ‘conditional on all available information, H’, and write p(X | H), or something similar. Clearly, H is the inferential basis for p, and is /not/ a random quantity within p. Logical (or necessary) Bayesians have the best reason for writing p(X | H), I suppose, but in this case p would have to be the primeval probability measure.

I inform my students that Frequentist statisticians treat theta as the index of a family of distributions, hence writing X ~ f_X(x ; theta), but that Bayesian statisticians are comfortable treating theta itself as a random quantity, and are thus able to interpret f_X(x ; t) as p(X = x | theta = t). I don’t think it is helpful to write f_X(x | t), but I do write f_{X | theta}(x | t), as I tend not to use p() — it’s a bit inky but at least it is clear.
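For what it’s worth, the two readings can be made concrete in code. This is a minimal sketch (Python; the binomial model, the uniform prior, and all the numbers are my own illustrative choices, not anything from the discussion):

```python
from math import comb

def f_X(x, n, theta):
    """Binomial pmf. Frequentist reading: f_X(x; theta), where theta is a
    fixed index of a family of distributions, not a random quantity."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Frequentist use: evaluate the pmf at one fixed, unknown-but-nonrandom theta.
likelihood_at_half = f_X(3, 10, 0.5)

# Bayesian reading: theta is random, so f_{X|theta}(x | t) is a genuine
# conditional pmf, and the marginal p(X = x) averages it over a prior
# (here: a uniform grid, a crude stand-in for a Uniform(0, 1) prior).
grid = [i / 1000 for i in range(1, 1000)]
marginal = sum(f_X(3, 10, t) for t in grid) / len(grid)
# For a Uniform(0, 1) prior this marginal is exactly 1/(n + 1) = 1/11.
```

The point is purely notational: the same function of (x, theta) is an indexed pmf in the first computation and a conditional pmf integrated against a prior in the second.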

Jonathan.

Erik,

“If A implies B and B states that your statistic X has a certain probability measure P, what is the real problem with writing p(X) = P(X|B)?”

1. A first problem arises when B states a family of distributions for your statistic X rather than a unique distribution for X; let’s call this family F_B. There is only one distribution if B states a singleton, or if your statistic X is ancillary to the family implied by B (this happens asymptotically under mild regularity conditions, and also under normality). Once this is understood, you may not write, in general, that p(X) = P(X|B) or p(X) = P(X; under B), since this equality is not always well defined. In the parametric context, suppose that B states that theta lies in a null set Theta_B; then it should be written as:

p(X) \in F_B, where F_B = { P_theta : theta \in Theta_B }

and you may want to choose the most conservative p-value by taking the sup over F_B.

Note that, if Theta_B = { b }, then the following equality is well defined: p(X) = P_b(X). But this is only valid when Theta_B has one element or when our statistic X is ancillary to F_B.
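A numerical sketch of the sup construction (Python; the one-sided Normal-mean test, the grid, and the observed value are my own illustrative choices):

```python
from math import erf, sqrt

def normal_sf(z):
    """Survival function of the standard Normal, P(Z >= z)."""
    return 0.5 * (1 - erf(z / sqrt(2)))

def candidate_p(theta, x_obs):
    """P_theta(X >= x_obs) for X ~ N(theta, 1): one member of F_B."""
    return normal_sf(x_obs - theta)

# Composite null Theta_B = { theta <= 0 }, approximated here by a grid;
# each theta in Theta_B induces its own candidate p-value.
x_obs = 1.96
theta_grid = [-3 + 0.01 * i for i in range(301)]  # theta in [-3, 0]
conservative_p = max(candidate_p(t, x_obs) for t in theta_grid)
# Here P_theta(X >= x_obs) increases in theta, so the sup sits at the
# boundary theta = 0, and conservative_p is close to P_0(X >= 1.96),
# roughly 0.025.
```

No single P(X|B) exists here; the conservative p-value is a sup over the whole family F_B, which happens to be attained at the boundary of Theta_B.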

2. Let’s assume that Theta is the real line and our null hypothesis is Theta_B = { b }. If you write p(X) = P(X| theta = b), you are implicitly saying that there are different probability spaces for the full and the null parameter spaces. As Theta_B is a singleton, we have a problem with this definition, since “P(X| theta = b)” does not correspond exactly to the definition of conditional probability. What is the probability of A given B when P(B) = 0? As far as I know, it is not well defined. If we define our p-value by using an ill-defined expression, we will be contributing to more controversies on this subject.
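The breakdown is just the elementary ratio definition failing; a toy illustration (Python, with made-up event probabilities):

```python
def ratio_conditional(p_a_and_b, p_b):
    """Elementary definition P(A | B) = P(A and B) / P(B)."""
    if p_b == 0:
        raise ZeroDivisionError("P(A | B) is undefined when P(B) = 0")
    return p_a_and_b / p_b

# Fine for a conditioning event with positive probability:
assert ratio_conditional(0.1, 0.5) == 0.2

# But for a continuous theta, P(theta = b) = 0 at every single point b,
# so "P(X | theta = b)" cannot be this ratio-definition conditional; it
# needs a separate construction (e.g. regular conditional distributions).
```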

3. Let’s suppose that our statistic X is ancillary to the null family of probability measures F_B. The same problem described in 2 appears here in the writing “p(X) = P(X|B)”.

My suggestion is: if you are a Bayesian and want to explain what a p-value is, you should not use prior probabilities to do it; instead, you can use prior possibilities. That is, you must make an effort not to assign any probability distribution to the full and null parameter spaces.

Best,

Alexandre.

If

A implies B

and B states that your statistic X has a certain probability measure P,

what is the real problem with writing p(X) = P(X|B)? Bayes’ theorem does not apply, but this is clear, since we do not have a probability measure on B. This includes the case when A is itself governed by a probability measure, in which case it coincides with the usual definition. In fact, this is what I would prefer. I don’t see the point of introducing a hypothetical pseudo-probability at all.

Frankly, I get the point but I don’t see the importance. What is the argued danger of this misunderstanding? Mayo mentions the prosecutor’s fallacy, but I don’t see the relevance here. This seems to be caused more by the misunderstanding originally commented on by Andrew on his blog. You can resolve it either in a Bayesian way, by thinking about what your prior should look like, or in a frequentist way, by noticing that you are actually looking at a sample of people dragged before the court and not a random sample of the population. The problem is only that the null hypothesis is not just that the accused is innocent, but that the accused is innocent and stands accused of a crime anyway.

And this resolves the fallacy quite nicely, if we think of a DNA match, even when expressed as “conditional probabilities”:

P(DNA match|Innocent) is low and not relevant.

P(DNA match|Innocent and dragged before court based on random DNA matching) is high, but

P(DNA match|Innocent but was identified by victim) is again low. Both make sense.
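To make the contrast concrete, here is a sketch with invented numbers (Python; the match rate and database size are purely illustrative, not real forensic figures):

```python
# Illustrative numbers only: not real forensic rates.
match_rate_innocent = 1e-6   # P(DNA match | innocent), for a random person
database_size = 5_000_000    # people searched in a database trawl

# Case 1: the defendant is in court *because* a database trawl found a match.
# Conditional on "innocent and dragged before court based on random DNA
# matching", a match is certain: that is how they got there.
p_match_trawled = 1.0

# The court sample is not a random sample of the population: a trawl of
# this size is expected to produce several innocent matches.
expected_innocent_matches = database_size * match_rate_innocent  # about 5

# Case 2: the defendant was identified by the victim first and the DNA was
# tested afterwards. For an innocent suspect a match is still rare, so the
# match now carries real evidential weight.
p_match_identified = match_rate_innocent
```

The same likelihood P(DNA match|Innocent) is doing completely different work in the two cases, because the conditioning event that put the person in court differs.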

Mistaking the p-value for the probability of the null hypothesis being true is clearly dangerous. That’s important. But the discussion here seems to me just to be a quibble.

Entsophy,

Maybe we agree on one thing: we cannot use measure theory alone for the foundations of statistics. As you can see, possibility measures are not part of classical measure theory. However, if we define a p-value properly, we see that it is a measure induced by a statistic T that is built for a null set Theta0. The implications of this are deduced from measure theory.

I think that general belief functions are fully applicable to the foundations of statistics, but hardly any statisticians devote attention to them.
