The recent announcement of the discovery of the Higgs boson brought the inevitable and predictable complaints from statisticians about p-values.
Now, before I proceed, let me say that I agree that p-values are often misunderstood and misused. Nevertheless, I feel compelled to defend our physics friends, and even the journalists, from the p-value police.
The complaints come from frequentists and Bayesians alike. And many of the criticisms are right. Nevertheless, I think we should cease and desist from our p-value complaints.
Here is a list of some of the complaints I have seen together with my comments.
- The most common complaint is that physicists and journalists explain the meaning of a p-value incorrectly. For example, if the p-value is 0.000001 then we will see statements like “there is a 99.9999% confidence that the signal is real.” We then feel compelled to correct the statement: if there is no effect, then the chance of something as or more extreme is 0.000001.
Fair enough. But does it really matter? The big picture is: the evidence for the effect is overwhelming. Does it really matter if the wording is a bit misleading? I think we reinforce our image as pedants if we complain about this.
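To make the corrected statement concrete, here is a small sketch. It assumes, purely for illustration, that the test statistic is standard normal when there is no effect and that large values count as extreme. The function name `upper_tail_p_value` is made up for this example and does not come from any physics analysis.

```python
import math
from statistics import NormalDist


def upper_tail_p_value(z: float) -> float:
    """P(Z >= z) for a standard normal Z: the chance, assuming no effect,
    of a result as extreme as or more extreme than the one observed."""
    # erfc avoids the cancellation in 1 - cdf(z) for far-tail values.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Assumption: the null distribution is N(0, 1). Find the observed value
# whose one-sided p-value is the 0.000001 used in the example above.
z_obs = NormalDist().inv_cdf(1.0 - 1e-6)
p = upper_tail_p_value(z_obs)

print(f"observed z = {z_obs:.3f}, p-value = {p:.2e}")
```

The number `p` is a statement about the data given no effect. It is not the probability that the signal is real; that would require a prior.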
- The second complaint comes from the Bayesian community: that we should be reporting the posterior probability of the null hypothesis rather than a p-value. Like it or not, frequentist statistics is, for the most part, the accepted way of doing statistics for particle physics. If we go the Bayesian route, what priors will they use? They could report lots of answers corresponding to many priors. But Forbes magazine reports that it cost about 13 billion dollars to find the Higgs. For that price, we deserve a definite answer.
- A related complaint is that people naturally interpret p-values as posterior probabilities so we should use posterior probabilities. But that argument falls apart because we can easily make the reverse argument. Teach someone Bayesian methods and then ask them the following question: how often does your 95 percent Bayesian interval contain the true value? Inevitably they say: 95 percent. The problem is not that people interpret frequentist statements in a Bayesian way. The problem is that they don’t distinguish them. In other words: people naturally interpret frequentist statements in a Bayesian way but they also naturally interpret Bayesian statements in a frequentist way.
- Another complaint I hear about p-values is that their use leads to too many false positives. In principle, if we only reject the null when the p-value is small we should not see many false positives. Yet there is evidence that most findings are false. The reasons are clear: non-significant studies don’t get published and many studies have hidden biases.
But the problem here is not with p-values. It is with their misuse. Lots of people drive poorly (sometimes with deadly consequences) but we don’t respond by getting rid of cars.
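A back-of-the-envelope calculation shows how a small significance level can coexist with many false findings once only significant results are reported. The sketch below is an illustration under made-up assumptions: the function name and the numbers for the significance level, the power, and the share of tested hypotheses that are real effects are all hypothetical and are not taken from any study.

```python
def false_share_among_significant(alpha: float, power: float, prior_true: float) -> float:
    """Fraction of significant results that are false positives.

    alpha      : chance of rejecting the null when there is no effect
    power      : chance of rejecting the null when there is an effect
    prior_true : assumed share of tested hypotheses with a real effect
    """
    false_positives = alpha * (1.0 - prior_true)
    true_positives = power * prior_true
    return false_positives / (false_positives + true_positives)


# Hypothetical field: few real effects, underpowered studies.
weak_field = false_share_among_significant(alpha=0.05, power=0.2, prior_true=0.1)

# Hypothetical field: half the hypotheses are real, studies are well powered.
strong_field = false_share_among_significant(alpha=0.05, power=0.8, prior_true=0.5)

print(f"weak field:   {weak_field:.1%} of significant results are false")
print(f"strong field: {strong_field:.1%} of significant results are false")
```

Under the first set of assumptions, most of the significant results are false even though every test used the same threshold. The threshold did its job; the mix of hypotheses and the filtering did the damage.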
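Going back to the question about the 95 percent Bayesian interval: the answer "95 percent" can be checked in a toy model. The sketch below assumes a single observation X ~ N(theta, 1) with a N(0, tau^2) prior on theta, which is my choice of example and not something from the Higgs analysis. The function name `credible_interval_coverage` is likewise made up. In this model the frequentist coverage of the credible interval at a fixed true theta can be computed exactly.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def credible_interval_coverage(theta: float, tau: float, level: float = 0.95) -> float:
    """Frequentist coverage, at a fixed true theta, of the central Bayesian
    credible interval when X ~ N(theta, 1) and the prior is N(0, tau^2)."""
    w = tau**2 / (1.0 + tau**2)  # the posterior is N(w * x, w)
    z = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    half_width = z * w**0.5
    # The interval w*X +/- half_width contains theta exactly when
    # (theta - half_width) / w <= X <= (theta + half_width) / w.
    # Since X - theta ~ N(0, 1), standardize both endpoints.
    lo = (theta - half_width) / w - theta
    hi = (theta + half_width) / w - theta
    return STD_NORMAL.cdf(hi) - STD_NORMAL.cdf(lo)


for theta in (0.0, 1.0, 3.0):
    cov = credible_interval_coverage(theta, tau=1.0)
    print(f"true theta = {theta}: coverage of the 95% credible interval = {cov:.3f}")
```

With tau = 1 the coverage is above 95 percent when the true value sits at the prior mean and well below it when the true value is far from the prior mean. It approaches 95 percent only as the prior becomes very flat.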
My main message is: let’s refrain from nit-picking when physicists (or journalists) report a major discovery and then don’t describe the meaning of a p-value correctly.
Now, having said all that, let me add a big disclaimer. I don’t use p-values very often. No doubt they are overused. Indeed, it’s not just p-values that are overused. The whole enterprise of hypothesis testing is overused. But, there are times when it is just the right tool and the search for the Higgs is a perfect example.