The problem with p-values (again)
I just saw a great quote that reminds me of the post on p-values I wrote a few days ago.
To get it, you just need a little background (in case you didn’t read the earlier post).
With a classical hypothesis test, you have a null model that gives you a distribution of outcomes. To test the hypothesis you make an observation, say x, and you then consider how likely it is to get a value as high as or higher than x under this null distribution.
The p-value of x is the probability, under the null distribution, of observing a value at least as high as x, and if this probability is lower than some pre-chosen threshold α, you reject the null hypothesis. Not that you ever observe any outcome higher than x; you just reject the hypothesis if it is unlikely to observe anything higher than x.
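To make the mechanics concrete, here is a minimal sketch of the calculation. It assumes a standard normal null distribution, an observation of 2.0 and a threshold of 0.05; all three are made up for illustration and do not come from any particular test.

```python
from statistics import NormalDist


def upper_tail_p_value(x, null=NormalDist(mu=0.0, sigma=1.0)):
    """Probability, under the null distribution, of a value at least as high as x."""
    return 1.0 - null.cdf(x)


x = 2.0       # hypothetical observation
alpha = 0.05  # hypothetical pre-chosen threshold

p_value = upper_tail_p_value(x)
reject = p_value < alpha

print(f"p-value = {p_value:.4f}, reject null: {reject}")
# prints: p-value = 0.0228, reject null: True
```

Note that the only observed number going into the calculation is x itself. Everything else is probability mass the null model places on values above x, none of which were ever seen.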
The hypothesis is rejected based on how much probability it puts on values larger than the observation x, not on any values higher than x that were actually observed.
What the use of P implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred.
– Jeffreys