How Statistical Decision Theory
Developed a 5% Criterion
(1 chance in 20)

Here's the history of where the criterion came from. I picked this up from a presentation Chris Spatz made at a teaching conference. I've lost my notes and haven't tracked down the reference yet. But here's the content:

Sir Ronald Fisher published Statistical Methods for Research Workers in 1925. The last edition was in 1973. His rule of thumb from the first edition remained unchanged. This was the birth of the "p < .05" rule. It was stated as a very clear notion with no reasons given whatsoever.

British agriculture at the end of the nineteenth century was tremendously productive. A British farmer was getting 30 bushels of wheat per acre when farmers in the United States were getting 14. The British government devoted quite a bit of money to agricultural research to maintain and improve that productivity. As well they should: it was the productivity of the British farmer that supported the creation and growth of the British Empire.

Here's the conflict. Pretend I am an agricultural scientist. At a research farm I try a new farming technique on a sample of 8 plots of ground and discover the average yield was 35 bushels rather than the 30 bushels farmers are currently getting. Should I tell British farmers to try this new technique to increase their yield? Maybe, maybe not.

What I'll do is arbitrarily decide that only differences that fall in the outer 5% of the sampling distribution are to be considered real. All smaller differences I'm going to decide are merely random fluctuations from the same value I have now.
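One way to make this decision rule concrete today is a one-sample t-test. The sketch below is only an illustration: the eight plot yields are invented (chosen so their mean is 35), the names `yields`, `current_yield`, and `t_statistic` are hypothetical, and the critical value 2.365 is the standard two-sided 5% table value for Student's t with 7 degrees of freedom. It is not necessarily the calculation an agricultural scientist of the period would have done.

```python
import math
import statistics

# Hypothetical yields (bushels/acre) for the 8 trial plots.
# These numbers are invented for illustration; their mean is 35.
yields = [31, 38, 33, 36, 40, 32, 37, 33]

# Yield farmers are currently getting, per the example in the text.
current_yield = 30.0

# Two-sided 5% critical value of Student's t with n - 1 = 7 degrees
# of freedom (standard table value, hard-coded as an assumption).
T_CRIT_DF7 = 2.365


def t_statistic(sample, mu0):
    """Return (sample mean - mu0) divided by the standard error of the mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


t = t_statistic(yields, current_yield)

# Treat the difference as "real" only if it falls in the outer 5%
# of the sampling distribution.
is_real = abs(t) > T_CRIT_DF7

print(f"t = {t:.2f}, treat difference as real: {is_real}")
```

With these made-up plots the difference lands well inside the outer 5%, so the rule would call it real. With more variable plots and the same mean of 35, the same rule could just as easily call it a random fluctuation.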

So, what it boils down to is a guess on the part of early users of statistics as to how often they could make a Type-I error and still remain credible.


Further notes from Spatz's presentation

The idea behind "p < .05" probably originated with Karl Pearson (according to a 1982 American Psychologist article). Pearson was an archenemy of Ronald Fisher. Both demanded loyalty from friends. Only Gosset (the "Student" of the t-distribution) seemed to remain friends with both men. In 1900 Pearson was developing the chi-squared test for goodness of fit to a theoretical distribution. In his paper he said a probability of .10 was not a very improbable fit but a probability of .01 was a very improbable fit.

In 1906 Gosset was working with Pearson on the probable error of the mean (now the SEM). They adopted a criterion of "2 SD apart" in a 1908 paper. Remember, only about 5% of a normal distribution (4.6%, to be more exact) is further from the central tendency than 2 standard deviations.
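The share of the distribution lying more than 2 standard deviations from the center can be checked directly if we assume a normal distribution. This is a small sketch using only Python's standard library; the function name `two_sided_tail` is made up for the illustration.

```python
import math


def two_sided_tail(k):
    """Fraction of a normal distribution more than k standard deviations
    from the mean, counting both tails."""
    # For a standard normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2))


# Area beyond 2 SD, the "2 SD apart" criterion: a little under 5%.
print(f"beyond 2.00 SD: {two_sided_tail(2.0):.4f}")

# Area beyond 1.96 SD, the cutoff usually quoted for exactly 5%.
print(f"beyond 1.96 SD: {two_sided_tail(1.96):.4f}")
```

So "2 SD apart" and "p < .05" are very nearly the same rule, which is part of why the one could slide into the other.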

In 1905, the Journal of Agricultural Science was established in Britain. There had been a lot of applied agricultural research and data from 1850 on. In 1910 Wood (?) presented an article on "The Interpretation of Experimental Results". The question was where to make this decision. Odds of 30:1 were taken as "practical certainty" (p = .03).
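The step from odds to a probability is simple arithmetic: odds of 30:1 against a chance result mean 1 chance in 31. A minimal sketch, with hypothetical helper names `odds_against_to_p` and `p_to_odds_against`:

```python
def odds_against_to_p(odds):
    """Convert odds of `odds`:1 against an event into its probability."""
    return 1.0 / (odds + 1.0)


def p_to_odds_against(p):
    """Convert a probability p into odds (x:1) against the event."""
    return (1.0 - p) / p


# Odds of 30:1, the "practical certainty" level, as a probability.
print(f"30:1 odds  -> p = {odds_against_to_p(30):.3f}")

# The 5% criterion expressed as odds against a chance result.
print(f"p = .05    -> {p_to_odds_against(0.05):.0f}:1 odds")
```

Read this way, the 5% criterion (1 chance in 20) corresponds to odds of 19:1, somewhat looser than 30:1.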

Wood wanted to advise farmers who were already the best in the world on how to get better [wheat production was 30 bushels/acre in England; 14 in the USA; 7 in Argentina]. He evidently felt that you want your recommendation to make an improvement. If I don't make a recommendation, they'll be OK. If I do make a recommendation, they should be better. It's important to my science and my credibility that my recommendations work. "0.05" seems like a reasonable tradeoff.

© 2002 by Burrton Woodruff. All rights reserved. Modified Sunday, March 25, 2007