Psych Journal Bans Hypothesis Testing and Confidence Intervals

In a recent editorial, the scientific journal Basic and Applied Social Psychology banned submissions that use null hypothesis significance testing (NHST) or confidence intervals (CI). I can certainly appreciate what motivated the journal’s policy, as I routinely read and review manuscripts that base decisions solely on p-values less than 5%, i.e., on the (incorrect) belief that research findings are important if p<0.05.

For example, I recently read a case-control study of helmet use and bicycle-related trauma by Heng et al.[1] One of their reported measures was alcohol use; they found “[a]lcohol consumption did not correlate with…helmet wearing” and reported a p-value of “NS” (as in, not significant). A summary of those data is given below.

                       Helmet Use
Alcohol Involvement    Yes     No
Yes                      0     18
No                      17    125

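The p-value from this table is certainly greater than 5% by Fisher’s exact test. For a measure of effect size, it isn’t possible to compute an odds ratio in the usual way because one of the cells is zero. Instead, a continuity corrected version can be computed by adding a half to each cell, where a and b are the counts in the first row of the table and c and d are the counts in the second row.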

OR_{cc}=\dfrac{(a+0.5)(d+0.5)}{(b+0.5)(c+0.5)}

The above 2\times2 table gives a continuity corrected odds ratio of OR_{cc}=0.19. In other words, there is an associated 81% reduction in the odds of alcohol involvement among helmet wearers versus those not wearing a helmet in this data set. That is not a trivial result and, perhaps, this is the type of situation the journal is trying to avoid. That is something I fully support.
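As a rough check of the numbers above, here is a minimal Python sketch that uses only the standard library. The function names are my own, and the two-sided Fisher p-value uses the common convention of summing the probabilities of all tables that are no more likely than the observed one; other two-sided conventions exist.

```python
from math import comb

def continuity_corrected_or(a, b, c, d):
    """Odds ratio after adding 0.5 to each cell of the 2x2 table."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for a 2x2 table.

    Sums the hypergeometric probabilities of every table (with the
    observed margins) that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating point ties.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

# Rows: alcohol involvement (yes, no); columns: helmet use (yes, no).
a, b, c, d = 0, 18, 17, 125
or_cc = continuity_corrected_or(a, b, c, d)
p_value = fisher_exact_two_sided(a, b, c, d)
print(f"OR_cc = {or_cc:.2f}, Fisher two-sided p = {p_value:.3f}")
```

The point of the sketch is only that the p-value and the effect size answer different questions: the test is far from significant, while the estimated odds ratio is far from 1.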

On the other hand, I believe the journal has over-reached by banning NHST and CI. Granted, there are problems with these methods of statistical inference, but I strongly disagree that they are invalid. Perhaps their strengths and limitations are poorly understood, but that is a different matter.

The journal does recommend an increased focus on descriptive statistics and correctly notes they “become increasingly stable” as sample size increases. Of course, this relies on descriptive statistics that are unbiased (at least asymptotically); otherwise you’ll get stable estimates that do not reliably estimate what you want estimated.

I also think there is a disconnect between the push for descriptive statistics and the ban on confidence intervals. So, for example, I may be encouraged to report the sample mean and its standard error

\bar{x}   and   \dfrac{s}{\sqrt{n}}

but I’m banned from combining these results as

\bar{x}\pm{1.96}\times\dfrac{s}{\sqrt{n}}.
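To make the disconnect concrete, here is a small Python sketch of the two steps. The sample values are made up purely for illustration, and the 1.96 multiplier is the usual normal approximation (for a sample this small, a t quantile would normally replace it).

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample, for illustration only.
sample = [4.1, 5.0, 5.6, 4.8, 5.3, 4.9, 5.2, 5.1]

n = len(sample)
x_bar = mean(sample)           # sample mean
se = stdev(sample) / sqrt(n)   # standard error: s / sqrt(n)

# The "allowed" descriptive statistics.
print(f"mean = {x_bar:.3f}, standard error = {se:.3f}")

# The "banned" combination of the same two numbers.
lower = x_bar - 1.96 * se
upper = x_bar + 1.96 * se
print(f"approximate 95% CI: ({lower:.3f}, {upper:.3f})")
```

Nothing new is estimated in the second step; the interval is simple arithmetic on the two statistics already reported.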

The American Statistical Association is aware of this situation and has formed a committee to comment on the journal’s decision. I look forward to reading what they come up with.

  1. Heng KWJ, Lee AHP, Zhu S, Tham KY, Seow E. (2006) Helmet use and bicycle-related trauma in patients presenting to an acute hospital in Singapore. Singapore Med J 47(5): 367-372.