# Confidence Interval Based Sample Size for Two Means

A colleague recently inquired about computing sample size when comparing two group means. My initial response was to compute sample size based on the two sample t-test. After a few probing questions it became clear the colleague did not expect the means to differ and demonstrating they were similar was the aim of the study.

There are two primary methods for computing sample size — one is based on hypothesis testing and the other on confidence intervals. The goal of hypothesis testing is to demonstrate differences in means (or proportions, variances, etc) which is the antithesis of what the colleague wanted. Sample size based on a confidence interval is a much better option here.

The large sample confidence interval for the difference in two population means $\mu_1$ and $\mu_2$ is

$\bar{x}_1-\bar{x}_2{\pm}z_{1-\alpha/2}\sqrt{\dfrac{s^2_1}{n_1}+\dfrac{s^2_2}{n_2}}$

where $z_{1-\alpha/2}$ is the quantile of a normal distribution corresponding to confidence level $(1-\alpha)100\%$ and $\bar{x}_i$, $s^2_i$ and $n_i$ are the sample mean, sample variance and sample size respectively for group $i=1,2$.

The term to the right of the $\pm$ sign is often termed the margin of error $E$, i.e.,

$E=z_{1-\alpha/2}\sqrt{\dfrac{s^2_1}{n_1}+\dfrac{s^2_2}{n_2}}$

This formula can be simplified if it’s reasonable to assume a common variance, i.e., $s^2=s^2_1=s^2_2$, and equal sample sizes, i.e., $n=n_1=n_2$. The equation simplifies to

$E=z_{1-\alpha/2}\sqrt{\dfrac{2s^2}{n}}$

We can then solve for $n$ to get

$n=2\left(\dfrac{z_{1-\alpha/2}s}{E}\right)^2$

As an example, the sample size (per group) for a 95% confidence interval with an $E=20$ unit margin of error and standard deviation $s=100$ is

$n=2\left(\dfrac{1.96\times{100}}{20}\right)^2=192.08$

In practice, this would be rounded up to $n=193$ per group. It is common to choose 95% confidence (i.e., $z_{0.975}=1.96$) whereas the margin of error and standard deviation are context specific. One strategy for the margin of error is to choose the smallest value that represents a meaningful difference, so that any smaller value would be considered inconsequential. The choice of standard deviation can be informed by previous research.

Another consideration is loss to follow-up (if, for example, the outcome was the difference between pre- and post-measurements). With, say, a 20% attrition rate, the sample size per group would be increased to

$n=\dfrac{192.08}{1-0.2}\approx241$
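
The calculation above can be wrapped in a small R function. This is a minimal sketch under the same assumptions as the formula (common standard deviation, equal group sizes, normal quantile); the function name `ci_sample_size` and its arguments are illustrative, not from any package.

```r
# Per-group sample size so that a (conf*100)% confidence interval for the
# difference in two means has margin of error E, assuming a common SD s
# and equal group sizes. 'attrition' inflates n for expected loss to follow-up.
ci_sample_size <- function(E, s, conf = 0.95, attrition = 0) {
  z <- qnorm(1 - (1 - conf) / 2)   # e.g. about 1.96 for 95% confidence
  n <- 2 * (z * s / E)^2           # n = 2 (z s / E)^2
  ceiling(n / (1 - attrition))     # always round up
}

ci_sample_size(E = 20, s = 100)                   # 193 per group
ci_sample_size(E = 20, s = 100, attrition = 0.2)  # 241 per group
```

Rounding is done once, after adjusting for attrition, so the inflated figure is not rounded twice.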

Of course, the computation gets far more complex, and possibly intractable, when the equal variance and sample size assumptions are not reasonable.

# Too much statistical power can lead to false conclusions

We recently published a letter in the journal Injury Prevention regarding a criticism of our re-analysis of Ian Walker’s bicycle overtaking study. To fit within the journal’s guidelines, we needed to shorten our original draft by a sizeable amount. A lot of important detail was omitted in the process, so I’ve posted the full version below.

Original Response to M Kary

Injury Prevention recently published a commentary critical of epidemiological approaches to cycling safety.[1] The author, M. Kary, suggests that our paper[2], which re-analysed Walker’s[3] study of motor vehicle overtaking distance for cyclists, made false claims about type I errors and confused statistical significance and clinical significance. Kary supports this critique, along with other points in the commentary, with a non-peer reviewed response posted by him to the journal’s website.[4]

In our paper, we note that increasing power when computing sample size leads to an increase in the probability of a type I error.[2] Kary[1] incorrectly repeats this contention as the probability of a type I error increasing with sample size, suggesting ours was a “false claim”. We will demonstrate that our original assertion regarding type I errors is correct and reinforce the points made in our reanalysis paper regarding statistical versus clinical significance. Both points are important – and often overlooked – issues in quantitative research.

Sample size when comparing two groups on a quantitative variable, such as comparing motor vehicle overtaking distance when wearing or not wearing a helmet, is a function of effect size ($\delta$), the type I error rate ($\alpha$), and power ($1-\beta$). For example, using a formula for a two sample t-test (see appendix), the sample size to detect a small effect size[5] $\delta=0.2$ with 80% power and $\alpha=0.05$ is $n=786$ (or $393$ per group). Leaving effect size and sample size fixed in this example, Figure 1 shows the type I error rate as a function of power.

Figure 1. Relationship between power and type I error rate for comparing two groups (n=786 and δ=0.2)

As power increases, so does the type I error rate. When power increases to 98%, as in the Walker study, the type I error rate is 0.45. It is true, the type I error rate decreases as sample size increases (while leaving the effect size and power fixed), as suggested by Kary; however, this point is not the claim made in our paper.
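
The relationship between power and the type I error rate can be sketched in R using the normal approximation to the two sample t-test. The function names below are illustrative, and the code assumes equal group sizes and a two-sided test.

```r
# Per-group sample size for a two-sided, two-sample comparison
# (normal approximation, equal group sizes)
n_per_group <- function(delta, power, alpha = 0.05) {
  ceiling(2 * (qnorm(1 - alpha / 2) + qnorm(power))^2 / delta^2)
}

# Type I error rate implied by a given power when the effect size (delta)
# and the per-group sample size (n) are held fixed
alpha_given_power <- function(power, delta, n) {
  2 * (1 - pnorm(delta * sqrt(n / 2) - qnorm(power)))
}

n <- n_per_group(delta = 0.2, power = 0.80)   # 393 per group
power_grid <- c(0.80, 0.90, 0.95, 0.98)
data.frame(power = power_grid,
           alpha = round(alpha_given_power(power_grid, delta = 0.2, n = n), 3))
```

With $n$ and $\delta$ fixed, the only way to raise power is to relax the rejection threshold, which is why the implied $\alpha$ climbs from about 0.05 at 80% power to about 0.45 at 98% power.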

It is possible to maintain a nominal type I error rate while increasing power; however, this comes at a large cost. In his study, Walker chose a small effect size,[5] $\alpha=0.05$ and 98% power. The calculated sample size for these inputs is $n=1614$. However, when a more conventional 80% power is used, the sample size is $n=786$. In other words, for a fixed type I error rate, the sample size roughly doubles to increase power from 80% to 98%.

Clinical significance relates to the practical implications of the absolute size of an effect. In the Walker study, Table 1 shows differences (in metres) between helmet wearing and no helmet wearing for various cut points of overtaking distance. For motor vehicles overtaking cyclists at less than 1.5 metres the difference is at most 7 millimetres. The largest absolute effect size is for overtaking at more than 2 metres and amounts to 7.2 cm. There is no established appropriate effect size for overtaking distance and neither Walker[3] nor Kary[1] has suggested one. Being overtaken at 2 metres or more seems unlikely to represent a major safety issue, while being overtaken much closer, e.g., at 0.75 or 0.753 metres, may both be considered risky scenarios. In other words, the clinical significance of the helmet effect seems negligible.

| Overtaking distance (m) | Difference (no helmet vs. helmet) | 95% CI |
|---|---|---|
| (0, 0.75) | -0.052 | -0.224, 0.121 |
| (0.75, 1.00) | 0.003 | -0.061, 0.067 |
| (1.00, 1.50) | 0.007 | -0.012, 0.027 |
| (1.50, 2.00) | 0.017 | -0.003, 0.037 |
| (2.00, ∞) | 0.072 | 0.034, 0.109 |

Table 1. Absolute effect sizes comparing overtaking distance between helmeted and unhelmeted conditions (adapted from Table 8 of Olivier and Walter[2]).

Statistical significance is a function of sample size, so it is possible to achieve a statistically significant result that is clinically meaningless when the sample size is large.[6] Standardized indices, such as Cohen’s $d$,[7] provide a sample size independent quantification of effect size.

Walker found a statistically significant relationship between helmet wearing and overtaking distance ($F_{1,2313}=8.71,p=0.003$). This result comes from an ANOVA model with helmet wearing, overtaking distance and their interaction. When the F-statistic is converted to effect size $\delta$ the estimated helmet wearing effect is $\delta=0.12$ which is trivial by Cohen’s definition. Additionally, Walker’s sample size was $n=2355$ which results in power of 99.8% to detect a small effect size for overtaking distance by helmet wearing status (assuming $\delta=0.2$ and $\alpha=0.05$).
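
Both figures in the paragraph above can be checked with a few lines of R. This is a sketch that assumes two groups of equal size and uses the normal approximation; the function names are illustrative.

```r
# Convert an F statistic for a binary factor to a standardized effect size,
# using the equal-group-size simplification delta = 2 * sqrt(F / df_d)
f_to_delta <- function(f, df_d) 2 * sqrt(f / df_d)
delta_hat <- f_to_delta(8.71, 2313)

# Approximate power of a two-sided, two-sample comparison with n_total
# observations split evenly between the groups
power_two_sample <- function(delta, n_total, alpha = 0.05) {
  n <- n_total / 2
  pnorm(delta * sqrt(n / 2) - qnorm(1 - alpha / 2))
}
power_small <- power_two_sample(delta = 0.2, n_total = 2355)

round(c(delta = delta_hat, power = power_small), 3)
```

The power function ignores the negligible probability of rejecting in the wrong tail, which does not affect the result at three decimal places.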

As we aimed to demonstrate in our reanalysis of Walker’s data,[2] both the calculated effect size and the absolute effect size do not support helmet wearing as a major factor in overtaking distance between cyclists and motor vehicles. In a follow-up study, Walker and colleagues compared overtaking distances for seven types of cyclists with one type unhelmeted. Even when using a much larger sample size ($n=5690$), no statistically significant helmet effect was observed.[8]

Appendix

A formula for computing sample size per group for the two sample t-test is

$n=\dfrac{2\sigma^2(z_{1-\alpha/2}+z_{1-\beta})^2}{(\mu_1-\mu_2)^2}=\dfrac{2(z_{1-\alpha/2}+z_{1-\beta})^2}{\delta^2}$

where $\delta=(\mu_1-\mu_2)/\sigma$ is the effect size and $z_p$ is the $p^{th}$ quantile of the standard normal distribution. Solving for the type I error rate, this equation becomes

$\alpha=2\left(1-\Phi\left(\delta\sqrt{\dfrac{n}{2}}-z_{1-\beta}\right)\right)$

where $\Phi(x)$ is the cumulative standard normal distribution. An F statistic for a binary variable can be converted to $\delta$ using the following formula[9] which is further simplified for groups of equal sample size

$\delta=\sqrt{F\left(\dfrac{n_1+n_2}{n_1n_2}\right)\left(\dfrac{n_1+n_2}{n_1+n_2-2}\right)}\approx2\sqrt{\dfrac{F}{df_d}}$

Acknowledgement

This post was co-authored with Scott Walter.

References

1. Kary, M. Unsuitability of the epidemiological approach to bicycle transportation injuries and traffic engineering problems. Inj Prev in press.
2. Olivier, J, Walter, S. Bicycle Helmet Wearing Is Not Associated with Close Motor Vehicle Passing: A Re-Analysis of Walker, 2007. PLOS ONE 2013;e75424.
3. Walker, I. Drivers overtaking bicyclists: Objective data on the effects of riding position, helmet use, vehicle type and apparent gender. Accident Analysis & Prevention 2007;39:417–425.
4. Kary M. Fundamental misconceptions of safety and of statistics. Published 1 Dec 2013. PLoS ONE [eLetter] http://www.plosone.org/annotation/listThread.action?root=75587.
5. Cohen, J. A power primer. Psychological Bulletin 1992;112:155–159.
6. Sullivan, GM, Feinn, R. Using Effect Size—or Why the P Value Is Not Enough. Journal of Graduate Medical Education 2012; 4: 279-282.
7. Cohen, J. Statistical power analysis for the behavioral sciences, 1988. Lawrence Erlbaum Associates: Hillsdale, NJ, USA.
8. Walker, I, Garrard, I, Jowitt, F. The influence of a bicycle commuter’s appearance on drivers’ overtaking proximities: an on-road test of bicyclist stereotypes, high-visibility clothing and safety aids in the United Kingdom. Accident Analysis and Prevention 2014;64:69-77.
9. Thalheimer, W, Cook, S. How to calculate effect sizes from published research articles: A simplified methodology. Available at: http://work-learning.com/

# More Cherry-Picking from Cycle-Helmets.Com

Last month I posted a commentary regarding incorrect information on cycle-helmets.com. I attributed a problem in their analyses of cycling surveys in South Australia to a “transcribing problem”. However, the issues seem to be much more serious than that.

In the comment section of an article I authored on The Conversation, Dorothy Robinson stated

Australians generally consider cycling a healthy activity, so the discrepancy between the two sets of tables in the South Australian report might reflects a reluctance to admit they cycled less because of helmet laws. The “really big” boo-boos Linda talks about were caused by her looking at the wrong tables. Table numbers are now included in http://www.cycle-helmets.com/helmet-law-spin.html so that others will not make the mistake of attributing the differences between these tables to “transcribing errors”.

The website now refers the reader to Tables 5a and 5b (Destination of bicycle trips in the last 7 days). This isn’t completely correct as total responders were taken from Tables 1a and 1b (Frequency of bicycle riding).

The totals for cycling in the past week do not match up between Tables 1a/1b and 5a/5b. This is likely due to (near) complete responses for amount of cycling but missing responses for destinations. This is common when conducting surveys and highlights the problem with combining such tables, especially when there is no need to do so. In other words, if you’re really interested in comparing cycling rates before and after helmet legislation, why would you not use the frequency of cycling tables?

There is also the issue of throwing away usable data. Tables 1a and 1b contain information for four categories of cycling frequency (“At least once a week”, “At least once a month”, “At least once every 3 months”, “Less often or Never”). This information is mostly thrown out by combining the total responses for destinations in Tables 5a and 5b with the total cyclists in Tables 1a and 1b. Here is a summary of the proportions of cycling in South Australia across age groups and gender for years 1990 and 1993.

| Cycling in South Australia | 1990 (%) | 1993 (%) |
|---|---|---|
| At least weekly | 21.8 | 21.0 |
| At least monthly | 5.2 | 6.0 |
| At least every 3 months | 3.9 | 4.4 |
| Less often or never | 69.1 | 68.6 |

These results suggest the SA helmet law had no impact on the amount of cycling. The suggestion by Robinson that responders are reluctant “to admit they cycled less because of helmet laws” is unsubstantiated. If someone is reticent to admit they don’t exercise, this would apply to both the 1990 and 1993 surveys.

I’d like to be wrong about this, but the analysis on this website reeks of fishing for results that support a pre-determined conclusion.

# Did Australian helmet law “kill” UK cycling to work?

There’s a recently published article in The Conversation about common misinterpretations of research. I strongly disagree with the authors’ take on helmet legislation, and I have even stronger concerns that they cite the Bicycle Helmet Research Foundation as a reliable source of information. I have communicated my concerns to the article’s authors privately.

There were lots of comments about helmet legislation — both critical and supportive. Here is one from Dorothy Robinson that I found very strange.

Adding in Scotland (which used to be included in the census data but now appears to have gone its own way), UK census data on cycling to work are:
1981: 3.76%
1991: 2.97%
2001: 2.89%
2011: 2.96%

Note that no citation was given for this data and I don’t know where it exists on the web. Some UK census data for cycling to work exists here.

For many years now, Robinson and the BHRF have used census data and counts from helmet use surveys to argue helmet legislation in Australia has significantly deterred people from cycling. In the UK, cycling to work decreased 21% from before any helmet legislation (1981) to after most Australians were subjected to such laws (1991). Note that during those same years, the Australian census data reported 1.11% and 1.13% travelled to work by bicycle in capital cities.
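
The 21% figure is a relative change computed from the census percentages quoted in Robinson’s comment. A short R sketch (the helper name `rel_change` is illustrative):

```r
# UK census share of people cycling to work (%), as quoted in the comment
uk_cycle_to_work <- c("1981" = 3.76, "1991" = 2.97, "2001" = 2.89, "2011" = 2.96)

# Relative change in percent between two values
rel_change <- function(before, after) 100 * (after - before) / before

round(rel_change(uk_cycle_to_work[["1981"]], uk_cycle_to_work[["1991"]]))  # -21
round(rel_change(1.11, 1.13), 1)  # Australian capital cities over the same years
```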

This certainly does not mean helmet legislation in Australia had anything to do with cycling rates in the UK (this post’s title is meant to be tongue-in-cheek). Cycling in Denmark has decreased 17% since 1990 (year of the first helmet law in Victoria) and no one believes this had anything to do with Australian helmet laws. However, I think such thought experiments highlight the problems in drawing strong conclusions from such analyses.

Census data is taken over a single day and successive observations are five years apart (in the UK they are apparently 10 years apart). Treating this data as a time series ignores the day-to-day variability in the proportions of travel modes. There are lots of factors that influence whether someone cycles on a given day (including regular cyclists). Two observations, five or ten years apart, do not remotely account for that.

Yearly estimates of cycling participation/amount and broad categories about cycling frequency would be an improvement. An honest assessment of the quality of the available data and its limitations is sorely needed in this area. It seems there are some that are quite content with data as long as it supports their conclusions.

Update: Ben Goldacre has now published Linda Ward’s commentary. See comments below for more details.

Ben Goldacre writes the blog Bad Science, where he discusses problems he sees with science. To this end, he has also authored two books, Bad Science and Bad Pharma. I am usually supportive of those who try to make sense of science and “correct” misconceptions about it for the general public.

For his effort, Goldacre should be applauded. However, he published a head-scratching commentary in the BMJ about bicycle helmets with David Spiegelhalter. Goldacre posted a short discussion and link to the commentary on his blog. I found their take on the topic to be misinformed and uncritical of anti-helmet advocates. This is a topic someone like Goldacre should take head on, but instead seems content to regurgitate anti-helmet rhetoric.

Linda Ward tried posting a comment to Bad Science detailing much of the evidence ignored by Goldacre and Spiegelhalter back in April 2014. Her comment was not published and I don’t know why.

Here is her comment in full.

Several population level helmet law studies have controlled for background trends and included both head and non-head injuries, and shown that the effect of the legislation on hospital admissions for cycling head injuries to be far from minimal:

– Carr/MUARC (1995), http://www.monash.edu.au/miri/research/reports/muarc076.html (Victoria, Australia)
– Hendrie (1999), http://www.ors.wa.gov.au/Documents/Cyclists/ors-cyclists-report-helmets-evaluation.aspx (Western Australia)

– Povey (1999), http://www.ncbi.nlm.nih.gov/pubmed/10487351 (New Zealand)

– Scuffham (2000), http://www.ncbi.nlm.nih.gov/pubmed/10487351 (New Zealand)

– Karkhaneh (2006), https://era.library.ualberta.ca/public/view/item/uuid:432ec921-cf50-4b91-8ab9-50f29baf074b (Alberta, Canada)

– Walter (2011), http://www.ncbi.nlm.nih.gov/pubmed/21819836 (New South Wales, Australia)

The head injury results in all these population-level longitudinal studies, and the AIS3/4 head/brain injury results in the Carr study, are consistent with the (hospital control) results of the Thompson Cochrane Review, and the Attewell and Elvik meta-analyses, of case-control studies.

Two factors are likely to be responsible the Dennis minimal effect finding: collinearity (between ‘time’ and ‘time since intervention’); and such a tiny number of pre-law data points for Ontario (30% of the 1994 injuries, law was Oct 95) and British Columbia (19% of the 1994 injuries, law was Sep 96).

Dennis et al. cite the Scuffham and Walter studies as being “limited by sample size or methodological quality”. However both the Scuffham and Walter analyses took baseline trends into account, and had (more than) adequate sample sizes. Macpherson claimed that the Povey and Scuffham analyses, and a preliminary (1992) MUARC study by Cameron, “failed to include a concurrent control group in the analysis”; however all 3 analyses used cyclist non-head injuries as concurrent control groups. (Povey’s and Scuffham’s analyses also included non-cyclist injuries.) Dennis also cites the preliminary 1992 Cameron/MUARC study; both Macpherson and Dennis have apparently overlooked the (1995) Carr/MUARC study (4 years of post-law data), which superceded the (1992) Cameron study (1 year of post-law data).

This (2013) paper debunks the Fyhri and Walker risk compensation, and Robinson safety in numbers, claims: http://acrs.org.au/wp-content/uploads/26_Olivier_PR.pdf (also see https://injurystats.wordpress.com/author/jakeolivier/). With respect to the 85/88% in the “Seattle” study, Guy Chapman states that “nothing approaching this has ever been observed in a real cyclist population and the case and control groups were completely different”. By “real cyclist population” and “completely different” “case and control groups”, it seems that Guy may mean population-level longitudinal studies, and hospital vs population controls. I am not aware of any studies using population controls, it would be helpful if Guy were to cite the studies he is talking about (and a reference for his claim, on a Wikipedia talk page last year, that “50% of cyclist deaths in London are due to crushing by goods vehicles at junctions, cause of death being abdominal trauma”).

Guy states that “substituting co-author Rivara’s own street count data in the 1989 study, instead of their assumed value, makes the effect vanish into the statistical noise”, but does not provide an references. Struggling to understand how one could (validly) “substitute” “Rivara’s own street count data” into a case-control study (and finding no helmet studies in with Rivara as 1st author in PubMed), I forced myself to have a look at the (truly dreadful) cyclehelmets site. Guy’s claim that substituting “Rivara’s own” data . . . makes the effect vanish into the statistical noise” seems to be referring to the http://www.cyclehelmets.org/1068.html claim that “Of 4,501 child cyclists observed cycling around Seattle, just 3.2% wore helmets. This is not statistically different from the 2.1% of the hospital cases who were wearing helmets”. The required sample size, to detect a difference (with 80% power) between 2.1% and 3.2%, is 3,346 in EACH group; the cyclehelmets site states that there were 135 cases. The effect does not “vanish into the statistical noise”, it is (statistical) rubbish to claim, on the basis of such grossly inadequate sample size (less than 1/20th of the numbers cases required for such a comparison), that the lack of a statistically significant effect is (real) evidence that there is no effect.

I am still wondering what Guy means by “assumed value”, it would be helpful if Guy could explain how the the case-control study “assumed” helmet wearing prevalence.

It is the BHRF site (cyclehelmets) site, not the Cochrane review, that is disgraceful: the site also misrepresents the results of the Carr, Hendrie, Povey, Scuffham, Karkhaneh, Walter, Attewell, and Elvik studies; it also misrepresents the results of the Australian (Victorian, New South Wales, and South Australian) participation surveys (see the above Olivier/ACRS link).

My current ‘favourite’ example is the claim (http://www.cyclehelmets.org/1146.html) that “Helmeted cyclists have about the same percentage of head injuries (27.4%) as unhelmeted car occupants and pedestrians (28.5%). Wearing a helmet seems to have no discernible impact on the risk of head injury.”. The reference cited is “Serious injury due to land transport accidents, Australia, 2003-04”. As a BHRF “editorial board” member, maybe Guy can explain how it is possible to draw such a conclusion from a report that does contain any information as to what the head injury rates were prior to the helmet legislation?

(The BHRF: a perfect teaching case for how NOT to ‘do’ epidemiology?)

To demonstrate that Linda actually submitted her comment, here are screenshots.
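
The sample size quoted in Linda’s comment (3,346 per group to detect a difference between 2.1% and 3.2% with 80% power) can be checked with `power.prop.test` from base R’s stats package. The comment does not say which method was used, so this sketch assumes a two-sided test at the 5% level.

```r
# Per-group sample size to detect 2.1% vs 3.2% helmet wearing
# with 80% power (two-sided test, 5% significance level assumed)
res <- power.prop.test(p1 = 0.021, p2 = 0.032, power = 0.80, sig.level = 0.05)
n_required <- ceiling(res$n)

# Number of cases reported on the cyclehelmets site
cases_available <- 135

n_required
cases_available / n_required   # fraction of the required cases actually available
```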

# Transcribing Problems with cycle-helmets.com Analysis

I recently discussed problems replicating the results found in an assessment of mandatory helmet legislation in Australia published in Accident Analysis and Prevention (Robinson, 1996). This issue was introduced to me by Linda Ward who has pointed to a related issue.

The anti-helmet website http://www.cycle-helmets.com has a page titled “Spinning” helmet law statistics. Under the heading Measuring changes in cycle use, the webpage states

Similarly, in South Australia a telephone survey found no significant decline in the amount people said they cycled but there was a large, significant drop in how much they had actually cycled in the past week 24. In 1990 (pre-law), 17.5% of males aged at least 15 years reported cycling in the past week (210 out of 1201), compared to 13.2% (165 out of 1236) post-law in 1993. For females, 8.1% (102 out of 1357) had cycled in the past week in 1990 compared to 5.9% (98 out 1768) in 1993 24.

These reductions (24% for males, 26% for females aged at least 15 years) are statistically significant (P < 0.005 for males, P = 0.025 for females).

The citation given is a technical report that evaluated the introduction of helmet legislation in South Australia.[1] Table 1 of the SA report gives frequencies of bicycle riding from two surveys, one in 1990 and the other in 1993, for those aged 15 years or older separated by gender. In this survey, the amount of cycling is split into four categories: “At Least Once A Week”, “At Least Once A Month”, “At Least Once Every 3 Months” and “Less Often or Never”. The SA helmet law went into effect on 1 July 1991.

The main problem here is the numbers in the above quote don’t match up to the data in the original report. Here is a screenshot of the table.

When these numbers are corrected and a comparison is made for those cycling at least once a week versus everyone else, the p-values are 0.279 and 0.450 for males and females respectively. Additionally, the relative risks are 0.90 (95% CI: 0.76,1.08) and 0.91 (95% CI: 0.71, 1.17) for males and females respectively. The point estimates for changes in the proportion cycling in the past week are much less than those reported on the webpage.
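
A sketch of this comparison in R follows. The counts are those used in the chi-square analysis of Table 1 later in this post, with survey totals obtained by summing the four frequency categories. The post does not state which test or interval method produced the figures above, so this sketch assumes an uncorrected two-sample test of proportions and a Wald interval on the log relative risk; the results may differ slightly from those quoted.

```r
# Relative risk (1993 vs 1990) of cycling at least once a week, with a
# Wald confidence interval on the log scale and an uncorrected test of
# equal proportions. x = weekly cyclists, n = survey totals (1990, 1993).
rr_summary <- function(x, n, conf = 0.95) {
  p <- x / n
  rr <- p[2] / p[1]
  se_log <- sqrt(1 / x[1] - 1 / n[1] + 1 / x[2] - 1 / n[2])
  z <- qnorm(1 - (1 - conf) / 2)
  ci <- exp(log(rr) + c(-1, 1) * z * se_log)
  p_value <- prop.test(x, n, correct = FALSE)$p.value
  c(rr = rr, lower = ci[1], upper = ci[2], p_value = p_value)
}

res_males   <- rr_summary(x = c(204, 190), n = c(1199, 1236))
res_females <- rr_summary(x = c(104, 123), n = c(1356, 1768))
round(rbind(males = res_males, females = res_females), 3)
```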

In addition to using the wrong data, I don’t agree with the analysis. There are four cycling categories which have been collapsed into two — those who cycle at least once a week and those who don’t. A lot of information is needlessly removed from the data. Instead, a chi-square test for independence could’ve been performed and individual changes could be assessed through an investigation of the residuals.

The Pearson residuals for an individual cell from a chi-square test are

$r=\dfrac{O-E}{\sqrt{E}}$

where $O$ is the observed frequency and $E$ is the expected frequency under an assumption of independence, i.e., no relationship between helmet legislation and the amount of cycling. These residuals are asymptotically normal, so residuals with absolute value greater than 1.96 may be considered “statistically significant”. The sign indicates observing more than expected (if positive) or fewer than expected (if negative).

When analyses are performed on the full tables, the chi-square tests give p-values of 0.20 and 0.85 for males and females respectively. None of the residuals have absolute value anywhere near 1.96. The largest residual pair is for males cycling “at least once every 3 months”. The signs of the residuals indicate there is less cycling than expected in 1990 (r=-1.04) and more cycling than expected in 1993 (r=1.02) if there is no relationship between helmet legislation and amount of cycling. Here is some R code to do those analyses.

males=matrix(c(204,190,66,83,58,77,871,886),nrow=2)
males

females=matrix(c(104,123,59,74,52,64,1141,1507),nrow=2)
females

chisq.test(males,correct=FALSE)
chisq.test(females,correct=FALSE)

chisq.test(males,correct=FALSE)$residuals
chisq.test(females,correct=FALSE)$residuals
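
To make the residual formula concrete, the Pearson residuals for males can also be computed by hand from the row and column totals. This sketch reuses the male counts above; the row and column labels are added here for readability and are not part of the original code.

```r
males <- matrix(c(204, 190, 66, 83, 58, 77, 871, 886), nrow = 2,
                dimnames = list(year = c("1990", "1993"),
                                freq = c("weekly", "monthly", "3-monthly", "less/never")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(males), colSums(males)) / sum(males)

# Pearson residuals: (O - E) / sqrt(E)
pearson <- (males - expected) / sqrt(expected)
round(pearson[, "3-monthly"], 2)   # 1990: -1.04, 1993: 1.02
```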

The analyses above are stratified by gender and we could perform a unified analysis using Poisson regression. This model is essentially

$\log(\mu)=\beta_0+\beta_1YEAR+\beta_2FREQ+\beta_3GENDER+\beta_4YEAR\times FREQ+\beta_5YEAR\times GENDER+\beta_6FREQ\times GENDER+\beta_7YEAR\times FREQ\times GENDER$

I’ve simplified things a bit here because the variable $FREQ$ has four categories and therefore gets estimated by three dummy variables.

The important comparison here is the interaction between $YEAR$ and $FREQ$. If significant, this would indicate helmet legislation and amount of cycling are associated. Using the given South Australian data, the three-way interaction was non-significant, so it was removed from the model. The interaction between $YEAR$ and $FREQ$ is not statistically significant (p=0.41).
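
One way to fit this model in R is with `glm` and likelihood ratio tests. This is a sketch using the same counts as the chi-square analysis; the variable names and factor labels are illustrative, and the post does not say whether the reported p-value came from a likelihood ratio or a Wald test, so the result may not match exactly.

```r
# One row per cell of the YEAR x FREQ x GENDER table.
# expand.grid varies year fastest, then freq, then gender, which matches
# the column-major order of the 'males' and 'females' matrices.
sa <- expand.grid(year   = c("1990", "1993"),
                  freq   = c("weekly", "monthly", "3-monthly", "less/never"),
                  gender = c("male", "female"),
                  stringsAsFactors = TRUE)
sa$count <- c(204, 190, 66, 83, 58, 77, 871, 886,       # males
              104, 123, 59, 74, 52, 64, 1141, 1507)     # females

# Saturated model, then drop the three-way interaction
full    <- glm(count ~ year * freq * gender, family = poisson, data = sa)
two_way <- update(full, . ~ . - year:freq:gender)
anova(two_way, full, test = "Chisq")

# Likelihood ratio test of the YEAR x FREQ interaction
no_year_freq <- update(two_way, . ~ . - year:freq)
lrt <- anova(no_year_freq, two_way, test = "Chisq")
p_year_freq <- lrt[["Pr(>Chi)"]][2]
p_year_freq
```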

No analysis I’ve performed indicates a significant relationship between helmet legislation and amount of cycling in South Australia among those 15 years or older when using the correct data.

Note: The anti-helmet website http://www.cycle-helmets.com is maintained by Chris Gillham. I previously discussed problems with this website here. If you download the PDF version of this report, the author is listed as “Dorre” who I believe is Dorothy Robinson. Both Gillham and Robinson are editorial board members of the anti-helmet organisation Bicycle Helmet Research Foundation.

1. Marshall, J. and M. White, Evaluation of the compulsory helmet wearing legislation for bicyclists in South Australia Report 8/94, 1994, South Australian Department of Transport, Walkerville, South Australia.

# Something Amiss in Robinson (1996)

A 1996 article titled “Head Injuries and Bicycle Helmet Laws” published in Accident Analysis and Prevention is one of the most highly cited papers assessing the effect of helmet legislation.[1] (148 citations, Google Scholar, 4 Sept 2014) Additionally, this seems to be the first article purportedly demonstrating a negative impact of such laws. The conclusions of this paper state

Consequently, a helmet law, whose most notable effect was to reduce cycling, may have generated a net loss of health benefits to the nation.

In this paper, secondary analyses were performed on data contained in other reports. I’ve pointed out in a previous paper[2] that NSW adult cycling counts exist from sources cited in this paper although they are not presented. This is curious because the counts of adult cyclists from NSW helmet use surveys increased from pre- to post-helmet legislation which contradicts the conclusions of this paper. Adult cycling also increased by 44% in Victoria following helmet legislation.[3]

Linda Ward has pointed to another issue with this paper regarding a comparison of the proportion of head injury hospitalizations to cyclists before and after legislation in Victoria. Some of the relevant data is given in Table 6.[1] In this table, the proportions of head injuries are 31.4% for 1989/90 and 27.3% for 1990/91 for hospital admissions in Victoria. During this period, there was a total of n=2300 cycling hospitalizations. The author notes a comparison of these proportions is non-significant by a chi-square test.

The 2×2 table for this data can be reproduced using the source material.[4] Figure 25 of this report gives “All Other Injuries” of about 900 for year 1989/90. This allows us to fill in the rest of the table given below.

| Year | Other Injury | Head Injury |
|---|---|---|
| 1989/90 | 900 | 412 |
| 1990/91 | 718 | 270 |

The frequencies of the other cells seem to correspond to the other values in Figure 25. The chi-square test for this table results in $\chi^2=4.49$, $p=0.03$ and $OR=0.82$. This result could be influenced by the need to estimate the number of cases from a plot. We can assess the influence of this estimate by repeating the analysis for other values near 900. Choosing values from 890 to 910 results in the plot of p-values below.

As you can see, there is a statistically significant decline in head injury in each instance for cycling injury in Victoria before and after helmet legislation. R code to reproduce these results is given below.

n=2300
p1=0.314
p2=0.273

a=900
n1=round(900/(1-p1))
b=n1-900
n2=n-n1
d=round(n2*p2)
c=n2-d

tab=matrix(c(a,b,c,d),nrow=2,byrow=TRUE)
rownames(tab)=c('1989/90','1990/91')