# Plotting Data That Doesn’t Exist is a Bad Idea

I came across a blog post recently that gives someone’s take on bicycle helmet legislation in New Zealand (links here and here). The post includes the plot below created by Chris Gillham on his anti-helmet advocacy website http://www.cycle-helmets.com (the NZ analysis is here). Gillham is also an editorial board member of Bicycle Helmet Research Foundation.

The figure shows the number of cyclists in New Zealand declined around the time of the helmet law in 1994 and was roughly flat thereafter. Gillham’s estimate of injuries per 100,000 cyclists increased over the same period.

This figure (and others) is discussed on the Wikipedia page for Bicycle Helmets in New Zealand. It states

Australian journalist Chris Gillham [19] compiled an analysis of data from Otago University and the Ministry of Transport, showing a marked decline in cycling participation immediately following the helmet law introduction in 1994. At the same time as the number of cyclists aged over 5 years approximately halved, the injury rate approximately doubled. Noting both the decline in numbers and increase in injury rate preceded the law’s introduction at the start of 1994, possibly attributable to the fact that heavy promotion of helmets had been ongoing in the lead-up to the law’s introduction. This phenomenon of just helmet promotion leading to a reduction in cycling has been witnessed in several countries.[20] See Figure 2.

The problem here is that much of the cycling participation data shown simply does not exist.

The link takes you to Reports and Fact Sheets from the NZ Household Travel Survey. In the Cycling Fact Sheet, Tables 5-7 include historical data from past surveys taken during the years 1989/90, 1997/98 and 2003-2012. So, there is no data prior to 1989, between 1991-1996, or between 1999-2002. Importantly, there is no cycling participation data in a 6-year window around the NZ helmet law date of 1 January 1994. Here’s a plot that accurately represents the NZ cycling participation survey data.

Although there are only two surveys prior to 2003, Gillham’s plot contains data for each year. Where did this new data come from? Gillham states

This pop-up table is based on the Ministry of Transport surveys of 5yo+ cycling participation as a percentage of population, as displayed above, with trends smoothed to compensate for the irregularity of the survey timeline.

I find this result quite curious. There are only two surveys, and therefore only two data points, before 2003. Fitting a linear model requires estimating an intercept, a slope and additional parameters for any changes in the initial linear pattern. Starting from 1988, I count two changes in the trend up to 1997 (possibly three, but the 1991 data is hard to see). Therefore, four parameters would need to be estimated to reproduce this “smooth” plot.

To estimate the parameters of a linear model, the number of data points must be at least the number of parameters. Since there are only two data points before 2003, only two parameters can be estimated, i.e., a slope and an intercept. Therefore, the “trend” given in Gillham’s plot is impossible given the available data. Note the red line graph relies on the cycling participation data, so it is also incorrect.
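To see why, here is a minimal sketch in Python (the two data points are placeholders, not the actual survey figures): a straight line fits two points exactly, leaving zero residual degrees of freedom, and a design matrix with four parameters (intercept, slope, and two hypothetical trend-change terms) is rank-deficient with only two observations.

```python
import numpy as np

# Two pre-2003 survey points (placeholder values, not the actual NZ data).
years = np.array([1990.0, 1998.0])
participation = np.array([4.0, 2.0])

# A straight line (intercept + slope) passes through two points exactly...
slope, intercept = np.polyfit(years, participation, 1)
residuals = participation - (intercept + slope * years)
print(np.allclose(residuals, 0.0))  # True: zero residual degrees of freedom

# ...so a design matrix with four columns (intercept, slope, and two
# piecewise trend-change terms with hypothetical break years) cannot have
# full column rank with only two observations.
X = np.column_stack([np.ones(2), years,
                     np.maximum(years - 1992, 0),
                     np.maximum(years - 1994, 0)])
print(np.linalg.matrix_rank(X))  # 2: four parameters are not identifiable
```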

There is also a problem with computing injuries per cyclist based on these surveys. The Household Travel Surveys capture cycling for transport, so they would not represent any changes in off-road cycling like mountain biking. It might be reasonable to compute injuries per cyclist for motor vehicle collisions, but extending this to injuries from non-motor vehicle collisions would be tenuous at best.

These issues are especially troubling because Gillham’s analysis forms part of the research knowledge base on Wikipedia where, I believe, most laypeople and media types get their information. Note that many of the contributors to the Wikipedia page are editorial board members of the anti-helmet organisation Bicycle Helmet Research Foundation. There have been discussions to limit their negative and undue influence, but nothing has come of it.

So, why was there a drop in cycling between the first and second surveys?

It’s really hard to tell: with only two time points, and only one of them before the helmet law, there is no way to assess changes relative to the legislation. It could very well be that the decline in cycling participation started long before the helmet law. One NZ research article notes cycling participation in NZ declined steadily from 1986 onwards.

The website for the 1997/98 Travel Survey actually discusses the changes in on-road cycling since the 1989/90 survey.

Between 1989/90 and 1997/98, on-road cycling has decreased by 19 percent, with the largest decrease among school-age children and teenagers. Other countries have also seen large reductions in cycling (for example, cycling in Great Britain has fallen by 20 percent over the same period*). Once an almost universal mode of transport for school children, concern about safety has seen cycling to school become less popular. However, there has been an increase in cycling, particularly longer trips, among the 20-24 age group.

* Source: “Road Accidents Great Britain 1998 The Casualty Report” (September 1999), Department of the Environment, Transport and the Regions, United Kingdom.

Since Great Britain does not have any form of helmet legislation and cycling participation declined at a rate similar to NZ, the evidence does not support the hypothesis the NZ helmet law deterred cycling rates.

# New Zealand Helmet Law and Validity of a Regression Model

After my recent post regarding cycling fatalities in New Zealand, someone pointed me to a Wikipedia discussion regarding a peer-reviewed paper I co-authored that discussed methodological issues of papers assessing the NZ helmet law that became effective 1 January 1994.

There are criticisms of our paper from Dorothy Robinson, Richard Keatinge and Nigel Perry (all editorial board members of the anti-helmet organization Bicycle Helmet Research Foundation) regarding our criticisms of a paper by Robinson (2001), which was in turn a criticism of a paper by Povey et al. (1999). In both papers, the ratio of head injuries to limb fractures was modelled over the period 1990-1996. In their paper, Povey et al. found changes in helmet wearing were negatively associated with the log of the head/limb injury ratio for three age groups in non-motor vehicle accidents and for all ages in motor vehicle accidents.

Robinson criticized Povey and colleagues for “failure to fit time trends in their model” and claimed the observed benefit was an “artefact”. Her analysis focused solely on adults in non-motor vehicle accidents and ignored the data for children and for motor vehicle accidents (which are often the most severe). This is curious considering the NZ helmet law applies to on-road cycling and, therefore, cyclist interactions with motor vehicles are the more relevant here.

In our paper, we noted that although Povey et al. did not appear to check the assumptions of their model, inspection of the residuals suggests their model was valid. On Wikipedia, Robinson (under the pseudonym Dorre) reiterates her earlier criticism, stating “Povey did not take time trends into account”, and suggests this as the reason for finding a helmet benefit. She then states “most people would expect a claim that the model is “valid” to imply there is evidence of causation!” It is unclear to me why Robinson, who claims to be a statistician, would make such a statement (and other such statements in her paper and on Wikipedia).

Let me explain. The actual model fit by Povey et al. (1999) is

$log(HEAD_{i}/LIMB_{i})=\alpha + \delta(HELMET_{i}) + \epsilon_{i}$

where $\epsilon_{i} \hbox{ for } i=1,\dots,n$ are assumed to be independent, normally distributed random variables with mean 0 and constant variance $\sigma^2$. This is usually stated as

$\epsilon_i \overset{iid}{\sim} N(0,\sigma^2)$

A linear regression model is valid if the above assumptions imposed on the $\epsilon_i$‘s are reasonable. There is no assumption of fitting time trends, as Robinson suggests, to any linear or generalized linear model. It is assumed the errors are serially independent, but that is not equivalent to fitting time trends. Additionally, a valid linear model does not imply a causal relationship between the independent and dependent variables — this would also hold for Robinson’s contention which is essentially that time caused the decline.

The assumptions of a linear model can be checked using the residuals. The residuals are the differences between the observed and fitted values, written mathematically as

$e_i=log(HEAD_{i}/LIMB_{i})-\left(\hat{\alpha}+\hat{\delta}(HELMET_{i})\right)$

where $\hat{\alpha}$ and $\hat{\delta}$ are intercept and slope estimates using the method of least squares.

Using the observed residuals, the normal assumption can be assessed using a normal quantile plot, the linearity and constant variance assumptions can be checked by a scatterplot of the residuals, and serial independence checked using the Durbin-Watson statistic or graphically using the autocorrelation function.

There is nothing in the residual plots that suggests the model used by Povey et al. is not valid.

Below is the R code to construct these plots and to perform the Durbin-Watson test.

ratio <- c(1.40, 1.09, 1.07, 0.94, 0.86, 0.83, 0.77)
helmet <- c(30, 36, 41, 43, 92, 93, 87)
reg <- lm(log(ratio) ~ helmet)
par(mfrow = c(1, 3))
qqnorm(reg$res); qqline(reg$res)
plot(reg$res ~ helmet, ylab = 'Residuals', main = 'Residual Plot')
acf(reg$res, main = 'Autocorrelation Function')
library(lmtest); dwtest(reg)

Another concept that seems lost in the criticism is that Povey and colleagues were testing an a priori hypothesis. As such, their model was hypothesis driven and pre-determined without letting the data influence modelling decisions. This is an important consideration if the researcher is to avoid spurious correlations.

It is a shame what has happened to the Wikipedia pages on bicycle helmets. Many of the contributors have clear conflicts of interest, like Dorothy Robinson, Richard Keatinge and Nigel Perry (all editorial board members of the anti-helmet organization Bicycle Helmet Research Foundation), who routinely offer a biased view of the available research. I do plan on discussing the negative influence this group and website have had on our understanding about cycling safety at some point.

The comments Robinson, Keatinge and Perry have made about my paper with Joanna Wang are a prime example of their negative influence. Although their discussion began on Wikipedia, it would appear these critics are unaware that Wikipedia has an actual page discussing the validity of a regression model. The first sentence states:

In statistics, regression model validation is the process of deciding whether the numerical results quantifying hypothesized relationships between variables, obtained from regression analysis, are in fact acceptable as descriptions of the data.

As discussed, model validity has nothing to do with any of their criticisms and, importantly, the assumptions made by Povey et al (1999) in their analysis appear to be valid.

# New Zealand Cycling Fatalities and Bicycle Helmets

A colleague sent me an assessment of cycling fatalities in New Zealand. The report’s author is Dr Glen Koorey of the University of Canterbury. He’ll be one of the keynote speakers at the upcoming Velo-City Conference in Adelaide. In particular, I was tasked to comment about his section regarding bicycle helmets as they, in part, now form the basis of the Wikipedia page on Bicycle Helmets in New Zealand.

In the report, Koorey states

Only nine victims were noted as not wearing a helmet, similar to current national helmet-wearing rates (92%). This highlights the fact that helmets are generally no protection to the serious forces involved in a major vehicle crash; they are only designed for falls. In fact, in only one case did the Police speculate that a helmet may have saved the victim’s life. There is a suspicion that some people (children in particular) have been “oversold” on the safety of their helmet and have been less cautious in their riding style as a result.

On the surface, he has a point based on the independence of probabilities. In mathematical terms, Koorey is stating

$P(helmet | fatality) \approx P(helmet)$

which is, by definition, independence (when the two probabilities are equal). So, if the helmet wearing proportion among fatalities equals that in the population, then helmet wearing is independent of fatality.

As I see it, the problem is in the interpretation, as this is not a pure measure of helmet effectiveness. Helmets are a targeted safety intervention, so they won’t protect body parts other than the head, and you can certainly die from other injuries. It could very well be that helmet wearing is independent of fatalities, but the sheer force of the collision makes other serious (and possibly fatal) injuries more likely, negating any benefit of helmet wearing.

I searched through the publicly available data (found here) and asked around about what’s available in the complete data. In the end, there’s not enough information to identify location or severity of injuries. If we had all the data, a more appropriate probability to investigate would be

$P(helmet | \hbox{fatality due to head injury}) = P(helmet)$

When looking at the reported data, however, Koorey’s claim the proportion of fatalities wearing a helmet is “similar to current national helmet‐wearing rates (92%)” doesn’t appear justified.

First, he states there were 84 cycling fatalities between 2006-2012 in New Zealand. Of these, about 10% had no information about helmet wearing. So, there is information on 76 fatalities, and 9 of those were not wearing helmets. This gives a proportion of non-helmet wearers among fatalities of 11.84% (9/76). This is not an estimate, since this figure comes from all cycling fatalities in New Zealand.
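As a quick check of the arithmetic (a Python snippet, not part of the original analysis):

```python
# Proportion of non-helmet wearers among the 76 fatalities with known status.
non_helmet_fatal = 9 / 76
print(round(100 * non_helmet_fatal, 2))  # 11.84
```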

Koorey wants to compare this to estimates of helmet wearing in New Zealand. Over this time frame, I compute a yearly average helmet wearing rate of 92.57%, so the proportion of cyclists not wearing helmets is 7.43% during that time. This data can then be summarized in a $2 \times 2$ table as

|            | Helmet: Yes | Helmet: No |
|------------|-------------|------------|
| Death: Yes | a           | b          |
| Death: No  | c           | d          |

From the data available, we do know $a=67$, $b=9$, $\frac{c}{c+d}=0.9257$ and $\frac{d}{c+d}=0.0743$. We would like to compute the risk of death for those wearing helmets versus those that do not; however, this is not possible using this summary data as we don’t really know how many cyclists there are.

Instead, we can compute the odds ratio (OR) which is a good estimate of relative risk for rare events (cycling deaths are certainly rare). The odds ratio is

$OR=\dfrac{ad}{bc}=\dfrac{a\frac{d}{c+d}}{b\frac{c}{c+d}}=0.598$

If helmet wearing were identical among fatalities and the general population, as Koorey has suggested, the odds ratio would be 1. Instead of being similar, the estimated risk of death is 40% lower among helmeted NZ cyclists versus those without a helmet. This figure is consistent with the latest re-re-analysis of a meta-analysis of case-control studies, although it is likely conservative since head (or any other) injuries were not identified.
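The odds-ratio arithmetic can be reproduced in a few lines of Python (figures taken directly from the post); note the unknown total number of cyclists, $c+d$, cancels out of the formula:

```python
# Fatalities with known helmet status: helmeted (a) and unhelmeted (b).
a, b = 67, 9
# Survey helmet-wearing proportions, i.e. c/(c+d) and d/(c+d).
p_helmet, p_no_helmet = 0.9257, 0.0743

# OR = ad/(bc) = (a * d/(c+d)) / (b * c/(c+d)); the cyclist total cancels.
odds_ratio = (a * p_no_helmet) / (b * p_helmet)
print(round(odds_ratio, 3))  # 0.598
```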

Statistical significance would be hard to come by here considering we don’t have the exact counts of cyclists from those surveys (or from the general population). However, the asymptotic variance of the log(OR) is

$\widehat{var}(log(OR)) \approx 1/a + 1/b + 1/c + 1/d$

The most recent helmet use survey observed over 4600 cyclists, giving roughly 7 × 4600 observations over the study period. Since this is such a sizable number, the last two terms of the variance formula contribute little.

Using only the fatalities in the variance formula gives us an asymptotic confidence interval for the odds ratio of

$OR\times e^{\pm 1.96 \times s.e.} = (0.298, 1.198)$

where the $s.e. = \sqrt{1/a + 1/b}$ (this assumes both $1/c$ and $1/d$ are small). Note this result is not statistically significant; however, this is due to having relatively few cycling fatalities (which is a good thing, and fewer would be better).
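Here is a Python check of the interval, under the same assumption that $1/c$ and $1/d$ are negligible:

```python
import math

# Fatality counts and the odds ratio computed from the post's figures.
a, b = 67, 9
odds_ratio = (a * 0.0743) / (b * 0.9257)  # ~0.598

# With c and d in the tens of thousands, 1/c and 1/d are negligible,
# so the standard error of log(OR) is approximately sqrt(1/a + 1/b).
se = math.sqrt(1 / a + 1 / b)
lower = odds_ratio * math.exp(-1.96 * se)
upper = odds_ratio * math.exp(+1.96 * se)
print(round(lower, 3), round(upper, 3))  # approx (0.298, 1.198)
```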

There’s also the issue of the effect of the missing data. One method is to recompute the odds ratio assuming none of those with missing data wore helmets, and again assuming all of them did, giving a range of possible values. The odds ratios are 0.316 and 0.669, respectively. So, at worst, there is an estimated 33% decrease in the risk of death when wearing a helmet versus not.
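This sensitivity check can also be reproduced in Python (the 8 fatalities with unknown helmet status follow from the 84 total minus the 76 with known status):

```python
# Survey helmet-wearing proportions, c/(c+d) and d/(c+d).
p_helmet, p_no_helmet = 0.9257, 0.0743
missing = 8  # 84 fatalities total, 76 with known helmet status


def odds_ratio(a, b):
    """Odds ratio from fatality counts; the cyclist total cancels."""
    return (a * p_no_helmet) / (b * p_helmet)


worst = odds_ratio(67, 9 + missing)  # all missing assumed unhelmeted
best = odds_ratio(67 + missing, 9)   # all missing assumed helmeted
print(round(worst, 3), round(best, 3))  # 0.316 and 0.669
```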

Koorey’s claims are therefore not justified, as the risk of death was much lower among helmeted cyclists. This is even without specific information about cause of death, which would be needed to properly assess helmet effectiveness in lowering the risk of a fatality.

I also take issue with Koorey’s statement “This highlights the fact that helmets are generally no protection to the serious forces involved in a major vehicle crash; they are only designed for falls.” A recently published article in Accident Analysis and Prevention states

Considering a realistic bicycle accident scenario documented in the literature (Fahlstedt et al., 2012) where a cyclist was thrown at 20 km/h (i.e. 5.6 m/s which corresponds to a drop height of approximately 1.5 m), our analysis indicates that a helmeted cyclist in this situation would have a 9% chance of sustaining the severe brain and skull injuries noted above whereas an unhelmeted cyclist would have sustained these injuries with 99.9% certainty. In other words, a helmet would have reduced the probability of skull fracture or life threatening brain injury from very likely to highly unlikely.

I also published a paper last year where we found helmets reduced the odds of severe head injury by up to 74% (these were NSW cyclists hospitalised after a motor vehicle crash and reported to the police from 2001-2009). Severe injuries included “Open wound of head with intracranial injury” (S01.83), “Multiple fractures involving skull and facial bones” (S02.7), “Fracture of skull and facial bones, part unspecified” (S02.9), “Loss of consciousness [30 mins-24hrs]” (S06.03), “Loss of consciousness prolonged without return of consciousness ” (S06.05), “Traumatic cerebral oedema” (S06.1), “Diffuse brain injury” (S06.2), “Other diffuse cerebral & cerebellar injury” (S06.28), “Traumatic subdural haemorrhage” (S06.5), “Traumatic subarachnoid haemorrhage” (S06.6), “Other intracranial injuries” (S06.8), and “Intracranial injury, unspecified” (S06.9). None of these are minor injuries.

Using available data, the evidence does suggest helmet wearing mitigates cycling fatalities and serious injury. It does not appear as though the public have been oversold on the benefits of bicycle helmets.

Update: The original version focused on the relative risk of helmet wearing among fatalities and helmet wearing surveys in New Zealand. This made the wording quite strange and difficult to interpret. However, the odds ratio isn’t as problematic and is a good estimate of relative risk of death in this instance.