Plotting Data That Doesn’t Exist is a Bad Idea

I came across a blog post recently that gives someone’s take on bicycle helmet legislation in New Zealand (links here and here). The post includes the plot below, created by Chris Gillham for his anti-helmet advocacy website (the NZ analysis is here). Gillham is also an editorial board member of the Bicycle Helmet Research Foundation.


The figure shows the number of cyclists in New Zealand declining around the introduction of the helmet law in 1994 and staying roughly flat thereafter. Gillham’s estimate of injuries per 100,000 cyclists increases over the same period.

This figure (and others) is discussed on the Wikipedia page for Bicycle Helmets in New Zealand. It states

Australian journalist Chris Gillham [19] compiled an analysis of data from Otago University and the Ministry of Transport, showing a marked decline in cycling participation immediately following the helmet law introduction in 1994. At the same time as the number of cyclists aged over 5 years approximately halved, the injury rate approximately doubled. Noting both the decline in numbers and increase in injury rate preceded the law’s introduction at the start of 1994, possibly attributable to the fact that heavy promotion of helmets had been ongoing in the lead-up to the law’s introduction. This phenomenon of just helmet promotion leading to a reduction in cycling has been witnessed in several countries.[20] See Figure 2.

The problem here is that much of the cycling participation data shown simply does not exist.

The link takes you to Reports and Fact Sheets from the NZ Household Travel Survey. In the Cycling Fact Sheet, Tables 5-7 include historical data from past surveys taken during the years 1989/90, 1997/98 and 2003-2012. So, there are no data prior to 1989, from 1991 to 1996, or from 1999 to 2002. Importantly, there are no cycling participation data in a 6-year window around the NZ helmet law date of 1 January 1994. Here’s a plot that accurately represents the NZ cycling participation survey data.
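The gaps can be checked with a few lines of Python. This is only a sketch: it assumes a survey labelled 1989/90 covers both calendar years 1989 and 1990 (likewise 1997/98), and the helper name `missing_year_ranges` is made up for illustration.

```python
def missing_year_ranges(covered, start, end):
    """Return (first, last) pairs for each run of years with no survey."""
    gaps = []
    run_start = None
    for year in range(start, end + 1):
        if year not in covered:
            if run_start is None:
                run_start = year          # a new gap begins here
        elif run_start is not None:
            gaps.append((run_start, year - 1))
            run_start = None
    if run_start is not None:             # gap runs to the end of the window
        gaps.append((run_start, end))
    return gaps

# Assumption: each split-year survey is counted against both calendar years.
covered = {1989, 1990, 1997, 1998} | set(range(2003, 2013))
gaps = missing_year_ranges(covered, 1989, 2012)
helmet_law_year = 1994
law_in_gap = any(first <= helmet_law_year <= last for first, last in gaps)

print(gaps)        # [(1991, 1996), (1999, 2002)]
print(law_in_gap)  # True
```

The helmet law year sits inside the first gap, which is the point: nothing was measured in the years immediately either side of it.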


Although there are only two surveys prior to 2003, Gillham’s plot contains data for each year. Where did this new data come from? Gillham states

This pop-up table is based on the Ministry of Transport surveys of 5yo+ cycling participation as a percentage of population, as displayed above, with trends smoothed to compensate for the irregularity of the survey timeline.

I find this result quite curious. There are only two surveys, and therefore only two points, before 2003. Fitting a linear model requires the estimation of an intercept, a slope and additional parameters for any changes in the initial linear pattern. Starting with 1988, I count two changes in the trend up to 1997 (could be three, but the 1991 data is hard to see). Therefore, four parameters would need to be estimated to reproduce this “smooth” plot.

To estimate the parameters in a linear model uniquely, the number of data points must be at least the number of parameters. Since there are only two data points before 2003, only two parameters can be estimated, i.e., a slope and an intercept. Therefore, the “trend” given in Gillham’s plot cannot have been derived from the available data alone. Note the red line graph relies on the cycling participation data, so it is also incorrect.
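Here is a small sketch of why two points cannot pin down a trend with bends in it. Everything numeric in it is a placeholder: the two "survey" values are arbitrary, the survey times are assumed to be mid-period (1989.5 and 1997.5), and the bend locations (1991 and 1994) are hypothetical. It is not a reconstruction of Gillham’s method, which is not described beyond "smoothed".

```python
def piecewise_linear(t, params, knots):
    """Intercept + slope, plus one change in slope at each knot."""
    b0, b1, *bends = params
    return b0 + b1 * t + sum(b * max(t - k, 0.0) for b, k in zip(bends, knots))

def fit_through_two_points(points, knots, bends):
    """Choose the bend sizes freely, then solve for the intercept and
    slope that make the curve pass exactly through both points."""
    (t1, y1), (t2, y2) = points
    h1 = sum(b * max(t1 - k, 0.0) for b, k in zip(bends, knots))
    h2 = sum(b * max(t2 - k, 0.0) for b, k in zip(bends, knots))
    b1 = ((y2 - h2) - (y1 - h1)) / (t2 - t1)
    b0 = (y1 - h1) - b1 * t1
    return (b0, b1, *bends)

# Placeholder values only, NOT the survey figures.
points = [(1989.5, 1.0), (1997.5, 0.8)]
knots = (1991.0, 1994.0)  # hypothetical bend locations

fits = [
    fit_through_two_points(points, knots, bends)
    for bends in [(0.0, 0.0), (-0.1, 0.1), (0.2, -0.3)]
]

# Every candidate reproduces both data points exactly...
errors = [abs(piecewise_linear(t, f, knots) - y) for f in fits for t, y in points]
# ...yet they disagree about what happened at the start of 1994.
at_1994 = [piecewise_linear(1994.0, f, knots) for f in fits]

print(max(errors))
print(at_1994)
```

Any choice of bend sizes gives a curve that fits the two points perfectly, so the data cannot distinguish a straight decline from a sharp drop around the law. Whatever shape appears between the two surveys comes from the person drawing it.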

There is also a problem with computing injuries per cyclist based on these surveys. The Household Travel Surveys capture cycling for transport, so they would not reflect any changes in off-road cycling like mountain biking. It might be reasonable to compute a rate for cyclists injured in motor vehicle collisions, but a per-cyclist rate for injuries from non-motor vehicle crashes would be tenuous at best.

These issues are especially troubling because Gillham’s analysis forms part of the research knowledge base on Wikipedia where, I believe, most laypeople and media types get their information. Note that many of the contributors to the Wikipedia page are editorial board members of the anti-helmet organisation Bicycle Helmet Research Foundation. There have been discussions about limiting their negative and undue influence, but nothing has come of it.

So, why was there a drop in cycling between the first and second surveys?

It’s really hard to tell given there are only two time points. With only one pre-helmet law time point, there is no way to assess changes relative to the helmet legislation. It could very well be that the decline in cycling participation started long before the helmet law. One NZ research article notes cycling participation in NZ declined steadily from 1986 onwards.

The website for the 1997/98 Travel Survey actually discusses the changes in on-road cycling since the 1989/90 survey.

Between 1989/90 and 1997/98, on-road cycling has decreased by 19 percent, with the largest decrease among school-age children and teenagers. Other countries have also seen large reductions in cycling (for example, cycling in Great Britain has fallen by 20 percent over the same period*). Once an almost universal mode of transport for school children, concern about safety has seen cycling to school become less popular. However, there has been an increase in cycling, particularly longer trips, among the 20-24 age group.

* Source: “Road Accidents Great Britain 1998 The Casualty Report” (September 1999), Department of the Environment, Transport and the Regions, United Kingdom.

Since Great Britain does not have any form of helmet legislation and cycling there declined at a rate similar to NZ’s, the evidence does not support the hypothesis that the NZ helmet law deterred cycling.

