New Zealand Helmet Law and Validity of a Regression Model

After my recent post regarding cycling fatalities in New Zealand, someone pointed me to a Wikipedia discussion regarding a peer-reviewed paper I co-authored that discussed methodological issues of papers assessing the NZ helmet law that became effective 1 January 1994.

Dorothy Robinson, Richard Keatinge and Nigel Perry (all editorial board members of the anti-helmet organization Bicycle Helmet Research Foundation) have criticized our paper over its criticisms of a paper by Robinson (2001), which in turn was a criticism of a paper by Povey et al. (1999). In both papers, the ratio of head injuries to limb fractures was modelled over the period 1990-1996. In their paper, Povey et al. found helmet wearing was negatively associated with the log of the head/limb injury ratio for three age groups in non-motor vehicle accidents and all ages in motor vehicle accidents.

Robinson criticized Povey and colleagues for “failure to fit time trends in their model” and argued that the observed benefit was an “artefact”. Her analysis focused solely on adults in non-motor vehicle accidents and ignored the data for children and for motor vehicle accidents (which are often the most severe). This is curious considering the NZ helmet law applies to on-road cycling and, therefore, cyclist interactions with motor vehicles are the more relevant case here.

In our paper, we noted that although Povey et al. did not appear to check the assumptions of their model, inspection of the residuals suggests their model was valid. On Wikipedia, Robinson (under the pseudonym Dorre) reiterates her earlier criticism, stating “Povey did not take time trends into account”, and suggests this as the reason for finding a helmet benefit. She then states “most people would expect a claim that the model is ‘valid’ to imply there is evidence of causation!” It is unclear to me why Robinson, who claims to be a statistician, would make such a statement (or the other similar statements in her paper and on Wikipedia).

Let me explain. The actual model fitted by Povey et al. (1999) is

log(HEAD_{i}/LIMB_{i})=\alpha + \delta(HELMET_{i}) + \epsilon_{i}

where \epsilon_{i} \hbox{ for } i=1,\dots,n are assumed to be independent, normally distributed random variables with mean 0 and constant variance \sigma^2. This is usually stated as

\epsilon_i \overset{iid}{\sim} N(0,\sigma^2)

A linear regression model is valid if the above assumptions imposed on the \epsilon_i's are reasonable. Contrary to what Robinson suggests, no linear or generalized linear model carries an assumption that time trends be fitted. It is assumed the errors are serially independent, but that is not equivalent to fitting time trends. Additionally, a valid linear model does not imply a causal relationship between the independent and dependent variables. The same caveat would apply to Robinson’s contention, which is essentially that time caused the decline.

The assumptions of a linear model can be checked using the residuals. The residuals are the differences between the observed and fitted values, which is written mathematically as

e_i=log(HEAD_{i}/LIMB_{i})-\left(\hat{\alpha}+\hat{\delta}(HELMET_{i})\right)

where \hat{\alpha} and \hat{\delta} are intercept and slope estimates using the method of least squares.
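As a rough sketch of what least squares means here, the estimates can be computed directly from the closed-form expressions for simple linear regression and compared with what lm reports. The data are the ratio and helmet values from the R code later in this post; the names x, y, alpha_hat, delta_hat and e are my own labels for illustration only.

```r
# Data as given in the R code later in this post
ratio <- c(1.40, 1.09, 1.07, 0.94, 0.86, 0.83, 0.77)
helmet <- c(30, 36, 41, 43, 92, 93, 87)

y <- log(ratio)   # response: log of the head/limb ratio
x <- helmet       # predictor: helmet wearing

# Closed-form least squares estimates for simple linear regression
delta_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - delta_hat * mean(x)

# Residuals: observed minus fitted values
e <- y - (alpha_hat + delta_hat * x)

# Compare the hand-computed estimates with those from lm
print(c(alpha_hat, delta_hat))
print(coef(lm(y ~ x)))
```

The two printed pairs should agree up to rounding, and the residuals e are the quantities used in all of the diagnostic checks that follow.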

Using the observed residuals, the normal assumption can be assessed using a normal quantile plot, the linearity and constant variance assumptions can be checked by a scatterplot of the residuals, and serial independence checked using the Durbin-Watson statistic or graphically using the autocorrelation function.

There is nothing in the residual plots that suggests the model used by Povey et al. is not valid.

[Figure: PoveyPlots (normal quantile plot, residual plot and autocorrelation function of the residuals)]

Below is the R code to construct these plots, and to perform the Durbin-Watson test.

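# Head/limb injury ratios and helmet wearing values
ratio <- c(1.40, 1.09, 1.07, 0.94, 0.86, 0.83, 0.77)
helmet <- c(30, 36, 41, 43, 92, 93, 87)
# Fit log(ratio) = alpha + delta * helmet by least squares
reg <- lm(log(ratio) ~ helmet)
res <- residuals(reg)
# Draw the three diagnostic plots side by side
par(mfrow = c(1, 3))
qqnorm(res); qqline(res)
plot(res ~ helmet, ylab = 'Residuals', main = 'Residual Plot')
acf(res, main = 'Autocorrelation Function')
# Durbin-Watson test for serial correlation (requires the lmtest package)
library(lmtest); dwtest(reg)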
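For readers without the lmtest package, the Durbin-Watson statistic itself can be sketched in base R from its usual definition: the sum of squared successive differences of the residuals divided by the residual sum of squares. The names e and dw are mine, and this gives only the statistic, not the p-value that dwtest also reports.

```r
# Same data as above
ratio <- c(1.40, 1.09, 1.07, 0.94, 0.86, 0.83, 0.77)
helmet <- c(30, 36, 41, 43, 92, 93, 87)

# Residuals from the fitted model, in time order
e <- residuals(lm(log(ratio) ~ helmet))

# Durbin-Watson statistic: squared successive differences over squared residuals
dw <- sum(diff(e)^2) / sum(e^2)
print(dw)
```

The statistic always lies between 0 and 4, with values near 2 being consistent with no first-order serial correlation. As far as I am aware, this is the same quantity dwtest reports as its DW statistic.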

Another concept that seems lost in the criticism is that Povey and colleagues were testing an a priori hypothesis. As such, their model was hypothesis driven and pre-determined without letting the data influence modelling decisions. This is an important consideration if the researcher is to avoid spurious correlations.

It is a shame what has happened to the Wikipedia pages on bicycle helmets. Many of the contributors have clear conflicts of interest, like Dorothy Robinson, Richard Keatinge and Nigel Perry (all editorial board members of the anti-helmet organization Bicycle Helmet Research Foundation), who routinely offer a biased view of the available research. I do plan on discussing the negative influence this group and website have had on our understanding about cycling safety at some point.

The comments Robinson, Keatinge and Perry have made about my paper with Joanna Wang are a prime example of their negative influence. Although their discussion began on Wikipedia, it would appear these critics are unaware that Wikipedia has an actual page discussing the validity of a regression model. The first sentence states:

In statistics, regression model validation is the process of deciding whether the numerical results quantifying hypothesized relationships between variables, obtained from regression analysis, are in fact acceptable as descriptions of the data.

As discussed, model validity has nothing to do with any of their criticisms and, importantly, the assumptions made by Povey et al (1999) in their analysis appear to be valid.
