Below is a letter to the editor I recently wrote about potential flaws in the analysis that forms the basis of the Waterloo Region Record's recent article "Police call records reveal region's trouble hot spots", which can be read here: http://goo.gl/DDQEg0
One of the first things
that I emphasize to the students in my biostatistics class at Wilfrid
Laurier University is that statistics are a powerful tool. Used
carefully and properly, statistics can provide valuable
insight into the factors that shape the world around us - but used or
interpreted incorrectly, statistics can potentially lead to conclusions
that are unjustified or altogether incorrect. Your recent "analysis" of
police call data seems to fall into the latter
category, owing to problems both with your data set and with the conclusions
drawn from it.
First, let's consider your
data set. Of the ~903,000 calls in your initial data set almost half
were excluded from the analysis for a variety of reasons. Whenever data
is dropped, there is a strong possibility that
what remains is a non-random (and thus biased) set of data. Furthermore,
the remaining data points "do not measure crime" (as belatedly stated
in the 30th inch of the story) - but instead capture a wide variety of
incidents (including "enforcement of traffic
laws" and "attend at collisions") that are not necessarily linked to the
residents of that region. It should go without saying that if your
data does not contain variables that are relevant to the question, then the
conclusions drawn from it will be suspect.
Using this questionable data
set, the conclusion "the poorer the zone, the more often people call
police and the more time police spend there, responding to distress" is
drawn without any consideration of potential confounding
effects. There are potentially dozens of other factors besides average
household income that differ between the patrol zones that may be
ultimately responsible for the observed patterns. For instance, a
cursory search on Google Maps seems to indicate that the
regions with the highest frequencies of calls to the police also have a
greater density of Tim Hortons locations - but you would not (hopefully)
conclude that their presence is responsible for "where trouble lives".
Generations of statisticians
have warned that "correlation does not imply causation", but that
message seems to have been ignored in the construction of this article,
to the detriment of your readership.
Tristan A.F. Long
*The title for this post is taken from one of the hyperbolic statements made in the article. I think that, ironically, this statement is an apt description of the statistics used in the analysis.
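
For readers who would like to see the confounding argument from the letter in action, here is a minimal Python sketch. Everything in it is simulated and hypothetical: the choice of population density as the hidden factor, the variable names, and the number of zones are my assumptions for illustration, not values from the Record's data. The simulated coffee shop counts and police calls each depend on density but never on each other, yet they end up correlated until density is accounted for.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fitted on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(42)   # fixed seed so the run is repeatable
n_zones = 2000            # hypothetical; far more zones than any real patrol map

# Hypothetical hidden factor (standardized units), assumed for illustration.
density = [rng.gauss(0, 1) for _ in range(n_zones)]

# Both outcomes depend on density plus independent noise, never on each other.
coffee_shops = [d + rng.gauss(0, 1) for d in density]
police_calls = [d + rng.gauss(0, 1) for d in density]

# Naive comparison: coffee shops appear to "predict" police calls.
raw_r = pearson(coffee_shops, police_calls)

# Partial correlation: compare only what density cannot explain in each.
partial_r = pearson(residuals(coffee_shops, density),
                    residuals(police_calls, density))

print(f"raw correlation:              {raw_r:.2f}")
print(f"after adjusting for density:  {partial_r:.2f}")
```

By construction the true correlation between the two simulated outcomes is 0.5, and it should fall to roughly zero once density is removed. The same logic applies to household income and police calls: without checking the other factors that differ between patrol zones, an observed correlation cannot tell you which of them, if any, is responsible.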