An analysis of tweets seems able to provide the police with useful information that can help to predict crime.

Twitter a Potentially Useful Tool for Predicting Crime?

A number of published studies set out to predict the likelihood of convicted criminals re-offending, based on information about their past, the circumstances of their imprisonment and the social environment into which they are subsequently released. Now researchers have taken a completely different tack in the search to develop a crime prediction model. Their new model is based on tweets sent by Twitter users around the world, who now number some 140 million. An algorithm analyses these tweets to predict the occurrence and location of crime in large cities. This research, led by Matthew S. Gerber at the Predictive Technology Lab at the University of Virginia, is part of a range of activities whose purpose is to foresee crimes. The Chicago police already draw on these tweet-based predictions on a daily basis. Chicago has turned out to be an ideal testing-ground, with its population of over 2.7 million, its high crime rate and, most importantly, a comprehensive, detailed database kept up to date by the Chicago Police Department. The researchers were able to collect all the data relating to felonies and less serious ‘misdemeanours’ which took place between 1 January and 31 March 2013. They compared this information with a database of 1.5 million tweets geo-tagged by their users. In an article published in the journal Decision Support Systems*, Matthew Gerber demonstrates that using data from Twitter brings a noticeable improvement in crime prediction.

Space and time data

Capitalising on the popularity of the Twitter network, the research team analysed older tweets to see how far they could detect various trends or outcomes – election results, outbreaks of revolution, and even natural phenomena such as earthquakes. The article also describes the use of Kernel Density Estimation (KDE), which involves pairing a historical crime record with a geographic location and using a probability function to calculate the likelihood of future crimes occurring in that area. When it comes to predicting crime, KDE has the advantage of rapidly identifying and visualising high-risk zones on the basis of past crimes committed there. Initially, Gerber did not believe that adding Twitter data would improve on the KDE results. He was worried that the inherent characteristics of tweets – abbreviated, highly personal language, messages limited to 140 characters, etc. – would make them unusable in a predictive model. In the end, however, the researchers realised that although tweets were unlikely to explicitly detail the planning of a crime, messages which make reference to crimes can often be spotted. For example, when a number of messages refer to quantities of alcohol being consumed in a given area, the model will alert those responsible to the probability that this gathering might degenerate into a situation where crimes are committed. It transpired that combining Twitter data with results obtained by other means does tend to improve the rate of crime prediction. Out of 25 types of crime studied by the researchers, tweets were able to add useful information in 19 of the categories.
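To give a feel for how Kernel Density Estimation turns past incidents into a risk surface, here is a minimal sketch in Python. It is an illustration only, not the researchers' implementation: the Gaussian kernel, the bandwidth value, the function name kde_density and the incident coordinates are all assumptions made for the example.

```python
import math

def kde_density(point, incidents, bandwidth):
    """Gaussian kernel density estimate at `point`, built from past incident locations.

    `bandwidth` controls how far each past incident spreads its influence
    (an assumed tuning parameter, not a value taken from the article).
    """
    px, py = point
    # Normalising constant of a 2-D Gaussian kernel, averaged over all incidents.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(incidents))
    total = 0.0
    for x, y in incidents:
        sq_dist = (px - x) ** 2 + (py - y) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total

# Hypothetical incident coordinates (km on a local grid), not real crime data.
incidents = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0)]

near_cluster = kde_density((1.0, 1.0), incidents, bandwidth=0.5)
far_away = kde_density((3.0, 3.0), incidents, bandwidth=0.5)

print(f"density near the cluster: {near_cluster:.4f}")
print(f"density far from any incident: {far_away:.6f}")
```

In this toy example the estimated density is much higher next to the three clustered incidents than at a spot with no nearby history, which is the sense in which KDE highlights high-risk zones. Evaluating the same function over a grid of points would produce the kind of heat map described above.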

Police use of social networks arouses controversy

In the introduction to his article, Matthew S. Gerber explains how this type of predictive analytics could be useful for those who have to make decisions on how to use their budgets in the fight against crime, a key example being how police patrols are organised. The researcher is nonetheless aware of the issues raised when social networks are used in a judicial context. As long ago as 2012, advocates of data privacy criticised a similar initiative by the FBI to use Twitter data to predict crime. And quite recently the Chicago police have been accused of racism for the way they use crime prediction models. Their statistics-based model does make use of data on race, which led to a young man of 22 receiving an unexpected visit from the police, who warned him that he was under surveillance and cautioned him against committing any further crimes. However, Gerber dismisses the risk of any such abuses with Twitter-based analysis, pointing out that his algorithm does not target particular individuals, but is confined to gathering data that has been publicly posted. In the article he suggests important areas of future work for this research with a view to improving his model, including a deeper semantic analysis of message content, more refined temporal modelling and the incorporation of auxiliary data sources.


*Under the title of ‘Predicting Crime using Twitter and Kernel Density Estimation’, in Vol. 61 of ‘Decision Support Systems’ (publ. Elsevier), 2014, pp. 115-125.

By Lucie Frontière