Predicting the location of housing crimes - the power of machine learning and property licensing
What if we could predict the time and location of environmental health crimes?
This would be a regulator’s dream, not unlike Steven Spielberg’s 2002 science-fiction film Minority Report, in which a specialised police department apprehends criminals based on foreknowledge provided by psychics. Is that fantasy starting to become a reality?
Our co-founder Russell Moffatt and data analytics expert Dr Mark Gardener describe how we are helping councils to predict the location of housing crimes in combination with large-scale property licensing.
“All models are wrong, but some are useful.” (George E. P. Box)
In 2012, during the set-up of Newham's ground-breaking borough-wide licensing scheme, we were challenged with identifying 10,000 unlicensed and unsafe private rented properties hidden amongst 105,000 residential properties spread across 14 square miles.
By that point in the Newham property licensing project, we had established during a pilot that unlicensed properties were much more likely to have poor housing standards (Category 1 hazards under the Housing Health and Safety Rating System, HHSRS). Failure to license a rented property in Newham was also a criminal offence. So, if we could find the unlicensed properties, we would find housing crimes and properties in substandard condition. Simple.
All we needed was sound intelligence on residential tenure for every property in the borough. The big problem was that councils do not hold accurate data on tenure.
The traditional solution to this type of problem was to dispatch a small army of council officers to walk the streets, spotting properties that "looked" like they might be rented from signs such as disrepair, overflowing bins and even the "dirty curtain test". The problem with this technique is that it’s costly and unreliable, particularly across a whole borough. This raised the question: could we identify tenure predictors in council data? After all, councils hold lots of data on all properties, often linked to a Unique Property Reference Number (UPRN).
The experiment started by matching a small number of property-level ‘factors’ to create a simple property warehouse. Over time more and more datasets were added, including council tax, housing benefit, electoral register and complaint records, to name a few. After a few iterations we were managing spreadsheets with more than 8 million cells of data.
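A minimal Python sketch of the property warehouse idea, assuming each council extract is a list of records that carry a UPRN. The project itself worked in spreadsheets at this stage; the dataset names, field names and UPRN values below are hypothetical, chosen only to show everything being joined on the UPRN.

```python
from collections import defaultdict

def build_property_warehouse(datasets):
    """Merge records from several council datasets into one row per UPRN.

    `datasets` maps a source name (e.g. "council_tax") to a list of dicts,
    each containing a "uprn" key. Field names are prefixed with the source
    name so that columns from different datasets never collide.
    """
    warehouse = defaultdict(dict)
    for source, records in datasets.items():
        for record in records:
            uprn = record["uprn"]
            for field, value in record.items():
                if field != "uprn":
                    warehouse[uprn][f"{source}.{field}"] = value
    return dict(warehouse)


# Hypothetical extracts: UPRNs and field names are illustrative only.
datasets = {
    "council_tax": [
        {"uprn": 100000000001, "single_person_discount": False},
        {"uprn": 100000000002, "single_person_discount": True},
    ],
    "housing_benefit": [
        {"uprn": 100000000001, "claim_in_payment": True},
    ],
    "electoral_register": [
        {"uprn": 100000000001, "registered_electors": 6},
        {"uprn": 100000000002, "registered_electors": 1},
    ],
}

warehouse = build_property_warehouse(datasets)
```

Properties missing from a dataset (here, the second UPRN has no housing benefit record) simply lack that column, which is one of the gaps the data-tidying stage has to deal with.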
While wading through the spreadsheets, we began to see tenure trends in the data, which enabled some crude tenure predictions. It soon became clear, however, that it would not be practical to work out 100,000+ property tenure predictions by hand. This was the light bulb moment: would it be possible to develop computer models to make fast and reliable predictions instead?
At this point the project needed some specialist academic support.
I was brought in by Russell to help develop computer-aided intelligence to model property tenure. A particular requirement was to help identify properties that were in need of regulatory intervention. I had heard about some of the work Newham Council was doing to tackle slum housing and was keen to support the project.
The task was to develop a system that allowed prediction of tenure, given the data held by the council. There are several stages in producing computer models of this sort:
- Identify the outcome you want to achieve.
- Obtain suitable data to help the modelling process.
- Check and "tidy up" the input data.
- Develop a modelling method.
- Run the model and identify possible "predictor" variables.
- Check the model for "accuracy".
- Refine and revise the model using new and updated data.
Note that the process is not entirely linear, particularly in the later stages. New data is being acquired all the time and this generally means your model can benefit. A model is not a fixed entity and can be revised and altered to reflect changes in the data.
Councils hold a huge amount of data about properties. The large quantity of data is a double-edged sword: on one hand, lots of data helps the "predictive capacity" of a model; on the other hand, there are more data to check!
An important component of predictive modelling is preparing the data. Computers are literal, so database entries such as "Yes", "yes", "Y" and "y" are regarded as four separate values. A considerable amount of time is often required to "tidy", validate and prepare the data.
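One way to tidy the yes/no problem described above, sketched in Python under the assumption that the accepted spellings are known in advance. The YES_VALUES and NO_VALUES sets are illustrative, not a list taken from any council system.

```python
# Spellings treated as yes/no. This mapping is an assumption and would need
# extending to match the values actually found in each council dataset.
YES_VALUES = {"yes", "y", "true", "t", "1"}
NO_VALUES = {"no", "n", "false", "f", "0"}

def normalise_yes_no(value):
    """Return True/False for recognised yes/no spellings, None otherwise."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in YES_VALUES:
        return True
    if text in NO_VALUES:
        return False
    return None  # unrecognised entry: leave it flagged for manual checking

raw_entries = ["Yes", "yes", "Y", "y", " N ", "no", "unknown"]
cleaned = [normalise_yes_no(v) for v in raw_entries]
```

The four spellings that a computer treats as distinct collapse to a single value. Returning None for anything unrecognised, rather than guessing, keeps doubtful entries visible for manual checking.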
Developing a model
We used R, the statistical programming language, for model development. It is powerful, flexible and eminently suitable for the task. It is also free and open source.
There are various "kinds" of regression modelling, so it is important to identify what you are trying to achieve. We realised that the important question was "is a property privately rented?" This meant that we would use a method called logistic regression, which is a form of generalised linear model for binomial data. In other words, each property's tenure can be treated as a yes/no outcome: privately rented or not.
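For reference, logistic regression in its standard textbook form, with notation chosen here for illustration: y_i = 1 if property i is privately rented and 0 otherwise, x_{i1} to x_{ik} are its predictor values (for example, flags derived from council tax or housing benefit records), and the β coefficients are estimated from properties whose tenure is known. In R, a model of this kind is usually fitted with glm() using family = binomial.

```latex
\log\!\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik},
\qquad
p_i = \Pr(y_i = 1) = \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}\right)\right]}
```

The left-hand side is the log-odds of a property being privately rented, so each coefficient describes how one predictor shifts those odds, and p_i is the "how likely" score the model produces for every property.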
The starting point for a predictive model of this kind is data where the outcome is known. In other words, we already know which properties are HMOs (houses in multiple occupation). The task is to develop a model that contains the most "useful" variables and allows prediction for data where the outcome is unknown.
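A minimal sketch of this train-then-predict idea in Python (the project itself used R), assuming three hypothetical yes/no predictors and a handful of made-up training properties. It fits a logistic regression by plain gradient descent, which is one simple way to fit the model; it is not the method R's glm() uses (iteratively reweighted least squares). The UPRN_A and UPRN_B keys are placeholders.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: maps any real number to (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, x):
    # weights[0] is the intercept; weights[1:] pair up with the predictor values in x.
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.5, epochs=5000):
    # Plain batch gradient descent on the mean negative log-likelihood.
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

# Hypothetical training set: properties whose tenure is already known (1 = privately rented).
# Illustrative 0/1 predictors: [housing benefit claim, 4+ registered electors,
#                               council tax single-person discount]
known_rows = [
    [1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
    [0, 0, 1], [0, 0, 1], [0, 0, 0], [0, 1, 1], [1, 0, 1],
]
known_rented = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

weights = fit_logistic(known_rows, known_rented)

# Properties whose tenure is unknown (placeholder IDs): score each with the fitted model.
unknown = {"UPRN_A": [1, 1, 0], "UPRN_B": [0, 0, 1]}
scores = {uprn: predict_proba(weights, x) for uprn, x in unknown.items()}
```

Each score is a probability between 0 and 1; ranking unknown properties by this score is one way such output could be used to prioritise visits.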
Creating a regression model is a subtle mixture of maths and experience. It is possible to use AIC (Akaike Information Criterion) values to "pick" the best available predictive variables from the pool of data. AIC rewards a model for fitting the data well but penalises it for each extra variable, so the approach favours models with strong predictive capacity that are no more complex than they need to be. New terms (predictive variables) can be added to the model to improve its overall "performance". However, the computer doesn't know anything about the private rental sector; it deals only with maths and statistical likelihood. Some variables may be "very good" predictors in theory but in practice hard to obtain or unreliable in some other way. This is where the skill and experience of the practitioner comes into play. Russell was able to exercise control over which variables were used in the final model, thus producing the most useful model.
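A hedged Python sketch of AIC-based forward selection, in the spirit of R's step() function but not a reproduction of it. It assumes AIC = 2k - 2 ln L, with k the number of fitted parameters, fits each candidate model by plain gradient descent, and runs on a small hypothetical dataset in which one variable (hb_claim) is informative and the other (random_flag) is deliberately uninformative.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, x):
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.5, epochs=3000):
    # Batch gradient descent on the mean log-loss; weights[0] is the intercept.
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

def aic(rows, labels):
    # AIC = 2k - 2 ln(L), where k counts the fitted parameters (intercept included).
    weights = fit_logistic(rows, labels)
    log_lik = 0.0
    for x, y in zip(rows, labels):
        p = predict_proba(weights, x)
        log_lik += y * math.log(p) + (1 - y) * math.log(1 - p)
    return 2 * len(weights) - 2 * log_lik

def forward_select(predictors, labels):
    """Greedy forward selection: add the variable that lowers AIC most; stop when none does."""
    def design(names):
        return [[predictors[name][i] for name in names] for i in range(len(labels))]

    selected, remaining = [], list(predictors)
    best_aic = aic(design(selected), labels)  # intercept-only model
    while remaining:
        scores = {name: aic(design(selected + [name]), labels) for name in remaining}
        best_name = min(scores, key=scores.get)
        if scores[best_name] >= best_aic:
            break  # no remaining candidate lowers the AIC
        selected.append(best_name)
        remaining.remove(best_name)
        best_aic = scores[best_name]
    return selected, best_aic

# Hypothetical data for 16 properties: "hb_claim" is associated with renting,
# "random_flag" is balanced so that it carries no information about tenure.
predictors = {
    "hb_claim": [1] * 8 + [0] * 8,
    "random_flag": [1, 1, 1, 1, 0, 0, 0, 0] * 2,
}
rented = [1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

selected, best_aic = forward_select(predictors, rented)
```

On this toy data the search keeps hb_claim and stops there, because adding random_flag costs a parameter without improving the likelihood. The statistics only rank candidates; a practitioner would still review the chosen variables for availability and reliability.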
Revising and refining a model
The basic model "results" show "how likely" each property is to be a particular tenure. Of course, all models are wrong, so the next step is to see how "reliable" the model is under various circumstances. A basic generalised linear model (GLM) provides the null and residual deviance, from which a D² statistic can be calculated; D² measures the proportion of the deviance that is explained by the model. This is a summary guide and is especially useful in helping to determine the "final cut", which is how many of the explanatory variables you keep in the final model.
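The D² calculation can be sketched in Python for a binary outcome, assuming fitted probabilities are already available. For 0/1 data the deviance is simply -2 times the log-likelihood, and the null model gives every property the overall proportion of rented properties. In R, the same figure can be taken from a fitted glm object as 1 - deviance / null.deviance. The labels and probabilities below are made up for illustration.

```python
import math

def binomial_deviance(labels, probs):
    # For 0/1 outcomes the saturated model has log-likelihood 0,
    # so the deviance is -2 times the log-likelihood of the fitted probabilities.
    log_lik = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                  for y, p in zip(labels, probs))
    return -2.0 * log_lik

def d_squared(labels, probs):
    # Null model: every property gets the overall proportion of rented properties.
    base_rate = sum(labels) / len(labels)
    null_dev = binomial_deviance(labels, [base_rate] * len(labels))
    resid_dev = binomial_deviance(labels, probs)
    return 1.0 - resid_dev / null_dev

# Hypothetical example: 16 properties (1 = privately rented) and fitted probabilities.
labels = [1] * 6 + [0] * 2 + [1] * 2 + [0] * 6
probs = [0.75] * 8 + [0.25] * 8
d2 = d_squared(labels, probs)
```

For these made-up numbers D² comes out at about 0.19, meaning the model explains roughly 19% of the deviance. Comparing D² as variables are added is one way to see where the gains start to level off.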
Each additional variable brings diminishing returns in predictive capacity. In general, the aim is to balance predictive power against the number of variables used. Models with fewer variables tend to perform better when used to make predictions on new (previously unseen) data. Thus the revising and refining process is important: it helps give “best value” and produces more reliable results.
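The sketch below, in Python rather than the R used for the project, illustrates one common way to check this on hypothetical synthetic data: fit a model on half the properties and measure its log-loss on the other, unseen half. The data, the 50/50 split and the log-loss measure are all assumptions for illustration; the printed numbers depend on the random sample and are not results from the project.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, x):
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.5, epochs=1000):
    # Batch gradient descent on the mean log-loss; weights[0] is the intercept.
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

def mean_log_loss(weights, rows, labels):
    # Average negative log-likelihood: lower means better probability predictions.
    total = 0.0
    for x, y in zip(rows, labels):
        p = predict_proba(weights, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

# Hypothetical synthetic data: only the first of four yes/no predictors is
# related to tenure; the other three are pure noise.
rng = random.Random(1)
rows, labels = [], []
for _ in range(200):
    x = [rng.randint(0, 1) for _ in range(4)]
    rows.append(x)
    labels.append(1 if rng.random() < (0.75 if x[0] else 0.25) else 0)

train_rows, test_rows = rows[:100], rows[100:]
train_labels, test_labels = labels[:100], labels[100:]

results = {}
for n_vars in (1, 4):  # real predictor only vs. real predictor plus three noise variables
    w = fit_logistic([r[:n_vars] for r in train_rows], train_labels)
    results[n_vars] = (
        mean_log_loss(w, [r[:n_vars] for r in train_rows], train_labels),
        mean_log_loss(w, [r[:n_vars] for r in test_rows], test_labels),
    )

for n_vars, (train_loss, test_loss) in results.items():
    print(f"{n_vars} variable(s): training log-loss {train_loss:.3f}, unseen-data log-loss {test_loss:.3f}")
```

If the extra noise variables are only fitting chance patterns in the training half, the four-variable model will tend to look better on that half than on the unseen half; how large the gap is varies from sample to sample.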
The tenure intelligence (Ti) approach described above was a game changer. It solved an important strategic problem and helped unlock the potential of property licensing.
It enabled my team to be much more productive by minimising time wasted surveying and visiting the wrong properties. In just a couple of years, Newham’s private housing enforcement outputs went from 25 prosecutions to 280+ prosecutions per year, more than a tenfold increase. In 2016 this represented 67% of all private housing enforcement undertaken in London. Ti was the catalyst for this important leap forward.
Ti is now being used by a range of councils across the country to help improve understanding of the rented sector at a policy level and upgrade the quality of regulatory interventions.
It's important to note that this approach relies on property licensing in order to succeed. Licensing provides a 'behavioural wedge' to separate good landlords from bad. It also offers a much-improved legal framework for regulators to operate in.
A predictive model is a tool, and in this case a tool that gives "intelligence" to frontline practitioners and policy managers. The model allows traditional follow-up methods to be applied more efficiently and effectively. It is a computer version of the "dirty curtain" test that doesn't rely on a small army of council officers.
About the Authors
Russell Moffatt B.Sc. MPH CEnvH, is co-founder of Metastreet. He is a Chartered Environmental Health Practitioner with more than 20 years of experience working in some of the most challenging London Boroughs. He qualified with a B.Sc. (Hons) Environmental Health at Greenwich University in 2002 and achieved his Masters in Public Health at King’s College, London in 2006.
Mark Gardener PhD, is founder of DataAnalytics.org.uk. He was originally an ecologist but now works in more general areas too, providing training courses and data/project workshops as well as writing textbooks on R. He gained his doctorate in pollination ecology in 2001 and has undertaken research work and teaching in the UK and around the world.