Predicting the location of housing crimes - the power of machine learning and property licensing

What if we could predict the time and location of environmental health crimes?

This would be a regulator’s dream, not unlike Spielberg’s 2002 science-fiction film Minority Report, in which a specialised police department apprehends criminals based on foreknowledge provided by psychics. Is that fantasy starting to become a reality?

Our co-founder Russell Moffatt and data analytics expert Dr Mark Gardener describe how we are helping councils to predict the location of housing crimes in combination with large-scale property licensing.

“All models are wrong, but some are useful” (George E. P. Box)

Russell writes:

In 2012, during the set-up of Newham's ground-breaking borough-wide licensing scheme, we faced the challenge of identifying 10,000 unlicensed and unsafe private rented properties hidden among 105,000 residential properties spread across 14 square miles.

By that point in the Newham property licensing project, a pilot had established that unlicensed properties were much more likely to have poor housing standards (Category 1 hazards, HHSRS). Failure to license a rented property in Newham was also a criminal offence. So, find the unlicensed properties and we would find housing crimes and properties in substandard condition. Simple.

Problem

All we needed was sound intelligence on residential tenure for every property in the borough. The big problem was that councils do not hold accurate data on tenure.

The traditional solution to this type of problem was to dispatch a small army of council officers to walk the streets, spotting properties that "looked" like they might be rented, through signs of disrepair, overflowing bins and even the "dirty curtain test". This technique is costly and unreliable, particularly across a whole borough. This raised the question: could we identify tenure predictors in council data? After all, councils hold lots of data on all properties, often linked to a Unique Property Reference Number (UPRN).

Potential solution

The experiment started by matching a small number of property level ‘factors’ to create a simple property warehouse. Over time more and more datasets were added, including council tax, housing benefit, electoral register and complaint records to name a few. After a few iterations we were managing spreadsheets with 8 million cells of data and more.
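As a rough illustration of the matching step, the sketch below joins a few extracts on UPRN to give one record per property. It is written in Python purely for illustration; the dataset names, field names and records are invented, and it is not the warehouse that was built for Newham.

```python
from collections import defaultdict

# Hypothetical extracts; real council datasets will have different fields.
council_tax = [
    {"uprn": "100001", "ct_band": "B"},
    {"uprn": "100002", "ct_band": "C"},
]
housing_benefit = [
    {"uprn": "100002", "hb_claim": True},
]
complaints = [
    {"uprn": "100002", "complaint_type": "noise"},
    {"uprn": "100002", "complaint_type": "disrepair"},
    {"uprn": "100003", "complaint_type": "pests"},
]


def build_warehouse(council_tax, housing_benefit, complaints):
    """Combine the datasets into one record of property-level factors per UPRN."""
    warehouse = defaultdict(
        lambda: {"ct_band": None, "hb_claim": False, "complaint_count": 0}
    )
    for row in council_tax:
        warehouse[row["uprn"]]["ct_band"] = row["ct_band"]
    for row in housing_benefit:
        warehouse[row["uprn"]]["hb_claim"] = row["hb_claim"]
    for row in complaints:
        # Several complaints can share a UPRN, so count them.
        warehouse[row["uprn"]]["complaint_count"] += 1
    return dict(warehouse)


warehouse = build_warehouse(council_tax, housing_benefit, complaints)
for uprn in sorted(warehouse):
    print(uprn, warehouse[uprn])
```

Keying everything on the UPRN means a property that appears in only one dataset still gets a record, with the missing factors left at their defaults.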

While wading through the spreadsheet, tenure trends started to appear in the data, which enabled some crude tenure predictions. It soon became clear that it would not be practical to mentally compute 100,000+ property tenure predictions. This was the light bulb moment: would it be possible to develop computer models to make fast and reliable predictions instead?

At this point the project needed some specialist academic support.

Mark writes:

I was brought in by Russell to help develop computer aided intelligence to help model property tenure. A particular requirement was to help identify properties that were in need of regulatory intervention. I had heard about some of the work Newham Council was doing to tackle slum housing and was keen to support the project.

The task was to develop a system that allowed prediction of tenure, given the data held by the council. There are several stages in producing computer models of this sort:

1. Identify the outcome you want to achieve.
2. Obtain suitable data to help the modelling process.
3. Check and "tidy up" the input data.
4. Develop a modelling method.
5. Run the model and identify possible "predictor" variables.
6. Check the model for "accuracy".
7. Refine and revise the model using new and updated data.

Note that the process is not entirely linear, particularly in the later stages. New data is being acquired all the time and this generally means your model can benefit. A model is not a fixed entity and can be revised and altered to reflect changes in the data.

Obtaining data

Councils hold a huge amount of data about properties. The large quantity of data is a double-edged sword: on one hand, lots of data helps the "predictive capacity" of a model; on the other hand, there is more data to check!

An important component of predictive modelling is in preparing the data. Computers are literal entities and database entries such as "Yes", "yes", "Y" and "y" are regarded as four separate items. Often a considerable period of time is required to "tidy", validate and prepare the data.
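The "Yes"/"yes"/"Y"/"y" problem can be sketched as follows. This is a minimal Python illustration, not the project's own tidy-up code; the accepted spellings in `TRUE_VALUES` and `FALSE_VALUES` are assumptions, and anything unrecognised is left as unknown for a person to check.

```python
# Hypothetical sets of spellings that should collapse to a single value.
TRUE_VALUES = {"yes", "y", "true", "1"}
FALSE_VALUES = {"no", "n", "false", "0"}


def normalise_flag(raw):
    """Return True, False, or None (unknown) for a raw yes/no entry."""
    if raw is None:
        return None
    value = str(raw).strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    return None  # unrecognised entries are left for manual checking


raw_entries = ["Yes", "yes", "Y", "y", " NO ", "", "maybe"]
# The first four entries are four distinct strings to the computer...
print(len(set(raw_entries[:4])))
# ...but a single value once normalised.
print([normalise_flag(entry) for entry in raw_entries])
```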

Developing a model

We used R, the statistical programming language, for model development. It is powerful, flexible and eminently suitable for the task. It is also free and open source.

There are various "kinds" of regression modelling and so it is important to identify what you are trying to achieve. We realised that the important question was "is a property privately rented?" This meant that we would use a method called Logistic Regression, which is a form of generalised linear model using binomial data. In other words, our potential property tenure can be regarded as privately rented or not.

The starting point for a predictive model of this kind is data where the outcome is known. In other words, we already know which properties are HMOs (houses in multiple occupation). The task is to develop a model that contains the most "useful" variables and allows prediction for data where the outcome is unknown.
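To make the idea concrete, here is a minimal logistic regression sketch fitted to properties whose outcome is known and then applied to new ones. The project itself used R; this version is plain Python for illustration only, fitted by simple gradient ascent, and the two predictors (a housing benefit flag and a complaint count) and all the records are invented.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, row):
    """Probability of the outcome for one property (intercept is weights[0])."""
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], row)))


def fit_logistic(X, y, learning_rate=0.1, n_iter=5000):
    """Fit the weights by gradient ascent on the Bernoulli log-likelihood."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        gradient = [0.0] * len(weights)
        for row, target in zip(X, y):
            error = target - predict_probability(weights, row)
            gradient[0] += error
            for j, x in enumerate(row):
                gradient[j + 1] += error * x
        weights = [w + learning_rate * g / len(X) for w, g in zip(weights, gradient)]
    return weights


# Invented training data: [housing_benefit_claim, complaint_count] per property,
# with the known outcome (1 = privately rented, 0 = not).
X = [[0, 0], [0, 0], [0, 0], [0, 1], [0, 2], [1, 0],
     [1, 0], [1, 1], [1, 1], [1, 2], [1, 3], [0, 1]]
y = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0]

weights = fit_logistic(X, y)
p_high = predict_probability(weights, [1, 3])  # benefit claim, three complaints
p_low = predict_probability(weights, [0, 0])   # no claim, no complaints
print(round(p_high, 2), round(p_low, 2))
```

The output of such a model is a probability for each property, not a yes/no answer, which is what allows properties to be ranked for follow-up.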

Creating a regression model is a subtle mixture of maths and experience. It is possible to use Akaike information criterion (AIC) values to "pick" the best available predictive variables from the pool of data. The approach maximises the predictive capacity of the model. New terms (predictive variables) can be added to the model to improve its overall "performance". However, the computer doesn't know anything about the private rental sector; it deals only with maths and statistical likelihood. Some variables may be "very good" predictors in theory but in practice hard to obtain or unreliable in some other way. This is where the skill and experience of the practitioner come into play. Russell was able to exercise control over which variables were used in the final model, thus producing a model that was most useful.
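The AIC comparison can be illustrated with a small calculation. AIC is defined as 2k − 2 ln(L), where k is the number of fitted parameters and L is the model's likelihood, and a lower value is better. The Python sketch below is an illustration under assumptions: the outcomes and the fitted probabilities for the two candidate models are invented, and no claim is made about the variables that were actually compared.

```python
import math


def log_likelihood(y, probabilities):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(
        math.log(p) if target == 1 else math.log(1.0 - p)
        for target, p in zip(y, probabilities)
    )


def aic(y, probabilities, n_parameters):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_parameters - 2 * log_likelihood(y, probabilities)


# Known outcomes (1 = privately rented) and hypothetical fitted probabilities
# from two candidate models; all numbers are invented for illustration.
y = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = {"k": 2, "p": [0.70, 0.40, 0.60, 0.70, 0.30, 0.40, 0.60, 0.30]}
model_b = {"k": 5, "p": [0.72, 0.38, 0.62, 0.70, 0.30, 0.40, 0.60, 0.30]}

aic_a = aic(y, model_a["p"], model_a["k"])
aic_b = aic(y, model_b["p"], model_b["k"])
best = "A" if aic_a < aic_b else "B"
print(round(aic_a, 2), round(aic_b, 2), best)
```

Here model B fits the known outcomes very slightly better, but its three extra parameters cost more than the improvement is worth, so the simpler model A has the lower AIC.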

Revising and refining a model

The basic model "results" show "how likely" each property is to be a particular tenure. Of course all models are wrong, so the next step is to see how "reliable" the model is under various circumstances. A basic generalised linear model (GLM) yields a D² statistic, which is a measure of the proportion of the deviance that is explained by the model. This is a summary guide and is especially useful in helping to determine the "final cut", which is how many of the explanatory variables you keep in the final model.
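As a sketch of the calculation, the statistic can be computed as 1 minus the residual deviance divided by the null deviance, where the null model predicts the same base rate for every property. The Python below is illustrative only and uses invented outcomes and fitted probabilities.

```python
import math


def binomial_deviance(y, probabilities):
    """Deviance for 0/1 outcomes: -2 times the Bernoulli log-likelihood."""
    return -2.0 * sum(
        math.log(p) if target == 1 else math.log(1.0 - p)
        for target, p in zip(y, probabilities)
    )


def d_squared(y, probabilities):
    """Proportion of the null deviance that the model explains."""
    base_rate = sum(y) / len(y)
    null_deviance = binomial_deviance(y, [base_rate] * len(y))
    residual_deviance = binomial_deviance(y, probabilities)
    return 1.0 - residual_deviance / null_deviance


# Invented known outcomes and fitted probabilities, for illustration only.
y = [1, 0, 1, 1, 0, 0, 1, 0]
fitted = [0.8, 0.3, 0.7, 0.9, 0.2, 0.4, 0.6, 0.1]
print(round(d_squared(y, fitted), 3))
```

A value near 0 means the model does little better than guessing the base rate for every property; a value near 1 means it accounts for nearly all of the deviance.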

There are diminishing returns of predictive capacity for each additional variable added. In general, the aim is to balance peak predictive power with the number of variables used. Models with fewer variables tend to perform better in their predictions when used on new (previously unseen) data. Thus the revising and refining process is important in that it helps give “best value” and produces more reliable results.
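One common way to check performance on previously unseen data is to hold back some of the properties with known outcomes and score the model only on those. The Python sketch below shows that idea under assumptions: the split fraction, the 0.5 threshold and all the records are invented, and the article does not say this exact procedure was used.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle records reproducibly and split them into training and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, probabilities, threshold=0.5):
    """Share of properties whose predicted class matches the known outcome."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for t, pred in zip(y_true, predictions) if t == pred)
    return correct / len(y_true)


# Hypothetical properties with known outcomes, identified by UPRN.
records = [{"uprn": str(100000 + i)} for i in range(20)]
train, test = train_test_split(records)
print(len(train), len(test))

# Invented outcomes and model probabilities for five held-out properties.
held_out_outcomes = [1, 0, 1, 0, 1]
held_out_probabilities = [0.9, 0.2, 0.4, 0.1, 0.7]
print(accuracy(held_out_outcomes, held_out_probabilities))
```

Comparing this held-out score for models with different numbers of variables is one way to see the diminishing returns described above.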

Russell concludes:

The tenure intelligence (Ti) approach described above was a game changer. It solved an important strategic problem and helped unlock the potential of property licensing.

It enabled my team to be much more productive by minimising wasted time surveying and visiting the wrong properties. In just a couple of years Newham’s private housing enforcement outputs went from 25 prosecutions to 280+ prosecutions per year, a more than 10-fold increase. In 2016 this represented 67% of all private housing enforcement undertaken in London. Ti was the catalyst to enable this important leap forward.

Ti is now being used by a range of councils across the country to help improve understanding of the rented sector at a policy level and upgrade the quality of regulatory interventions.

It's important to note that this approach depends on property licensing to succeed. Licensing provides a 'behavioural wedge' to separate good landlords from bad. It also offers a much-improved legal framework for regulators to operate in.

Mark concludes:

A predictive model is a tool, and in this case a tool that gives "intelligence" to frontline practitioners and policy managers. The model allows traditional follow-up methods to be applied more efficiently and effectively. It is a computer version of the "dirty curtain" test that doesn't rely on a small army of council officers.

About the Authors

Russell Moffatt B.Sc. MPH CEnvH, is co-founder of Metastreet. He is a Chartered Environmental Health Practitioner with more than 20 years of experience working in some of the most challenging London Boroughs. He qualified with a B.Sc. (Hons) Environmental Health at Greenwich University in 2002 and achieved his Masters in Public Health at King’s College, London in 2006.

Mark Gardener PhD, is founder of DataAnalytics.org.uk. He was originally an ecologist but now works in more general areas too, providing training courses and data/project workshops as well as writing textbooks on R. He gained his doctorate in pollination ecology in 2001 and has undertaken research work and teaching in the UK and around the world.

Author:

Russell Moffatt

Chartered EHP and Co-founder of Metastreet

Published:

1st April 2020
