‘Revolutionary’ algorithms underestimate risk to patients

Machine learning algorithms hailed as game changers in healthcare can significantly underestimate the level of risk to patients, according to University of Manchester researchers.

The study, which compared 12 families of popular machine learning models to three standard statistical models for predicting a person’s risk of suffering a heart attack or stroke, is published in the British Medical Journal.

The researchers used heart attack and stroke as a case study, but argue most machine learning algorithms which estimate clinical risk are likely to encounter similar problems.

It is the second time algorithms have come under fire in recent months: the algorithm used by Ofqual to moderate A-level results similarly lowered grades excessively.

And as with the ill-fated A-level algorithm, the numbers generated by the models appear robust at a population level, but not at the individual level.

Machine learning models have gained considerable popularity in recent years: the English NHS has invested £250 million to further embed machine learning in health care.

Currently, GPs use a standard statistical tool (QRISK) to identify whether their patients have a 10 per cent or greater 10-year risk of developing cardiovascular disease (CVD). Those who do should be prescribed statins.

The study found that the risks predicted for the same patients differed markedly between the machine learning models and QRISK, particularly for higher-risk patients.

Different machine learning models also gave different predictions from one another. And unlike QRISK, many machine learning algorithms cannot take what statisticians call ‘censoring’ into account: patients often leave the dataset before the full follow-up period, for example by moving to another practice, and treating that incomplete follow-up as event-free time skews the risk estimates downwards.
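The study itself does not publish code; as a rough illustration of the censoring issue described above, the short Python simulation below (all numbers invented for the example) compares a naive event count, which treats short follow-up as ten event-free years, with a Kaplan-Meier estimate that properly accounts for patients leaving early.

import numpy as np

# Illustrative simulation only (not from the study): why ignoring
# censoring underestimates 10-year risk. Parameters are assumptions.
rng = np.random.default_rng(0)
n = 200_000
true_risk_10y = 0.15                      # assumed true 10-year event risk
hazard = -np.log(1 - true_risk_10y) / 10  # constant hazard giving that risk

event_time = rng.exponential(1 / hazard, n)   # time until heart attack/stroke
censor_time = rng.uniform(0, 20, n)           # time until patient leaves the registry
follow_up = np.minimum(event_time, censor_time)
observed_event = event_time <= censor_time

# Naive view: count observed events within 10 years and treat everyone
# else as a 10-year non-event, even patients followed for only months.
naive_risk = np.mean(observed_event & (follow_up <= 10))

# Kaplan-Meier estimate, which handles censoring correctly.
order = np.argsort(follow_up)
t, d = follow_up[order], observed_event[order]
at_risk = n - np.arange(n)                    # patients still under follow-up
surv = np.cumprod(np.where(d, 1 - 1 / at_risk, 1.0))
km_risk = 1 - surv[np.searchsorted(t, 10, side="right") - 1]

print(f"true 10-year risk : {true_risk_10y:.3f}")
print(f"naive estimate    : {naive_risk:.3f}  (biased downwards)")
print(f"Kaplan-Meier      : {km_risk:.3f}  (accounts for censoring)")

In this toy setting the naive estimate falls well below the true 15 per cent risk, while the Kaplan-Meier estimate recovers it, which is the mechanism the researchers point to.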

And of the 223,815 patients identified by QRISK as having a heart attack or stroke risk greater than 7.5%, 57.8% would be reclassified below that threshold when using machine learning models.

“Patients will commonly drop out of GP practices for a variety of reasons, but few machine learning algorithms build that into their modelling of large datasets,” said co-author Professor Tjeerd Pieter van Staa.

“Even if a patient has been registered at a practice for only a few months, the algorithms will treat that as 10 years’ worth of data, resulting in a strong underestimation of clinical risk.

“And not only do they underestimate risk, there is wide variation between the numbers, making it hard to see which model GPs could use for deciding treatments.”

He added: “Machine learning may be helpful in other areas of healthcare – such as imaging.

“But in terms of predicting risk we think a lot more work needs to be done before this technology can be used safely in the clinical setting.

“Perhaps the claims that Machine Learning will revolutionise healthcare are a little premature.”

The team tested the algorithms on data from 3.6 million patients in the Clinical Practice Research Datalink GOLD, registered at 391 general practices in England between January 1998 and December 2018.
