Electronic health records contain critical information for both medical providers and patients. But these records also contain information that could interfere with an artificial intelligence algorithm’s ability to predict patients’ risk for future disease.
Researchers at the College of Information Sciences and Technology are aiming to eliminate some of this noise – or unnecessary data – through a new machine learning model. Called LSAN, the deep neural network uses a two-pronged approach to scan electronic health record data and identify information that could predict the patient’s risk for developing a target disease in the future.
“Say we want to predict whether a patient will suffer from diabetes in the future,” said Fenglong Ma, who led the research. “We will use the patient’s historical data, which in some ways may be related to diabetes, such as high blood pressure or heart failure, and those are risk factors for the target disease.”
He continued, “But there are also some unrelated diagnosis codes or symptoms, and that noise information for the patient is what we want the model to remove and will give them a lower weight.”
Electronic health records use a two-level hierarchical structure to capture the medical journey of a patient using International Classification of Diseases (ICD) codes. The hierarchy begins with the patient, followed by the chronological sequence of visits. Then within each visit, ICD codes are stored with the recorded symptoms for that visit.
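The two-level hierarchy described above can be pictured as a small data structure. The following is a minimal sketch only, not the format of any real electronic health record system or of the datasets used in the study: the class names `Visit` and `PatientRecord`, the patient identifier and the sample visits are all hypothetical, chosen for illustration. The ICD-10 codes shown (I10 for hypertension, E11.9 for type 2 diabetes, J00 for the common cold) are real codes used here as toy data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Visit:
    """One visit: the date it happened and the ICD codes recorded for it."""
    visit_date: date
    icd_codes: List[str]


@dataclass
class PatientRecord:
    """Top level of the hierarchy: a patient who owns a list of visits."""
    patient_id: str
    visits: List[Visit] = field(default_factory=list)

    def chronological(self) -> List[Visit]:
        """Return the visits ordered from earliest to latest."""
        return sorted(self.visits, key=lambda v: v.visit_date)


# Hypothetical patient with two visits, entered out of order on purpose.
record = PatientRecord(
    patient_id="example-001",
    visits=[
        Visit(date(2020, 3, 2), ["J00"]),             # common cold
        Visit(date(2019, 11, 14), ["I10", "E11.9"]),  # hypertension, type 2 diabetes
    ],
)

ordered_codes = [visit.icd_codes for visit in record.chronological()]
```

Sorting by date recovers the chronological sequence of visits, and each visit keeps its own list of codes, which is the nesting a model like the one described here would read.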
LSAN, which stands for “Long-term Dependencies and Short-term Correlations with Hierarchical Attention Network for Risk Prediction,” uses a hierarchical attention module (HAM) to capture the hierarchical structure of electronic health record data and give different weights to the ICD codes according to their relevance to a target disease. Then, a temporal aggregation module (TAM) analyzes long-term dependencies, such as how each visit relates to all the others in a patient’s complete medical journey, and short-term correlations, such as how each visit relates to others that fall close to it in time.
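The idea of weighting codes by relevance can be illustrated with a generic attention-pooling step. This is a sketch of ordinary softmax attention under stated assumptions, not the researchers’ implementation of HAM: the function names, the two-dimensional toy embeddings and the `disease_query` vector are all hypothetical, and in a real model these values would be learned rather than written by hand.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    embeddings: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Score each code embedding against a query, then average by weight."""
    # Relevance score: dot product between each code embedding and the query.
    scores = [sum(e * q for e, q in zip(emb, query)) for emb in embeddings]
    weights = softmax(scores)
    # Weighted sum of the code embeddings gives one vector for the visit.
    pooled = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(len(query))
    ]
    return pooled, weights


# Toy embeddings for three codes in one visit (values are made up).
code_embeddings = [
    [0.9, 0.1],   # code assumed to be closely related to the target disease
    [0.8, 0.3],   # code assumed to be somewhat related
    [-0.5, 0.2],  # code assumed to be unrelated, i.e. noise
]
disease_query = [2.0, 0.0]  # hypothetical vector standing in for the target disease

visit_vector, code_weights = attention_pool(code_embeddings, disease_query)
```

In this toy setup the unrelated code receives the smallest weight, so it contributes least to the visit’s summary vector. That mirrors the goal described in the article of giving noise a lower weight rather than deleting it outright.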
“A patient that has diabetes may sometimes visit the doctor and ask questions about the treatment of diabetes,” said Fenglong Ma, assistant professor of information sciences and technology and principal investigator. “But sometimes they may go to the doctor because they have a cold. So we want to remove the randomness of the noise data from the electronic health data.”
He added, “The model can output the probability of whether the patient will suffer some disease. For some patients, they may not have clear symptoms at the current time, but the model can make a prediction.”
The researchers conducted experiments on three real-world electronic health record datasets of patients with heart failure, kidney disease and dementia, and found that LSAN outperformed existing state-of-the-art shallow methods and deep learning models.
Ma collaborated with Muchao Ye and Junyu Luo, first-year doctoral students of informatics at Penn State; and Cao Xiao, director of machine learning at IQVIA, which provided the datasets. They presented their findings at the Conference on Information and Knowledge Management in October. The work was funded by seed grants from the College of Information Sciences and Technology and the Penn State Institute for Computational and Data Sciences.