Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review

Abstract Background Early detection and diagnosis of cancer are vital to improving outcomes for patients. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored wi...

Bibliographic Details
Main Authors: Victoria Moglia, Owen Johnson, Gordon Cook, Marc de Kamps, Lesley Smith
Format: Article
Language: English
Published: BMC 2025-01-01
Series: BMC Medical Research Methodology
Subjects: Machine learning; Health data; Longitudinal data; Cancer; Time-series; Temporal
Online Access: https://doi.org/10.1186/s12874-025-02473-w
author Victoria Moglia
Owen Johnson
Gordon Cook
Marc de Kamps
Lesley Smith
collection DOAJ
description Abstract
Background: Early detection and diagnosis of cancer are vital to improving outcomes for patients. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored within electronic health records (EHRs). This review aims to summarise methods currently utilised for prediction of cancer from longitudinal data and to provide recommendations on how such models should be developed.
Methods: The review was conducted following PRISMA-ScR guidance. Six databases (MEDLINE, EMBASE, Web of Science, IEEE Xplore, PubMed and SCOPUS) were searched for relevant records published before 2/2/2024. Search terms related to the concepts “artificial intelligence”, “prediction”, “health records”, “longitudinal”, and “cancer”. Data were extracted relating to several areas of the articles: (1) publication details, (2) study characteristics, (3) input data, (4) model characteristics, (5) reproducibility, and (6) quality assessment using the PROBAST tool. Models were evaluated against a framework for terminology relating to reporting of cancer detection and risk prediction models.
Results: Of 653 records screened, 33 were included in the review; 10 predicted risk of cancer, 18 performed either cancer detection or early detection, 4 predicted recurrence, and 1 predicted metastasis. The most common cancers predicted in the studies were colorectal (n = 9) and pancreatic cancer (n = 9). 16 studies used feature engineering to represent temporal data, with the most common features representing trends. 18 used deep learning models which take a direct sequential input, most commonly recurrent neural networks, but also including convolutional neural networks and transformers. Prediction windows and lead times varied greatly between studies, even for models predicting the same cancer. High risk of bias was found in 90% of the studies. This risk was often introduced by inappropriate study design (n = 26) and sample size (n = 26).
Conclusion: This review highlights the breadth of approaches to cancer prediction from longitudinal data. We identify areas where reporting of methods could be improved, particularly regarding where in a patient’s trajectory the model is applied. The review shows opportunities for further work, including comparison of these approaches and their applications in other cancers.
format Article
id doaj-art-bb08375ab2044fb787acea11d0cfbf7e
institution Kabale University
issn 1471-2288
language English
publishDate 2025-01-01
publisher BMC
record_format Article
series BMC Medical Research Methodology
spelling doaj-art-bb08375ab2044fb787acea11d0cfbf7e | 2025-02-02T12:30:21Z | eng | BMC | BMC Medical Research Methodology | 1471-2288 | 2025-01-01 | 25 | 1 | 1 | 17 | 10.1186/s12874-025-02473-w
Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review
Victoria Moglia (School of Computing, University of Leeds)
Owen Johnson (School of Computing, University of Leeds)
Gordon Cook (Leeds Institute of Clinical Trials Research, University of Leeds)
Marc de Kamps (School of Computing, University of Leeds)
Lesley Smith (Leeds Institute of Clinical Trials Research, University of Leeds)
https://doi.org/10.1186/s12874-025-02473-w
Machine learning; Health data; Longitudinal data; Cancer; Time-series; Temporal
title Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review
topic Machine learning
Health data
Longitudinal data
Cancer
Time-series
Temporal
url https://doi.org/10.1186/s12874-025-02473-w