Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review

Bibliographic Details
Main Authors:	Victoria Moglia, Owen Johnson, Gordon Cook, Marc de Kamps, Lesley Smith
Format:	Article
Language:	English
Published:	BMC 2025-01-01
Series:	BMC Medical Research Methodology
Subjects:	Machine learning, Health data, Longitudinal data, Cancer, Time-series, Temporal
Online Access:	https://doi.org/10.1186/s12874-025-02473-w

author	Victoria Moglia, Owen Johnson, Gordon Cook, Marc de Kamps, Lesley Smith
author_sort	Victoria Moglia
collection	DOAJ
description	Abstract. Background: Early detection and diagnosis of cancer are vital to improving outcomes for patients. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored within electronic health records (EHRs). This review aims to summarise methods currently utilised for prediction of cancer from longitudinal data and to provide recommendations on how such models should be developed. Methods: The review was conducted following PRISMA-ScR guidance. Six databases (MEDLINE, EMBASE, Web of Science, IEEE Xplore, PubMed and SCOPUS) were searched for relevant records published before 2/2/2024. Search terms related to the concepts “artificial intelligence”, “prediction”, “health records”, “longitudinal”, and “cancer”. Data were extracted relating to several areas of the articles: (1) publication details, (2) study characteristics, (3) input data, (4) model characteristics, (5) reproducibility, and (6) quality assessment using the PROBAST tool. Models were evaluated against a framework for terminology relating to reporting of cancer detection and risk prediction models. Results: Of 653 records screened, 33 were included in the review; 10 predicted risk of cancer, 18 performed either cancer detection or early detection, 4 predicted recurrence, and 1 predicted metastasis. The most common cancers predicted in the studies were colorectal (n = 9) and pancreatic cancer (n = 9). Sixteen studies used feature engineering to represent temporal data, with the most common features representing trends. Eighteen used deep learning models that take a direct sequential input, most commonly recurrent neural networks, but also including convolutional neural networks and transformers. Prediction windows and lead times varied greatly between studies, even for models predicting the same cancer. High risk of bias was found in 90% of the studies. This risk was often introduced by inappropriate study design (n = 26) and sample size (n = 26). Conclusion: This review highlights the breadth of approaches to cancer prediction from longitudinal data. We identify areas where reporting of methods could be improved, particularly regarding where in a patient’s trajectory the model is applied. The review shows opportunities for further work, including comparison of these approaches and their applications in other cancers.
format	Article
id	doaj-art-bb08375ab2044fb787acea11d0cfbf7e
institution	Kabale University
issn	1471-2288
language	English
publishDate	2025-01-01
publisher	BMC
record_format	Article
series	BMC Medical Research Methodology
spelling	doaj-art-bb08375ab2044fb787acea11d0cfbf7e; 2025-02-02T12:30:21Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2025-01-01; 25; 1; 1; 17; 10.1186/s12874-025-02473-w; Victoria Moglia (School of Computing, University of Leeds); Owen Johnson (School of Computing, University of Leeds); Gordon Cook (Leeds Institute of Clinical Trials Research, University of Leeds); Marc de Kamps (School of Computing, University of Leeds); Lesley Smith (Leeds Institute of Clinical Trials Research, University of Leeds); https://doi.org/10.1186/s12874-025-02473-w
title	Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review
topic	Machine learning, Health data, Longitudinal data, Cancer, Time-series, Temporal
url	https://doi.org/10.1186/s12874-025-02473-w
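
The abstract reports that trend features were the most common form of temporal feature engineering and that prediction windows and lead times varied between studies. The following is a minimal illustrative sketch of one such feature, not a method taken from any reviewed study: a least-squares slope of a repeated measurement, computed only from observations that fall inside an observation window ending a fixed lead time before the index date. The function name `trend_slope`, its parameters, and the default values of 90 and 365 days are all assumptions chosen for illustration.

```python
from datetime import date, timedelta


def trend_slope(observations, index_date, lead_time_days=90, window_days=365):
    """Least-squares slope (units per day) of a measurement in the observation window.

    observations: iterable of (date, value) pairs, e.g. repeated blood test results.
    The window ends `lead_time_days` before `index_date` and spans `window_days`,
    so nothing recorded during the lead time can leak into the feature.
    Returns None when the window holds too little data to fit a slope.
    """
    window_end = index_date - timedelta(days=lead_time_days)
    window_start = window_end - timedelta(days=window_days)

    # Keep in-window observations; express time as days since the window start.
    points = [
        ((obs_date - window_start).days, value)
        for obs_date, value in observations
        if window_start <= obs_date <= window_end
    ]
    if len(points) < 2:
        return None

    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    denom = sum((t - mean_t) ** 2 for t, _ in points)
    if denom == 0:
        # All observations fall on the same day, so no trend can be estimated.
        return None
    return sum((t - mean_t) * (v - mean_v) for t, v in points) / denom
```

Stating the lead time and window length as explicit parameters reflects the review's point that reports should make clear where in a patient's trajectory a model is applied.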

Similar Items