Data extraction methods for systematic review (semi)automation: Update of a living systematic review [version 3; peer review: 3 approved]

Bibliographic Details
Main Authors: Rebecca Elmore, Luke A. McGuinness, James Thomas, Julian P. T. Higgins, Ailbhe N. Finnerty Mutlu, Babatunde K. Olorisade, Lena Schmidt
Format: Article
Language: English
Published: F1000 Research Ltd 2025-04-01
Series: F1000Research
Online Access:https://f1000research.com/articles/10-401/v3
Description
Summary:
Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies.
Methods: We systematically and continually search the PubMed, ACL Anthology, arXiv, OpenAlex (via EPPI-Reviewer), and dblp computer science bibliography databases. Full-text screening and data extraction are conducted using a mix of open-source and commercial tools. This living review update includes publications up to August 2024 and OpenAlex content up to September 2024.
Results: 117 publications are included in this review. Of these, 30 (26%) used full texts, while the rest used titles and abstracts. A total of 112 (96%) publications developed classifiers for randomised controlled trials. Over 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) the most frequently extracted. Data are available from 53 (45%) publications and code from 49 (42%). Nine (8%) implemented publicly available tools.
Conclusions: This living systematic review presents an overview of the (semi)automated data-extraction literature of interest to different types of literature review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting data from other study types. Between review updates, large language models emerged as a new tool for data extraction. While they facilitate access to automated extraction, publications using them showed a trend towards lower-quality reporting of results, especially quantitative results such as recall, and lower reproducibility. Compared with the previous update, trends such as the transition to relation extraction and the sharing of code and datasets remained similar.
ISSN: 2046-1402