Distribution of trial registry numbers within full-text of PubMed Central articles: implications for linking trials to publications and indexing trial publication types

Bibliographic Details
Main Authors: Arthur M. Holt, Ang Michael Troy, Neil R. Smalheiser
Format: Article
Language: English
Published: BMC 2025-01-01
Series: Trials
Online Access: https://doi.org/10.1186/s13063-025-08741-w
Description
Summary:
Background: Linking registered clinical trials with their published results continues to be a challenge. A variety of natural language processing (NLP)-based and machine learning-based models have been developed to assist users in identifying these connections. To date, however, no system has attempted to detect mentions of registry numbers within the full-text of articles.
Methods: Articles from the PubMed Central full-text Open Access dataset were scanned for mentions of ClinicalTrials.gov and international clinical trial registry identifiers. We analyzed the distribution of trial registry numbers within sections of the articles and characterized their publication type indexing and other metrics.
Results: Registry numbers mentioned in article metadata (e.g., the abstract) or in the Methods section of full-text are highly predictive of clinical trial articles. When a clinical trial article mentioned ClinicalTrials.gov identifier numbers (NCT) only in the Methods section, in every case examined, it was reporting clinical outcomes from that registered trial, and thus can reliably be used to link that trial to that publication. Conversely, registry numbers mentioned in Tables arise almost entirely from reviews (including systematic reviews and meta-analyses). Registry numbers mentioned in other full-text sections have relatively little predictive value for linking trials to their publications. Clinical trial articles that mention CONSORT or SPIRIT guidelines have a higher rate of mentioning registry numbers in article metadata, and hence are more easily linked to their underlying trials, than articles overall.
Conclusions: The appearance and location of trial registry numbers within the full-text of biomedical articles provide valuable features for connecting clinical trials to their publications. They also potentially provide information to assist automated tools in assigning publication types to articles.
ISSN:1745-6215
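
The scanning step described in the Methods, and the Methods-only rule reported in the Results, can be illustrated with a minimal sketch. It is not the authors' pipeline, which this record does not describe. It assumes an article has already been split into a dictionary of section name to text (the section names, `find_registry_ids`, and `methods_only_nct` are hypothetical), and it covers only two identifier formats: ClinicalTrials.gov (`NCT` plus eight digits) and, as one example of an international registry, ISRCTN (`ISRCTN` plus eight digits). The sample article text and identifiers are invented for illustration.

```python
import re
from collections import defaultdict

# Illustrative patterns only: NCT identifiers are "NCT" followed by eight
# digits; ISRCTN stands in for the international registries in general.
REGISTRY_PATTERNS = {
    "ClinicalTrials.gov": re.compile(r"\bNCT\d{8}\b"),
    "ISRCTN": re.compile(r"\bISRCTN\d{8}\b"),
}


def find_registry_ids(sections):
    """Map each registry ID to the set of section names that mention it."""
    locations = defaultdict(set)
    for name, text in sections.items():
        for pattern in REGISTRY_PATTERNS.values():
            for match in pattern.finditer(text):
                locations[match.group(0)].add(name)
    return dict(locations)


def methods_only_nct(locations):
    """Return NCT IDs mentioned in the Methods section and nowhere else."""
    return sorted(
        rid
        for rid, secs in locations.items()
        if rid.startswith("NCT") and secs == {"methods"}
    )


# Hypothetical article, already split into sections (assumed structure).
article = {
    "abstract": "We report outcomes of a randomized trial.",
    "methods": "The trial was registered at ClinicalTrials.gov (NCT01234567).",
    "tables": "Included trials: NCT00000001, ISRCTN12345678.",
}

locations = find_registry_ids(article)
candidates = methods_only_nct(locations)
print(candidates)
```

Here `candidates` holds the identifiers that, per the summary above, would be the most reliable basis for linking a clinical trial article to its registered trial, while the identifiers found only under `tables` would be treated as likely review citations. Per the summary, the Methods-only finding applies to clinical trial articles, so a real system would presumably also check publication type before linking.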