Pre-training on high-resolution X-ray images: an experimental study

Bibliographic Details
Main Authors: Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang
Format: Article
Language: English
Published: Springer 2025-05-01
Series: Visual Intelligence
Online Access: https://doi.org/10.1007/s44267-025-00080-3
Description
Summary: Existing X-ray image based pre-trained vision models are typically trained on relatively small-scale datasets (fewer than 500,000 samples) at limited resolution (e.g., $224 \times 224$). However, the key to the success of self-supervised pre-training of large models lies in massive training data, and retaining high-resolution X-ray images helps in addressing certain challenging diseases. In this paper, we propose a high-resolution ($1280 \times 1280$) X-ray image based pre-trained baseline model, trained on our newly collected large-scale dataset of more than 1 million X-ray images. Our model adopts the masked auto-encoder framework: the tokens that remain visible after high-ratio masking are used as input, and the masked image patches are reconstructed by a Transformer encoder-decoder network. More importantly, we introduce a novel context-aware masking strategy, which uses the breast contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks, namely X-ray report generation and disease detection. Extensive experiments demonstrate that our pre-trained medical baseline model achieves results comparable to, or even exceeding, those of current state-of-the-art models on downstream benchmark datasets.
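For readers unfamiliar with the masked auto-encoder setup the abstract describes, the following is a minimal PyTorch sketch of high-ratio patch masking with an optional contour-based prior. All names (mae_masking, contour_prior) are illustrative assumptions, not the authors' implementation; in particular, the additive contour prior is only a rough stand-in for the paper's context-aware masking strategy.

```python
# Minimal sketch of MAE-style high-ratio masking, assuming PyTorch.
# Names and the contour-prior heuristic are hypothetical, not from the paper.
from typing import Optional

import torch


def mae_masking(tokens: torch.Tensor,
                mask_ratio: float = 0.75,
                contour_prior: Optional[torch.Tensor] = None):
    """Select the visible patch tokens under random high-ratio masking.

    tokens:        (batch, num_patches, dim) embedded image patches.
    mask_ratio:    fraction of patches hidden from the encoder.
    contour_prior: optional (batch, num_patches) scores in [0, 1]; patches
                   with higher scores (e.g., inside the anatomical contour)
                   are masked preferentially.
    Returns the visible tokens, a binary mask (1 = masked), and the indices
    needed to restore the original patch order for the decoder.
    """
    b, n, d = tokens.shape
    n_keep = int(n * (1.0 - mask_ratio))

    noise = torch.rand(b, n, device=tokens.device)      # per-patch score
    if contour_prior is not None:
        # Bias the scores: in-contour patches sort later, so they are
        # more likely to fall into the masked set.
        noise = noise + contour_prior

    ids_shuffle = torch.argsort(noise, dim=1)           # ascending: lowest kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)

    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(tokens, 1,
                           ids_keep.unsqueeze(-1).expand(-1, -1, d))

    mask = torch.ones(b, n, device=tokens.device)
    mask[:, :n_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)           # original patch order
    return visible, mask, ids_restore


# Example: a 1280x1280 image with 16x16 patches yields 80*80 = 6400 tokens.
tokens = torch.randn(2, 6400, 768)
visible, mask, ids_restore = mae_masking(tokens, mask_ratio=0.75)
print(visible.shape)  # torch.Size([2, 1600, 768])
```

Because the encoder only processes the visible quarter of the tokens, this style of pre-training remains tractable at $1280 \times 1280$ despite the quadratic cost of self-attention over patch sequences.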
ISSN: 2097-3330
2731-9008