Pre-training on high-resolution X-ray images: an experimental study
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-05-01 |
| Series: | Visual Intelligence |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s44267-025-00080-3 |
| Summary: | Existing X-ray image based pre-trained vision models are typically trained on relatively small-scale datasets (fewer than 500,000 samples) with limited resolution (e.g., $224 \times 224$). However, the key to the success of self-supervised pre-training of large models lies in massive training data, and maintaining high-resolution X-ray images contributes to effective solutions for some challenging diseases. In this paper, we propose a high-resolution ($1280 \times 1280$) X-ray image based pre-trained baseline model on our newly collected large-scale dataset containing more than 1 million X-ray images. Our model employs the masked auto-encoder framework, in which only the tokens that remain visible after high-ratio masking are used as input, and the masked image patches are reconstructed by a Transformer encoder-decoder network. More importantly, a novel context-aware masking strategy is introduced, which uses the breast contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks, namely X-ray report generation and disease detection. Extensive experiments demonstrate that our pre-trained medical baseline model achieves results comparable to, or even exceeding, those of current state-of-the-art models on downstream benchmark datasets. |
| ISSN: | 2097-3330, 2731-9008 |
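The abstract describes masked auto-encoder pre-training in which only the tokens surviving a high masking ratio are fed to the encoder. A minimal sketch of that random-masking step is shown below; the function name, the 0.75 masking ratio, and the patch/embedding sizes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def random_masking(tokens: np.ndarray, mask_ratio: float = 0.75, seed: int = 0):
    """Keep a random subset of patch tokens; return visible tokens and the mask.

    tokens: (num_patches, embed_dim) array of patch embeddings.
    Returns (visible_tokens, mask), where mask[i] is True for masked patches.
    """
    rng = np.random.default_rng(seed)
    num_patches = tokens.shape[0]
    num_keep = int(num_patches * (1.0 - mask_ratio))
    # Shuffle patch indices and keep the first num_keep as visible.
    perm = rng.permutation(num_patches)
    keep_idx = np.sort(perm[:num_keep])
    mask = np.ones(num_patches, dtype=bool)
    mask[keep_idx] = False
    return tokens[keep_idx], mask

# Example: a 1280x1280 image with hypothetical 16x16 patches
# yields (1280 / 16)^2 = 6400 patch tokens.
tokens = np.zeros((6400, 768))
visible, mask = random_masking(tokens, mask_ratio=0.75)
print(visible.shape)   # (1600, 768) — 25% of patches stay visible
print(int(mask.sum())) # 4800 masked patches
```

The paper's context-aware variant would additionally bias this selection using the breast contour as a boundary; the sketch above covers only the generic high-ratio random masking that the MAE framework starts from.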