Dual-Stream Contrastive Learning for Medical Visual Representations Using Synthetic Images Generated by Latent Diffusion Model

Deep learning-based medical image processing methods can enhance diagnostic accuracy while significantly accelerating clinical decision workflows. However, in order to learn better visual representations, such approaches usually need a substantial amount of expert-annotated data, which are highly cost...

Bibliographic Details
Main Authors: Weitao Ye, Longfu Zhang, Xiaoben Jiang, Dawei Yang, Yu Zhu
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
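Subjects: Contrastive learning; cross-scale token projection; dual-stream; latent diffusion model; medical visual representations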
Online Access: https://ieeexplore.ieee.org/document/11088093/
author Weitao Ye
Longfu Zhang
Xiaoben Jiang
Dawei Yang
Yu Zhu
author_sort Weitao Ye
collection DOAJ
description Deep learning-based medical image processing methods can enhance diagnostic accuracy while significantly accelerating clinical decision workflows. However, to learn good visual representations, such approaches usually need a substantial amount of expert-annotated data, which is costly to obtain. To address this issue, we propose a novel approach called Dual-Stream Contrastive Learning with Cross-Scale Token Projection (DCL-CsTP), which aims to improve visual representations and provide transferable initializations. Specifically, a latent diffusion model (LDM) is used to generate high-quality synthetic medical images that expand the dataset. The proposed dual-stream architecture, which consists of a global semantic relations stream and a local detail relations stream, then learns discriminative medical image representations from the expanded dataset. Furthermore, a cross-scale token projection is designed to enable the model to capture regions of interest at various scales in medical images. Comprehensive experiments are performed on two downstream tasks: medical image classification and segmentation. For multi-class pneumonia classification, our DCL-CsTP method achieves 95.90% accuracy. For lesion segmentation, our DCL-CsTP method attains a Dice coefficient of 89.73% on the International Skin Imaging Collaboration 2018 (ISIC 2018) dataset and 82.50% on the Kvasir-SEG dataset. These experiments on several datasets demonstrate the superior performance of the model pre-trained with DCL-CsTP and indicate that DCL-CsTP can enhance diagnostic precision and alleviate radiologists’ image-screening burden.
format Article
id doaj-art-01ac7aa6579d4b028c9b0198ffc6ec67
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-01ac7aa6579d4b028c9b0198ffc6ec67
2025-08-20T03:58:40Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
volume 13, pages 129648-129658
DOI 10.1109/ACCESS.2025.3591544
IEEE document 11088093
Dual-Stream Contrastive Learning for Medical Visual Representations Using Synthetic Images Generated by Latent Diffusion Model
Weitao Ye (0), https://orcid.org/0009-0002-3627-5121
Longfu Zhang (1)
Xiaoben Jiang (2)
Dawei Yang (3), https://orcid.org/0000-0002-8928-143X
Yu Zhu (4), https://orcid.org/0000-0003-1535-6520
Affiliations, in the order listed in the record:
School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
Department of Pulmonary and Critical Care Medicine, Shanghai Xuhui Central Hospital, Zhongshan-Xuhui Hospital, Fudan University, Shanghai, China
School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital (Xiamen), Fudan University, Huli District, Xiamen, Fujian, China
School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
https://ieeexplore.ieee.org/document/11088093/
Contrastive learning; cross-scale token projection; dual-stream; latent diffusion model; medical visual representations
title Dual-Stream Contrastive Learning for Medical Visual Representations Using Synthetic Images Generated by Latent Diffusion Model
topic Contrastive learning
cross-scale token projection
dual-stream
latent diffusion model
medical visual representations
url https://ieeexplore.ieee.org/document/11088093/
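The description above reports segmentation quality as a Dice coefficient. For readers unfamiliar with the metric, the following is a minimal sketch of how it is commonly computed for two binary masks, Dice = 2|A ∩ B| / (|A| + |B|). The function name `dice_coefficient`, the flattened-list input format, and the convention for two empty masks are assumptions made for illustration; the record does not describe the authors' evaluation code.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (1 = lesion, 0 = background)."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    # One shared foreground pixel, three foreground pixels in total: 2 * 1 / 3.
    print(f"Dice = {dice_coefficient(predicted, reference):.4f}")
```

A reported value such as 89.73% corresponds to a score of 0.8973 on this 0-to-1 scale, typically averaged over the test images.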