High-Quality Text-to-Image Generation Using High-Detail Feature-Preserving Network


Bibliographic Details
Main Authors: Wei-Yen Hsu, Jing-Wen Lin
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Applied Sciences
Subjects: generative adversarial network; text-to-image generation; high detail; feature preservation
Online Access: https://www.mdpi.com/2076-3417/15/2/706
author Wei-Yen Hsu
Jing-Wen Lin
collection DOAJ
description Multistage text-to-image generation algorithms have achieved remarkable success, but the images they produce often lack detail and suffer from feature loss. This is because these methods focus mainly on extracting features from images and text, relying only on conventional residual blocks for post-extraction feature processing. The resulting feature loss greatly reduces the quality of the generated images and demands more resources for feature computation, severely limiting deployment on optical devices such as cameras and smartphones. To address these issues, the novel High-Detail Feature-Preserving Network (HDFpNet) is proposed to generate high-quality, near-realistic images from text descriptions. The initial text-to-image generation (iT2IG) module generates initial feature maps to avoid feature loss. Next, the fast excitation-and-squeeze feature extraction (FESFE) module recursively generates high-detail, feature-preserving images at lower computational cost through three steps: channel excitation (CE), fast feature extraction (FFE), and channel squeeze (CS). Finally, the channel attention (CA) mechanism further enriches the feature details. Experimental results on the CUB-Bird and MS-COCO datasets demonstrate that the proposed HDFpNet outperforms the state of the art in performance and visual presentation, especially in image detail and feature preservation.
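The abstract names the FESFE steps (channel excitation, fast feature extraction, channel squeeze) and a channel attention mechanism, but does not specify their internals. Purely as an illustration, assuming a squeeze-and-excitation-style design, the two ideas could be sketched in NumPy as follows; all function names, weight shapes, and the ReLU stand-in for the FFE step are hypothetical and are not taken from the paper:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Gate each channel of x (shape (C, H, W)) by a learned scalar in (0, 1)."""
    # Squeeze: global average pool per channel -> (C,)
    s = x.mean(axis=(1, 2))
    # Excitation: small two-layer bottleneck, ReLU then sigmoid gate
    z = np.maximum(w1 @ s, 0.0)          # w1: (C/r, C)
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))  # w2: (C, C/r), gates in (0, 1)
    # Rescale each channel by its gate
    return x * g[:, None, None]

def fesfe_block(x, w_ce, w_cs):
    """Hypothetical CE -> FFE -> CS pipeline on x (shape (C, H, W))."""
    # Channel excitation: pointwise (1x1-conv-like) mix expanding C -> C*r
    e = np.tensordot(w_ce, x, axes=([1], [0]))  # w_ce: (C*r, C) -> (C*r, H, W)
    # Fast feature extraction: a cheap per-element op stands in here
    f = np.maximum(e, 0.0)
    # Channel squeeze: pointwise mix reducing C*r -> C
    return np.tensordot(w_cs, f, axes=([1], [0]))  # w_cs: (C, C*r) -> (C, H, W)
```

The expand-then-squeeze shape keeps the expensive middle step cheap per channel, which is consistent with the abstract's claim of lower computational cost, though the paper's actual operators may differ.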
id doaj-art-09b5caaa309f43c982b94a9b7ad73d3f
institution Kabale University
issn 2076-3417
doi 10.3390/app15020706
citation Applied Sciences, vol. 15, no. 2, art. 706 (2025)
affiliation Department of Information Management, National Chung Cheng University, Chiayi 62102, Taiwan (both authors)
topic generative adversarial network
text-to-image generation
high detail
feature preservation