High-Quality Text-to-Image Generation Using High-Detail Feature-Preserving Network


Bibliographic Details
Main Authors: Wei-Yen Hsu, Jing-Wen Lin
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Applied Sciences
Subjects: generative adversarial network; text-to-image generation; high detail; feature preservation
Online Access: https://www.mdpi.com/2076-3417/15/2/706
author Wei-Yen Hsu
Jing-Wen Lin
collection DOAJ
description Multistage text-to-image generation algorithms have achieved remarkable success, but the images they produce often lack detail and suffer from feature loss. This is because these methods focus mainly on extracting features from images and text, relying only on conventional residual blocks for post-extraction feature processing. The resulting feature loss greatly reduces the quality of the generated images and demands more resources for feature computation, severely limiting deployment on optical devices such as cameras and smartphones. To address these issues, the novel High-Detail Feature-Preserving Network (HDFpNet) is proposed to generate high-quality, near-realistic images from text descriptions. The initial text-to-image generation (iT2IG) module generates initial feature maps to avoid feature loss. Next, the fast excitation-and-squeeze feature extraction (FESFE) module recursively generates high-detail, feature-preserving images at lower computational cost through three steps: channel excitation (CE), fast feature extraction (FFE), and channel squeeze (CS). Finally, the channel attention (CA) mechanism further enriches the feature details. Experimental results on the CUB-Bird and MS-COCO datasets demonstrate that the proposed HDFpNet outperforms the state of the art in performance and visual presentation, especially in image detail and feature preservation.
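The abstract names the FESFE steps (channel excitation, fast feature extraction, channel squeeze) and a channel attention mechanism, but does not specify their internals. Purely as an illustration, assuming a squeeze-and-excitation-style design, the two ideas could be sketched in NumPy as follows; all function names, weight shapes, and the ReLU stand-in for the FFE step are hypothetical and are not taken from the paper:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Gate each channel of x (shape (C, H, W)) by a learned scalar in (0, 1)."""
    # Squeeze: global average pool per channel -> (C,)
    s = x.mean(axis=(1, 2))
    # Excitation: small two-layer bottleneck, ReLU then sigmoid gate
    z = np.maximum(w1 @ s, 0.0)          # w1: (C/r, C)
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))  # w2: (C, C/r), gates in (0, 1)
    # Rescale each channel by its gate
    return x * g[:, None, None]

def fesfe_block(x, w_ce, w_cs):
    """Hypothetical CE -> FFE -> CS pipeline on x (shape (C, H, W))."""
    # Channel excitation: pointwise (1x1-conv-like) mix expanding C -> C*r
    e = np.tensordot(w_ce, x, axes=([1], [0]))  # w_ce: (C*r, C) -> (C*r, H, W)
    # Fast feature extraction: a cheap per-element op stands in here
    f = np.maximum(e, 0.0)
    # Channel squeeze: pointwise mix reducing C*r -> C
    return np.tensordot(w_cs, f, axes=([1], [0]))  # w_cs: (C, C*r) -> (C, H, W)
```

The expand-then-squeeze shape keeps the expensive middle step cheap per channel, which is consistent with the abstract's claim of lower computational cost, though the paper's actual operators may differ.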
id doaj-art-09b5caaa309f43c982b94a9b7ad73d3f
institution Kabale University
issn 2076-3417
doi 10.3390/app15020706
citation Applied Sciences, vol. 15, no. 2, art. 706 (2025)
affiliation Department of Information Management, National Chung Cheng University, Chiayi 62102, Taiwan (both authors)
topic generative adversarial network
text-to-image generation
high detail
feature preservation