High-Quality Text-to-Image Generation Using High-Detail Feature-Preserving Network
Multistage text-to-image generation algorithms have shown remarkable success. However, the images produced often lack detail and suffer from feature loss. This is because these methods mainly focus on extracting features from images and text, using only conventional residual blocks for post-extraction feature processing. This results in feature loss, greatly reducing the quality of the generated images and increasing the computational cost of feature processing, which severely limits deployment on devices such as cameras and smartphones. To address these issues, the novel High-Detail Feature-Preserving Network (HDFpNet) is proposed to effectively generate high-quality, near-realistic images from text descriptions. The initial text-to-image generation (iT2IG) module is used to generate initial feature maps to avoid feature loss. Next, the fast excitation-and-squeeze feature extraction (FESFE) module is proposed to recursively generate high-detail, feature-preserving images at lower computational cost through three steps: channel excitation (CE), fast feature extraction (FFE), and channel squeeze (CS). Finally, the channel attention (CA) mechanism further enriches the feature details. Experimental results on the CUB-Bird and MS-COCO datasets demonstrate that the proposed HDFpNet outperforms state-of-the-art methods in both performance and visual quality, especially regarding high detail and feature preservation.
Main Authors: | Wei-Yen Hsu, Jing-Wen Lin |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Applied Sciences |
Subjects: | generative adversarial network, text-to-image generation, high detail, feature preservation |
Online Access: | https://www.mdpi.com/2076-3417/15/2/706 |
_version_ | 1832589233982799872 |
---|---|
author | Wei-Yen Hsu Jing-Wen Lin |
author_facet | Wei-Yen Hsu Jing-Wen Lin |
author_sort | Wei-Yen Hsu |
collection | DOAJ |
description | Multistage text-to-image generation algorithms have shown remarkable success. However, the images produced often lack detail and suffer from feature loss. This is because these methods mainly focus on extracting features from images and text, using only conventional residual blocks for post-extraction feature processing. This results in feature loss, greatly reducing the quality of the generated images and increasing the computational cost of feature processing, which severely limits deployment on devices such as cameras and smartphones. To address these issues, the novel High-Detail Feature-Preserving Network (HDFpNet) is proposed to effectively generate high-quality, near-realistic images from text descriptions. The initial text-to-image generation (iT2IG) module is used to generate initial feature maps to avoid feature loss. Next, the fast excitation-and-squeeze feature extraction (FESFE) module is proposed to recursively generate high-detail, feature-preserving images at lower computational cost through three steps: channel excitation (CE), fast feature extraction (FFE), and channel squeeze (CS). Finally, the channel attention (CA) mechanism further enriches the feature details. Experimental results on the CUB-Bird and MS-COCO datasets demonstrate that the proposed HDFpNet outperforms state-of-the-art methods in both performance and visual quality, especially regarding high detail and feature preservation. |
format | Article |
id | doaj-art-09b5caaa309f43c982b94a9b7ad73d3f |
institution | Kabale University |
issn | 2076-3417 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj-art-09b5caaa309f43c982b94a9b7ad73d3f; 2025-01-24T13:20:32Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-01-01; vol. 15, no. 2, art. 706; 10.3390/app15020706; High-Quality Text-to-Image Generation Using High-Detail Feature-Preserving Network; Wei-Yen Hsu, Jing-Wen Lin (Department of Information Management, National Chung Cheng University, Chiayi 62102, Taiwan); Multistage text-to-image generation algorithms have shown remarkable success. However, the images produced often lack detail and suffer from feature loss. This is because these methods mainly focus on extracting features from images and text, using only conventional residual blocks for post-extraction feature processing. This results in feature loss, greatly reducing the quality of the generated images and increasing the computational cost of feature processing, which severely limits deployment on devices such as cameras and smartphones. To address these issues, the novel High-Detail Feature-Preserving Network (HDFpNet) is proposed to effectively generate high-quality, near-realistic images from text descriptions. The initial text-to-image generation (iT2IG) module is used to generate initial feature maps to avoid feature loss. Next, the fast excitation-and-squeeze feature extraction (FESFE) module is proposed to recursively generate high-detail, feature-preserving images at lower computational cost through three steps: channel excitation (CE), fast feature extraction (FFE), and channel squeeze (CS). Finally, the channel attention (CA) mechanism further enriches the feature details. Experimental results on the CUB-Bird and MS-COCO datasets demonstrate that the proposed HDFpNet outperforms state-of-the-art methods in both performance and visual quality, especially regarding high detail and feature preservation. https://www.mdpi.com/2076-3417/15/2/706; generative adversarial network; text-to-image generation; high detail; feature preservation |
spellingShingle | Wei-Yen Hsu Jing-Wen Lin High-Quality Text-to-Image Generation Using High-Detail Feature-Preserving Network Applied Sciences generative adversarial network text-to-image generation high detail feature preservation |
title | High-Quality Text-to-Image Generation Using High-Detail Feature-Preserving Network |
title_full | High-Quality Text-to-Image Generation Using High-Detail Feature-Preserving Network |
title_fullStr | High-Quality Text-to-Image Generation Using High-Detail Feature-Preserving Network |
title_full_unstemmed | High-Quality Text-to-Image Generation Using High-Detail Feature-Preserving Network |
title_short | High-Quality Text-to-Image Generation Using High-Detail Feature-Preserving Network |
title_sort | high quality text to image generation using high detail feature preserving network |
topic | generative adversarial network text-to-image generation high detail feature preservation |
url | https://www.mdpi.com/2076-3417/15/2/706 |
work_keys_str_mv | AT weiyenhsu highqualitytexttoimagegenerationusinghighdetailfeaturepreservingnetwork AT jingwenlin highqualitytexttoimagegenerationusinghighdetailfeaturepreservingnetwork |
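The abstract describes the FESFE block as three steps — channel excitation (CE), fast feature extraction (FFE), and channel squeeze (CS) — followed by a channel attention (CA) gate. The record contains no implementation details, so the following is only a minimal NumPy sketch of that excite-process-squeeze-then-gate pattern; every function name, the 3x3 mean filter standing in for FFE, the 1x1-convolution-as-matrix-multiply weights, and the residual connection are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_excitation(x, w_up):
    # CE: expand channels with a 1x1 convolution, expressed as a
    # matrix multiply over the flattened spatial dimensions.
    c, h, w = x.shape
    return (w_up @ x.reshape(c, -1)).reshape(-1, h, w)

def fast_feature_extraction(x):
    # FFE stand-in: a cheap depthwise 3x3 mean filter (edge-padded),
    # applied independently to every channel.
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += pad[:, dy:dy + x.shape[1], dx:dx + x.shape[2]]
    return out / 9.0

def channel_squeeze(x, w_down):
    # CS: project the expanded channels back down with another 1x1 conv.
    c, h, w = x.shape
    return (w_down @ x.reshape(c, -1)).reshape(-1, h, w)

def channel_attention(x):
    # CA: global average pool per channel, sigmoid gate, rescale channels.
    gate = sigmoid(x.mean(axis=(1, 2)))
    return x * gate[:, None, None]

def fesfe_block(x, w_up, w_down):
    # CE -> FFE -> CS, then CA; the residual add preserves input features.
    y = channel_excitation(x, w_up)
    y = fast_feature_extraction(y)
    y = channel_squeeze(y, w_down)
    return channel_attention(y) + x

rng = np.random.default_rng(0)
c_in, c_exp = 8, 32                       # hypothetical channel widths
x = rng.standard_normal((c_in, 16, 16))   # (channels, height, width)
w_up = rng.standard_normal((c_exp, c_in)) * 0.1
w_down = rng.standard_normal((c_in, c_exp)) * 0.1
out = fesfe_block(x, w_up, w_down)
print(out.shape)  # (8, 16, 16)
```

Because the squeeze projects back to the input width, the block is shape-preserving and can be stacked recursively, which matches the abstract's claim that FESFE is applied repeatedly at low cost.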