Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models
Text-to-image models have demonstrated remarkable progress in generating visual content from textual descriptions. However, the presence of linguistic ambiguity in text prompts poses a potential challenge to these models, possibly leading to undesired or inaccurate outputs. This work conducts a preliminary study, through a series of experiments, of how text-to-image diffusion models resolve linguistic ambiguity.
Main Authors: | Wala Elsharif, Mahmood Alzubaidi, James She, Marco Agus |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Computers |
Subjects: | natural language processing; computational linguistics; linguistic ambiguity; text-to-image models; diffusion models; prompt engineering |
Online Access: | https://www.mdpi.com/2073-431X/14/1/19 |
_version_ | 1832588756043956224 |
---|---|
author | Wala Elsharif; Mahmood Alzubaidi; James She; Marco Agus |
author_facet | Wala Elsharif; Mahmood Alzubaidi; James She; Marco Agus |
author_sort | Wala Elsharif |
collection | DOAJ |
description | Text-to-image models have demonstrated remarkable progress in generating visual content from textual descriptions. However, the presence of linguistic ambiguity in the text prompts poses a potential challenge to these models, possibly leading to undesired or inaccurate outputs. This work conducts a preliminary study and provides insights into how text-to-image diffusion models resolve linguistic ambiguity through a series of experiments. We investigate a set of prompts that exhibit different types of linguistic ambiguities with different models and the images they generate, focusing on how the models’ interpretations of linguistic ambiguity compare to those of humans. In addition, we present a curated dataset of ambiguous prompts and their corresponding images known as the Visual Linguistic Ambiguity Benchmark (V-LAB) dataset. Furthermore, we report a number of limitations and failure modes caused by linguistic ambiguity in text-to-image models and propose prompt engineering guidelines to minimize the impact of ambiguity. The findings of this exploratory study contribute to the ongoing improvement of text-to-image models and provide valuable insights for future advancements in the field. |
format | Article |
id | doaj-art-39409707d08e421989766aafe81f25a2 |
institution | Kabale University |
issn | 2073-431X |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Computers |
spelling | doaj-art-39409707d08e421989766aafe81f25a2; 2025-01-24T13:27:53Z; eng; MDPI AG; Computers; 2073-431X; 2025-01-01; Vol. 14, Iss. 1, Art. 19; 10.3390/computers14010019; Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models; Wala Elsharif (College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar); Mahmood Alzubaidi (College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar); James She (Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong 999077, China); Marco Agus (College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar); https://www.mdpi.com/2073-431X/14/1/19; natural language processing; computational linguistics; linguistic ambiguity; text-to-image models; diffusion models; prompt engineering |
spellingShingle | Wala Elsharif; Mahmood Alzubaidi; James She; Marco Agus; Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models; Computers; natural language processing; computational linguistics; linguistic ambiguity; text-to-image models; diffusion models; prompt engineering |
title | Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models |
title_full | Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models |
title_fullStr | Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models |
title_full_unstemmed | Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models |
title_short | Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models |
title_sort | visualizing ambiguity analyzing linguistic ambiguity resolution in text to image models |
topic | natural language processing computational linguistics linguistic ambiguity text-to-image models diffusion models prompt engineering |
url | https://www.mdpi.com/2073-431X/14/1/19 |
work_keys_str_mv | AT walaelsharif visualizingambiguityanalyzinglinguisticambiguityresolutionintexttoimagemodels AT mahmoodalzubaidi visualizingambiguityanalyzinglinguisticambiguityresolutionintexttoimagemodels AT jamesshe visualizingambiguityanalyzinglinguisticambiguityresolutionintexttoimagemodels AT marcoagus visualizingambiguityanalyzinglinguisticambiguityresolutionintexttoimagemodels |