Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models

Text-to-image models have demonstrated remarkable progress in generating visual content from textual descriptions. However, linguistic ambiguity in text prompts poses a challenge to these models and can lead to undesired or inaccurate outputs. This work conducts a preliminary study of how text-to-image diffusion models resolve linguistic ambiguity through a series of experiments. We investigate a set of prompts exhibiting different types of linguistic ambiguity across different models and examine the images they generate, focusing on how the models' interpretations of linguistic ambiguity compare to those of humans. In addition, we present a curated dataset of ambiguous prompts and their corresponding images, the Visual Linguistic Ambiguity Benchmark (V-LAB). Furthermore, we report a number of limitations and failure modes caused by linguistic ambiguity in text-to-image models and propose prompt engineering guidelines to minimize its impact. The findings of this exploratory study contribute to the ongoing improvement of text-to-image models and provide valuable insights for future advancements in the field.

Bibliographic Details
Main Authors: Wala Elsharif, Mahmood Alzubaidi, James She, Marco Agus
Affiliations: Wala Elsharif, Mahmood Alzubaidi, Marco Agus (College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar); James She (Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong 999077, China)
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Computers, vol. 14, no. 1, article 19
ISSN: 2073-431X
DOI: 10.3390/computers14010019
Subjects: natural language processing; computational linguistics; linguistic ambiguity; text-to-image models; diffusion models; prompt engineering
Online Access: https://www.mdpi.com/2073-431X/14/1/19