Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models
Text-to-image models have demonstrated remarkable progress in generating visual content from textual descriptions. However, the presence of linguistic ambiguity in text prompts poses a potential challenge to these models, possibly leading to undesired or inaccurate outputs. This work conducts a preliminary study, through a series of experiments, of how text-to-image diffusion models resolve linguistic ambiguity.
Main Authors: | Wala Elsharif, Mahmood Alzubaidi, James She, Marco Agus |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Computers |
Subjects: | natural language processing; computational linguistics; linguistic ambiguity; text-to-image models; diffusion models; prompt engineering |
Online Access: | https://www.mdpi.com/2073-431X/14/1/19 |
_version_ | 1832588756043956224 |
---|---|
author | Wala Elsharif; Mahmood Alzubaidi; James She; Marco Agus |
author_facet | Wala Elsharif; Mahmood Alzubaidi; James She; Marco Agus |
author_sort | Wala Elsharif |
collection | DOAJ |
description | Text-to-image models have demonstrated remarkable progress in generating visual content from textual descriptions. However, the presence of linguistic ambiguity in the text prompts poses a potential challenge to these models, possibly leading to undesired or inaccurate outputs. This work conducts a preliminary study and provides insights into how text-to-image diffusion models resolve linguistic ambiguity through a series of experiments. We investigate a set of prompts that exhibit different types of linguistic ambiguities with different models and the images they generate, focusing on how the models’ interpretations of linguistic ambiguity compare to those of humans. In addition, we present a curated dataset of ambiguous prompts and their corresponding images known as the Visual Linguistic Ambiguity Benchmark (V-LAB) dataset. Furthermore, we report a number of limitations and failure modes caused by linguistic ambiguity in text-to-image models and propose prompt engineering guidelines to minimize the impact of ambiguity. The findings of this exploratory study contribute to the ongoing improvement of text-to-image models and provide valuable insights for future advancements in the field. |
format | Article |
id | doaj-art-39409707d08e421989766aafe81f25a2 |
institution | Kabale University |
issn | 2073-431X |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Computers |
spelling | doaj-art-39409707d08e421989766aafe81f25a2; 2025-01-24T13:27:53Z; eng; MDPI AG; Computers; 2073-431X; 2025-01-01; Vol. 14, Iss. 1, Art. 19; 10.3390/computers14010019; Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models; Wala Elsharif (College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar); Mahmood Alzubaidi (College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar); James She (Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong 999077, China); Marco Agus (College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar); https://www.mdpi.com/2073-431X/14/1/19; natural language processing; computational linguistics; linguistic ambiguity; text-to-image models; diffusion models; prompt engineering |
spellingShingle | Wala Elsharif; Mahmood Alzubaidi; James She; Marco Agus; Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models; Computers; natural language processing; computational linguistics; linguistic ambiguity; text-to-image models; diffusion models; prompt engineering |
title | Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models |
title_full | Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models |
title_fullStr | Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models |
title_full_unstemmed | Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models |
title_short | Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models |
title_sort | visualizing ambiguity analyzing linguistic ambiguity resolution in text to image models |
topic | natural language processing computational linguistics linguistic ambiguity text-to-image models diffusion models prompt engineering |
url | https://www.mdpi.com/2073-431X/14/1/19 |
work_keys_str_mv | AT walaelsharif visualizingambiguityanalyzinglinguisticambiguityresolutionintexttoimagemodels AT mahmoodalzubaidi visualizingambiguityanalyzinglinguisticambiguityresolutionintexttoimagemodels AT jamesshe visualizingambiguityanalyzinglinguisticambiguityresolutionintexttoimagemodels AT marcoagus visualizingambiguityanalyzinglinguisticambiguityresolutionintexttoimagemodels |