Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models

Bibliographic Details
Main Authors: Wala Elsharif, Mahmood Alzubaidi, James She, Marco Agus
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Computers
Online Access: https://www.mdpi.com/2073-431X/14/1/19
Description
Summary: Text-to-image models have demonstrated remarkable progress in generating visual content from textual descriptions. However, linguistic ambiguity in text prompts poses a challenge to these models, potentially leading to undesired or inaccurate outputs. This work conducts a preliminary study of how text-to-image diffusion models resolve linguistic ambiguity through a series of experiments. We investigate a set of prompts exhibiting different types of linguistic ambiguity across several models, together with the images they generate, focusing on how the models’ interpretations of linguistic ambiguity compare to those of humans. In addition, we present a curated dataset of ambiguous prompts and their corresponding images, the Visual Linguistic Ambiguity Benchmark (V-LAB). Furthermore, we report a number of limitations and failure modes caused by linguistic ambiguity in text-to-image models and propose prompt engineering guidelines to minimize the impact of ambiguity. The findings of this exploratory study contribute to the ongoing improvement of text-to-image models and provide valuable insights for future advancements in the field.
ISSN: 2073-431X