Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks
Our study investigates how the sequencing of text and image inputs within multi-modal prompts affects the reasoning performance of Large Language Models (LLMs). Through empirical evaluations of three major commercial LLM vendors—OpenAI, Google, and Anthropic—alongside a user study on interaction str...
| Main Authors: | Grant Wardle, Teo Sušnjak |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Big Data and Cognitive Computing |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2504-2289/9/6/149 |
Similar Items
- Dual-Layer Fusion Knowledge Reasoning with Enhanced Multi-modal Features
  by: JING Boxiang, WANG Hairong, WANG Tong, YANG Zhenye
  Published: (2025-02-01)
- The Evolution of Generative AI: Trends and Applications
  by: Maria Trigka, et al.
  Published: (2025-01-01)
- Synergy-CLIP: Extending CLIP With Multi-Modal Integration for Robust Representation Learning
  by: Sangyeon Cho, et al.
  Published: (2025-01-01)
- MPVT: An Efficient Multi-Modal Prompt Vision Tracker for Visual Target Tracking
  by: Jianyu Xie, et al.
  Published: (2025-07-01)
- NuCap: A Numerically Aware Captioning Framework for Improved Numerical Reasoning
  by: Yuna Jeong, et al.
  Published: (2025-05-01)