VisualSAF-A Novel Framework for Visual Semantic Analysis Tasks

We introduce VisualSAF, a novel Visual Semantic Analysis Framework designed to enhance the understanding of contextual characteristics in Visual Scene Analysis (VSA) tasks. The framework leverages semantic variables extracted using machine learning algorithms to provide additional high-level information, augmenting the capabilities of the primary task model. Comprising three main components – the General DL Model, Semantic Variables, and Output Branches – VisualSAF offers a modular and adaptable approach to addressing diverse VSA tasks. The General DL Model processes input images, extracting high-level features through a backbone network and detecting regions of interest. Semantic Variables are then extracted from these regions, incorporating a wide range of contextual information tailored to specific scenarios. Finally, the Output Branch integrates semantic variables and detections, generating high-level task information while allowing for flexible weighting of inputs to optimize task performance. The framework is demonstrated through experiments on the HOD Dataset, showing improvements of 0.05 in mean average precision (mAP) and 0.01 in mean average recall (mAR) over baseline models. Future research directions include exploring multiple semantic variables, developing more complex output heads, and investigating the framework's performance across context-shifting datasets.
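The pipeline described in the abstract (a backbone detector producing regions of interest, per-region semantic-variable extraction, and an output branch that fuses the two with tunable weights) can be sketched in plain Python. Every function name, score, and weight below is an illustrative assumption, not the authors' implementation:

```python
# Hedged sketch of the three-component VisualSAF pipeline from the abstract.
# The detector output, the semantic-variable heuristic, and the fusion
# weights are all stand-ins chosen for illustration only.

from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    box: tuple    # (x1, y1, x2, y2) region of interest
    score: float  # confidence from the general DL model


def general_dl_model(image) -> List[Detection]:
    """Stand-in for the General DL Model (backbone + detector)."""
    # A real system would run a CNN backbone here; we fake two detections.
    return [Detection((10, 10, 50, 50), 0.9), Detection((60, 20, 90, 80), 0.4)]


def semantic_variable(image, det: Detection) -> float:
    """Stand-in for one semantic-variable extractor (a contextual cue)."""
    # Illustrative heuristic: larger regions get a higher contextual score.
    x1, y1, x2, y2 = det.box
    area = (x2 - x1) * (y2 - y1)
    return min(area / 5000.0, 1.0)


def output_branch(dets, sem_scores, w_det=0.7, w_sem=0.3):
    """Output Branch: fuse detections and semantic variables with weights."""
    return [w_det * d.score + w_sem * s for d, s in zip(dets, sem_scores)]


image = None  # placeholder input image
dets = general_dl_model(image)
sems = [semantic_variable(image, d) for d in dets]
fused = output_branch(dets, sems)
```

The weights `w_det` and `w_sem` model the abstract's "flexible weighting of inputs"; in a trained system they would be learned parameters rather than constants.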


Bibliographic Details
Main Authors: Antonio V. A. Lundgren, Byron L. D. Bezerra, Carmelo J. A. Bastos-Filho
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects: Visual semantic analysis; computer vision; assistive robotics
Online Access:https://ieeexplore.ieee.org/document/10855394/
author Antonio V. A. Lundgren (ORCID: 0000-0001-5567-4414)
Byron L. D. Bezerra
Carmelo J. A. Bastos-Filho (ORCID: 0000-0002-0924-5341)
affiliation Department of Computer Engineering (Ecomp), Polytechnic School of Pernambuco (POLI), University of Pernambuco (UPE), Recife, Brazil (all three authors)
collection DOAJ
description We introduce VisualSAF, a novel Visual Semantic Analysis Framework designed to enhance the understanding of contextual characteristics in Visual Scene Analysis (VSA) tasks. The framework leverages semantic variables extracted using machine learning algorithms to provide additional high-level information, augmenting the capabilities of the primary task model. Comprising three main components – the General DL Model, Semantic Variables, and Output Branches – VisualSAF offers a modular and adaptable approach to addressing diverse VSA tasks. The General DL Model processes input images, extracting high-level features through a backbone network and detecting regions of interest. Semantic Variables are then extracted from these regions, incorporating a wide range of contextual information tailored to specific scenarios. Finally, the Output Branch integrates semantic variables and detections, generating high-level task information while allowing for flexible weighting of inputs to optimize task performance. The framework is demonstrated through experiments on the HOD Dataset, showing improvements of 0.05 in mean average precision (mAP) and 0.01 in mean average recall (mAR) over baseline models. Future research directions include exploring multiple semantic variables, developing more complex output heads, and investigating the framework's performance across context-shifting datasets.
format Article
id doaj-art-9918ea2e80314f06aeaf343805a50962
institution Kabale University
issn 2169-3536
doi 10.1109/ACCESS.2025.3535314
article_number 10855394
volume 13
pages 21052-21063
language English
publishDate 2025-01-01
publisher IEEE
record_updated 2025-02-05T00:01:08Z
series IEEE Access
url https://ieeexplore.ieee.org/document/10855394/
topic Visual semantic analysis
computer vision
assistive robotics