Multi-task feature integration and interactive active learning for scene image resizing

Bibliographic Details
Main Authors: Ludan Shi, Xianhua Yan, Sen Wang
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Subjects:
Online Access: https://doi.org/10.1038/s41598-025-98917-w
Description
Summary: In the realm of artificial intelligence (AI), recomposing the semantic segments of intricate scenes is pivotal. This study seamlessly combines multi-channel perceptual visual features for the adaptive retargeting of images with complex spatial configurations. The key to our approach is an in-depth hierarchical model dedicated to precisely capturing human gaze dynamics. Using the BING objectness metric, we swiftly and accurately extract semantically and visually significant patches within scenes by identifying objects and their components across varying scales. We then introduce a multi-task feature selector for the dynamic integration of multi-channel features across disparate scene patches. To capture how humans perceive critical scenic patches, we propose a strategy called locality-preserved and interactive active learning (LIAL), which incrementally constructs gaze shift paths (GSP) for each scene. The primary advantages of LIAL are twofold: first, it efficiently maintains the local coherence of varied scenes; second, it allows the active selection process to be shaped by human interaction. Using LIAL, we methodically represent a GSP for every scene and compute its corresponding deep features with a multi-layer aggregation algorithm. The deeply learned GSP representations are then encoded into a Gaussian mixture model (GMM), which serves as the basis for scenic image retargeting. Our empirical analyses confirm the effectiveness of the proposed methodology. Our user study shows that our retargeting outperforms the five competing methods. Moreover, compared with 17 other popular visual recognizers, our method's precision exceeds the second-best performer's by 3%, and its testing time is only 49.8% of that performer's.
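The abstract's final step, encoding the deep GSP representations into a Gaussian mixture model, can be illustrated with a minimal sketch. Everything here is an illustrative assumption rather than the paper's implementation: the function names (`fit_diag_gmm`, `encode_gsp`), the diagonal-covariance EM fit, and the choice to concatenate the GMM's weights, means, and variances into a fixed-length descriptor are all placeholders for whatever encoding the authors actually use.

```python
import numpy as np

def fit_diag_gmm(X, k=3, iters=50, seed=0):
    """Fit a diagonal-covariance GMM to the rows of X via EM (minimal sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, k, replace=False)]          # initialize at random samples
    var = np.tile(X.var(axis=0) + 1e-6, (k, 1))         # shared initial variances
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: log joint density of each point under each component
        log_p = (-0.5 * (((X[:, None, :] - means) ** 2) / var
                         + np.log(2 * np.pi * var)).sum(axis=2)
                 + np.log(weights))
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)                  # responsibilities
        # M-step: update mixture parameters from soft assignments
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        var = (resp.T @ (X ** 2)) / nk[:, None] - means ** 2 + 1e-6
    return weights, means, var

def encode_gsp(features, k=3):
    """Encode a gaze-shift-path feature sequence (one row per visited patch)
    as a fixed-length vector: concatenated GMM weights, means, and variances."""
    w, m, v = fit_diag_gmm(np.asarray(features, dtype=float), k=k)
    return np.concatenate([w, m.ravel(), v.ravel()])
```

Under these assumptions, a path of 40 patches with 8-D features yields a vector of length k + 2·k·d = 3 + 48 = 51, so every scene gets a same-sized descriptor regardless of how many patches its gaze shift path visits.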
ISSN: 2045-2322