Few-shot Remote Sensing Imagery Recognition with Compositionality Inductive Bias in Hierarchical Representation Space

Remote sensing scenes viewed from an aerial perspective can be composed from distinct visual parts in a combinatorial number of ways. This combinatorial explosion makes it difficult to understand remote sensing imagery (RSI) from few prior instances (i.e., few-shot RSI recognition). Despite...

Bibliographic Details
Main Authors: Shichao Zhou, Zhuowei Wang, Zekai Zhang, Wenzheng Wang, Yingrui Zhao, Yunpu Zhang
Format: Article
Language: English
Published: IEEE 2025-01-01
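Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Clustering methods; feature extraction; knowledge representation; prediction methods; remote sensing
Online Access: https://ieeexplore.ieee.org/document/10819630/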
author Shichao Zhou
Zhuowei Wang
Zekai Zhang
Wenzheng Wang
Yingrui Zhao
Yunpu Zhang
collection DOAJ
description Remote sensing scenes viewed from an aerial perspective can be composed from distinct visual parts in a combinatorial number of ways. This combinatorial explosion makes it difficult to understand remote sensing imagery (RSI) from few prior instances (i.e., few-shot RSI recognition). Despite the empirical success of existing methods such as data augmentation and knowledge transfer, no large-scale dataset can cover all possible combinations of visual parts. The prior knowledge learned by these data-driven methods may therefore exhibit dataset bias, resulting in inadequate generalization to the current recognition task. In contrast to these purely data-driven strategies, we focus on careful feature modeling by constraining the mapping behavior of deep neural networks. Specifically, we embed an inductive bias of compositionality into a hierarchical latent representation space, which operates on two aspects: 1) Disentangled and reusable representation: we establish a clustering-oriented factorized representation with a mixture model to represent the multipart distributions of tokens. Each cluster centroid represents a recurring part. New patches are assigned to the nearest cluster centroid, which yields the posterior representation. 2) Compositional and discriminative representation: we introduce a hierarchical context prediction mechanism for compositional representation learning, using a predictive NCE loss function to encourage global remote sensing scenes to accurately predict similar local parts, thereby automatically inferring compositional representations of high-level yet discriminative latent concepts. Extensive experiments, including comparisons with state-of-the-art (SOTA) methods, sensitivity evaluations, and ablation studies, demonstrate that our method achieves comparable or even superior performance in few-shot RSI recognition.
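The two ingredients named in the description (nearest-centroid assignment of patches, and an NCE-style loss that asks a global scene feature to pick out its matching local part) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names `assign_to_centroids` and `info_nce`, the `temperature` parameter, the cosine-similarity scoring, and the use of hard Euclidean assignment in place of a full mixture-model posterior are all assumptions made for the example.

```python
import numpy as np


def assign_to_centroids(patches, centroids):
    """Hard-assign each patch embedding to its nearest centroid.

    Assumption: Euclidean distance and hard assignment stand in for the
    mixture-model posterior mentioned in the abstract.
    """
    # Squared distances, shape (n_patches, n_centroids).
    diff = patches[:, None, :] - centroids[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def info_nce(global_feats, local_feats, temperature=0.1):
    """InfoNCE-style loss (hypothetical formulation for illustration).

    Row i of global_feats is the query; row i of local_feats is its
    positive, and every other row acts as a negative.
    """
    g = global_feats / np.linalg.norm(global_feats, axis=1, keepdims=True)
    l = local_feats / np.linalg.norm(local_feats, axis=1, keepdims=True)
    logits = g @ l.T / temperature                       # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy data: two "part" centroids and three patch embeddings.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
patches = np.array([[0.5, -0.2], [9.0, 11.0], [1.0, 1.0]])
labels = assign_to_centroids(patches, centroids)

# Matched global/local pairs should give a lower loss than mismatched ones.
feats = np.eye(3)
loss_aligned = info_nce(feats, feats)
loss_shuffled = info_nce(feats, np.roll(feats, 1, axis=0))
```

In this toy setup the loss is small when each global feature lines up with its own local feature and large when the pairs are shuffled, which is the behavior a predictive contrastive objective is meant to reward.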
format Article
id doaj-art-c1ae31ab20cb469f898e6de6639b26bb
institution Kabale University
issn 1939-1404
2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj-art-c1ae31ab20cb469f898e6de6639b26bb
2025-01-21T00:00:18Z
eng
IEEE
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
1939-1404
2151-1535
2025-01-01
volume 18
pages 3544-3555
doi 10.1109/JSTARS.2024.3524573
10819630
Few-shot Remote Sensing Imagery Recognition with Compositionality Inductive Bias in Hierarchical Representation Space
Shichao Zhou (0): School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing, China
Zhuowei Wang (1): School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing, China
Zekai Zhang (2): School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing, China
Wenzheng Wang (3), https://orcid.org/0000-0002-0278-6751: School of Information and Electronics, Beijing Institute of Technology, Beijing, China
Yingrui Zhao (4): School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing, China
Yunpu Zhang (5): School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing, China
https://ieeexplore.ieee.org/document/10819630/
Clustering methods
feature extraction
knowledge representation
prediction methods
remote sensing
title Few-shot Remote Sensing Imagery Recognition with Compositionality Inductive Bias in Hierarchical Representation Space
topic Clustering methods
feature extraction
knowledge representation
prediction methods
remote sensing
url https://ieeexplore.ieee.org/document/10819630/