When Remote Sensing Meets Foundation Model: A Survey and Beyond

Most deep-learning-based vision tasks rely heavily on crowd-labeled data, and deep neural networks (DNNs) are constrained by this laborious and time-consuming labeling paradigm. Recently, foundation models (FMs) have been proposed to learn richer features from multi-modal data; moreover, a single foundation model can make zero-shot predictions on a variety of vision tasks. These advantages make foundation models well suited to remote sensing images, where annotations are far sparser than for natural images. However, the inherent differences between natural images and remote sensing images hinder the direct application of foundation models. In this context, this paper provides a comprehensive review of common and domain-specific foundation models for remote sensing, summarizing the latest advances in vision foundation models, textually prompted foundation models, visually prompted foundation models, and heterogeneous foundation models. Despite the great potential of foundation models for vision tasks, open challenges concerning data, models, and tasks limit their performance on remote sensing images and keep them far from practical application. To address these challenges and narrow the performance gap between natural images and remote sensing images, the paper discusses the open challenges and suggests potential directions for future advancements.


Bibliographic Details
Main Authors: Chunlei Huo, Keming Chen, Shuaihao Zhang, Zeyu Wang, Heyu Yan, Jing Shen, Yuyang Hong, Geqi Qi, Hongmei Fang, Zihan Wang
Format: Article
Language:English
Published: MDPI AG, 2025-01-01
Series:Remote Sensing
Subjects: foundation model, remote sensing, pre-training, fine-tuning, adapter, segment anything model
Online Access:https://www.mdpi.com/2072-4292/17/2/179
ISSN: 2072-4292
DOI: 10.3390/rs17020179
Citation: Remote Sensing, vol. 17, no. 2, article 179, January 2025
Collection: DOAJ
Record ID: doaj-art-22a27965aa3349398bce15fac6c9a93e
Author Affiliations:
Chunlei Huo: Information Engineering College, Capital Normal University, Beijing 100048, China
Keming Chen, Shuaihao Zhang, Zeyu Wang, Heyu Yan, Hongmei Fang, Zihan Wang: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100086, China
Jing Shen, Geqi Qi: Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
Yuyang Hong: University of Chinese Academy of Sciences, Beijing 101499, China