When Remote Sensing Meets Foundation Model: A Survey and Beyond

Most deep-learning-based vision tasks rely heavily on crowd-labeled data, and deep neural networks (DNNs) are constrained by this laborious and time-consuming labeling paradigm. Recently, foundation models (FMs) have been proposed to learn richer features from multi-modal data; moreover, a single foundation model can make zero-shot predictions on a variety of vision tasks. These advantages make foundation models well suited to remote sensing images, where annotations are far sparser than for natural images. However, the inherent differences between natural images and remote sensing images hinder the direct application of foundation models. In this context, this paper provides a comprehensive review of common and domain-specific foundation models for remote sensing, summarizing the latest advances in vision foundation models, textually prompted foundation models, visually prompted foundation models, and heterogeneous foundation models. Despite the great potential of foundation models for vision tasks, open challenges concerning data, models, and tasks limit their performance on remote sensing images and keep them far from practical application. To address these challenges and narrow the performance gap between natural images and remote sensing images, the paper discusses the open challenges and suggests potential directions for future advancements.


Bibliographic Details
Main Authors: Chunlei Huo, Keming Chen, Shuaihao Zhang, Zeyu Wang, Heyu Yan, Jing Shen, Yuyang Hong, Geqi Qi, Hongmei Fang, Zihan Wang
Format: Article
Language:English
Published: MDPI AG, 2025-01-01
Series:Remote Sensing
Subjects: foundation model, remote sensing, pre-training, fine-tuning, adapter, segment anything model
Online Access:https://www.mdpi.com/2072-4292/17/2/179
ISSN: 2072-4292
DOI: 10.3390/rs17020179
Citation: Remote Sensing, vol. 17, no. 2, article 179, January 2025
Collection: DOAJ
Record ID: doaj-art-22a27965aa3349398bce15fac6c9a93e
Author Affiliations:
Chunlei Huo: Information Engineering College, Capital Normal University, Beijing 100048, China
Keming Chen, Shuaihao Zhang, Zeyu Wang, Heyu Yan, Hongmei Fang, Zihan Wang: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100086, China
Jing Shen, Geqi Qi: Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
Yuyang Hong: University of Chinese Academy of Sciences, Beijing 101499, China