When Remote Sensing Meets Foundation Model: A Survey and Beyond
Most deep-learning-based vision tasks rely heavily on crowd-labeled data, and training a deep neural network (DNN) is usually constrained by this laborious and time-consuming labeling paradigm. Recently, foundation models (FMs) have been proposed to learn richer features from multi-modal data. Moreover, a single foundation model enables zero-shot prediction on various vision tasks.
Main Authors: | Chunlei Huo, Keming Chen, Shuaihao Zhang, Zeyu Wang, Heyu Yan, Jing Shen, Yuyang Hong, Geqi Qi, Hongmei Fang, Zihan Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Remote Sensing |
Subjects: | foundation model; remote sensing; pre-training; fine-tuning; adapter; segment anything model |
Online Access: | https://www.mdpi.com/2072-4292/17/2/179 |
author | Chunlei Huo, Keming Chen, Shuaihao Zhang, Zeyu Wang, Heyu Yan, Jing Shen, Yuyang Hong, Geqi Qi, Hongmei Fang, Zihan Wang |
author_sort | Chunlei Huo |
collection | DOAJ |
description | Most deep-learning-based vision tasks rely heavily on crowd-labeled data, and training a deep neural network (DNN) is usually constrained by this laborious and time-consuming labeling paradigm. Recently, foundation models (FMs) have been proposed to learn richer features from multi-modal data. Moreover, a single foundation model enables zero-shot prediction on various vision tasks. These advantages make foundation models well suited to remote sensing images, where annotations are sparser than for natural images. However, the inherent differences between natural images and remote sensing images hinder the application of foundation models. In this context, this paper provides a comprehensive review of general-purpose and domain-specific foundation models for remote sensing, summarizing the latest advances in vision foundation models, textually prompted foundation models, visually prompted foundation models, and heterogeneous foundation models. Despite the great potential of foundation models for vision tasks, open challenges concerning data, models, and tasks limit their performance on remote sensing images and keep them far from practical application. To address these challenges and reduce the performance gap between natural images and remote sensing images, this paper discusses the open challenges and suggests potential directions for future advancements. |
format | Article |
id | doaj-art-22a27965aa3349398bce15fac6c9a93e |
institution | Kabale University |
issn | 2072-4292 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj-art-22a27965aa3349398bce15fac6c9a93e (2025-01-24T13:47:38Z). English. MDPI AG. Remote Sensing, ISSN 2072-4292, 2025-01-01, vol. 17, no. 2, p. 179. DOI: 10.3390/rs17020179. When Remote Sensing Meets Foundation Model: A Survey and Beyond. Authors and affiliations: Chunlei Huo (Information Engineering College, Capital Normal University, Beijing 100048, China); Keming Chen, Shuaihao Zhang, Zeyu Wang, Heyu Yan, Hongmei Fang, Zihan Wang (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100086, China); Jing Shen, Geqi Qi (Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China); Yuyang Hong (University of Chinese Academy of Sciences, Beijing 101499, China). Online access: https://www.mdpi.com/2072-4292/17/2/179. Keywords: foundation model; remote sensing; pre-training; fine-tuning; adapter; segment anything model |
title | When Remote Sensing Meets Foundation Model: A Survey and Beyond |
topic | foundation model; remote sensing; pre-training; fine-tuning; adapter; segment anything model |
url | https://www.mdpi.com/2072-4292/17/2/179 |