A Survey of Sampling Methods for Hyperspectral Remote Sensing: Addressing Bias Induced by Random Sampling


Bibliographic Details
Main Authors: Kevin T. Decker, Brett J. Borghetti
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Remote Sensing
Subjects: sampling algorithm; generalization; model assessment; correlation; remote sensing
Online Access: https://www.mdpi.com/2072-4292/17/8/1373
author Kevin T. Decker
Brett J. Borghetti
collection DOAJ
description Identified as early as 2000, the challenges involved in developing and assessing remote sensing models with small datasets remain, with one key issue persisting: the misuse of random sampling to generate training and testing data. This practice often introduces a high degree of correlation between the sets, leading to an overestimation of model generalizability. Despite the early recognition of this problem, few researchers have investigated its nuances or developed effective sampling techniques to address it. Our survey highlights that mitigation strategies to reduce this bias remain underutilized in practice, distorting the interpretation and comparison of results across the field. In this work, we introduce a set of desirable characteristics to evaluate sampling algorithms, with a primary focus on their tendency to induce correlation between training and test data, while also accounting for other relevant factors. Using these characteristics, we survey 146 articles, identify 16 unique sampling algorithms, and evaluate them. Our evaluation reveals two broad archetypes of sampling techniques that effectively mitigate correlation and are suitable for model development.
format Article
id doaj-art-c83d7ecc6c9c438997e0e99ce1ed0ca1
institution OA Journals
issn 2072-4292
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj-art-c83d7ecc6c9c438997e0e99ce1ed0ca1 | 2025-08-20T02:18:04Z | eng | MDPI AG | Remote Sensing | 2072-4292 | 2025-04-01 | 17 | 8 | 1373 | 10.3390/rs17081373 | A Survey of Sampling Methods for Hyperspectral Remote Sensing: Addressing Bias Induced by Random Sampling | Kevin T. Decker [0] | Brett J. Borghetti [1] | [0] Air Force Institute of Technology, Department of Electrical and Computer Engineering, 2950 Hobson Way, Wright-Patterson AFB, OH 45433, USA | [1] Air Force Institute of Technology, Department of Electrical and Computer Engineering, 2950 Hobson Way, Wright-Patterson AFB, OH 45433, USA | https://www.mdpi.com/2072-4292/17/8/1373 | sampling algorithm | generalization | model assessment | correlation | remote sensing
title A Survey of Sampling Methods for Hyperspectral Remote Sensing: Addressing Bias Induced by Random Sampling
topic sampling algorithm
generalization
model assessment
correlation
remote sensing
url https://www.mdpi.com/2072-4292/17/8/1373