Building a near-infrared (NIR) soil spectral dataset and predictive machine learning models using a handheld NIR spectrophotometer

Bibliographic Details
Main Authors: Colleen Partida, Jose Lucas Safanelli, Sadia Mannan Mitu, Mohammad Omar Faruk Murad, Yufeng Ge, Richard Ferguson, Keith Shepherd, Jonathan Sanderman
Format: Article
Language: English
Published: Elsevier 2025-02-01
Series: Data in Brief
Subjects: Soil spectroscopy; Soil organic carbon; Pedometrics; Chemometrics; Soil analysis
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340924011910
_version_ 1832576509374627840
author Colleen Partida
Jose Lucas Safanelli
Sadia Mannan Mitu
Mohammad Omar Faruk Murad
Yufeng Ge
Richard Ferguson
Keith Shepherd
Jonathan Sanderman
author_sort Colleen Partida
collection DOAJ
description This near-infrared spectral dataset consists of 2,106 diverse mineral soil samples, each scanned on an average of six different units of the same low-cost, commercially available handheld spectrophotometer. Most soil samples were selected from the USDA NRCS National Soil Survey Center-Kellogg Soil Survey Laboratory (NSSC-KSSL) soil archives to represent the diversity of mineral soils (0–30 cm) found in the United States, while 90 samples were selected from Ghana, Kenya, and Nigeria to represent the African soils available in the same archive. All scanning was performed on dried and sieved (<2 mm) soil samples. Machine learning models that predict soil organic carbon (SOC), pH, bulk density (BD), carbonate (CaCO3), exchangeable potassium (Ex. K), sand, silt, and clay content from the spectra were developed in the R programming language using most of this dataset (1,976 US soils) and are included in this data release. Two model types, Cubist and partial least squares regression (PLSR), were developed using two strategies: (1) using an average of the spectral scans across devices for each sample, and (2) using the replicate spectral scans across devices for each sample. We present the internal performance of these models here. The dry spectra and Cubist models for these soil properties are available for download from Zenodo (DOI: 10.5281/zenodo.7586621). An example of the detailed code used to produce these models is hosted at the Open Soil Spectral Library, a free service of the Soil Spectroscopy for the Global Good Network (soilspectroscopy.org), enabling broad use of these data for multiple soil monitoring applications.
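The two training strategies named in the description differ only in how the scans are arranged before model fitting. The sketch below illustrates that data-preparation step in Python with the standard library only. It is not the authors' R code, it does not fit Cubist or PLSR models, and the function names, the `(sample_id, device_id, spectrum)` tuple layout, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import fmean


def average_scans(scans):
    """Strategy 1 helper: average the spectra of each sample across devices.

    `scans` is an iterable of (sample_id, device_id, spectrum) tuples, where
    `spectrum` is a list of reflectance values on a shared wavelength grid
    (hypothetical layout, not the format of the published dataset).
    """
    grouped = defaultdict(list)
    for sample_id, _device_id, spectrum in scans:
        grouped[sample_id].append(spectrum)
    # zip(*spectra) walks the spectra band by band, so each band is averaged
    # over all devices that scanned the sample.
    return {
        sample_id: [fmean(band) for band in zip(*spectra)]
        for sample_id, spectra in grouped.items()
    }


def averaged_rows(scans, targets):
    """Strategy 1: one training row per sample (mean spectrum, property value)."""
    means = average_scans(scans)
    return [(means[sample_id], targets[sample_id]) for sample_id in means]


def replicate_rows(scans, targets):
    """Strategy 2: one training row per scan; the property value is repeated."""
    return [(spectrum, targets[sample_id]) for sample_id, _device_id, spectrum in scans]


# Toy data: two samples, the first scanned on two devices (made-up values).
toy_scans = [
    ("s1", "dev_a", [0.10, 0.20, 0.30]),
    ("s1", "dev_b", [0.30, 0.40, 0.50]),
    ("s2", "dev_a", [0.50, 0.60, 0.70]),
]
toy_soc = {"s1": 1.2, "s2": 0.4}  # hypothetical SOC values

print(len(averaged_rows(toy_scans, toy_soc)), "rows with averaging")
print(len(replicate_rows(toy_scans, toy_soc)), "rows with replicates")
```

With averaging, each sample contributes a single row regardless of how many devices scanned it. With replicates, every scan becomes its own row, so device-to-device variation is visible to the model.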
format Article
id doaj-art-d39b0410d7404c4dbd410e96d281ce63
institution Kabale University
issn 2352-3409
language English
publishDate 2025-02-01
publisher Elsevier
record_format Article
series Data in Brief
spelling doaj-art-d39b0410d7404c4dbd410e96d281ce63 2025-01-31T05:11:34Z eng Elsevier Data in Brief 2352-3409 2025-02-01 58 111229
Building a near-infrared (NIR) soil spectral dataset and predictive machine learning models using a handheld NIR spectrophotometer
Zenodo
Colleen Partida (0): Woodwell Climate Research Center, 149 Woods Hole Rd., Falmouth, MA, 02540, United States
Jose Lucas Safanelli (1): Woodwell Climate Research Center, 149 Woods Hole Rd., Falmouth, MA, 02540, United States
Sadia Mannan Mitu (2): Department of Biological Systems Engineering, University of Nebraska-Lincoln, E Campus Mall, Lincoln, NE, 68583, United States
Mohammad Omar Faruk Murad (3): Department of Biological Systems Engineering, University of Nebraska-Lincoln, E Campus Mall, Lincoln, NE, 68583, United States
Yufeng Ge (4): Department of Biological Systems Engineering, University of Nebraska-Lincoln, E Campus Mall, Lincoln, NE, 68583, United States
Richard Ferguson (5): USDA, Natural Resources Conservation Service (NRCS), National Soil Survey Center (NSSC), Kellogg Soil Survey Laboratory (KSSL), 1121 Lincoln Mall, Lincoln, NE, 68508, United States
Keith Shepherd (6): Innovative Solutions for Decision Agriculture (iSDA), Rothamsted Campus, West Common, Harpenden AL5 2JQ, UK
Jonathan Sanderman (7): Woodwell Climate Research Center, 149 Woods Hole Rd., Falmouth, MA, 02540, United States; Corresponding author.
http://www.sciencedirect.com/science/article/pii/S2352340924011910
Soil spectroscopy
Soil organic carbon
Pedometrics
Chemometrics
Soil analysis
title Building a near-infrared (NIR) soil spectral dataset and predictive machine learning models using a handheld NIR spectrophotometer
topic Soil spectroscopy
Soil organic carbon
Pedometrics
Chemometrics
Soil analysis
url http://www.sciencedirect.com/science/article/pii/S2352340924011910