Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content (Mendeley Data)

This paper presents a comprehensive dataset of mid-infrared spectra for dried and roasted cocoa beans (Theobroma cacao L.), along with their corresponding theobromine and caffeine content. Infrared data were acquired using Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) spectrosco...

Bibliographic Details
Main Authors: Gentil A. Collazos-Escobar, Andrés F. Bahamón-Monje, Nelson Gutiérrez-Guzmán
Format: Article
Language: English
Published: Elsevier 2025-02-01
Series: Data in Brief
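Subjects: Functional groups; Cocoa quality monitoring; Multivariate statistical tools; Artificial intelligence; Real-time decision-making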
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340924012058
author Gentil A. Collazos-Escobar
Andrés F. Bahamón-Monje
Nelson Gutiérrez-Guzmán
collection DOAJ
description This paper presents a comprehensive dataset of mid-infrared spectra for dried and roasted cocoa beans (Theobroma cacao L.), along with their corresponding theobromine and caffeine content. Infrared data were acquired using Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) spectroscopy, while High-Performance Liquid Chromatography (HPLC) was employed to accurately quantify theobromine and caffeine in the dried cocoa beans. The theobromine/caffeine relationship served as a robust chemical marker for distinguishing between different cocoa varieties. This dataset provides a basis for further research, enabling the integration of mid-infrared spectral data with HPLC (as a standard) to fine-tune machine learning and deep learning models that could be used to simultaneously predict the theobromine and caffeine content, as well as cocoa variety in both dried and roasted cocoa samples using a non-destructive approach based on spectral data. The tools developed from this dataset could significantly advance automated processes in the cocoa industry and support decision-making on an industrial scale, facilitating real-time quality control of cocoa-based products, improving cocoa variety classification, and optimizing bean selection, blending strategies, and product formulation, while reducing the need for labor-intensive and costly quantification methods. The dataset is organized into Excel sheets and structured according to experimental conditions and replicates, providing a valuable framework for further analysis, model development, and calibration of multivariate statistical models.
format Article
id doaj-art-af5fda2cba5845019b2f0109acfb176c
institution Kabale University
issn 2352-3409
language English
publishDate 2025-02-01
publisher Elsevier
record_format Article
series Data in Brief
spelling doaj-art-af5fda2cba5845019b2f0109acfb176c
2025-01-31T05:11:38Z
eng
Elsevier
Data in Brief
2352-3409
2025-02-01
58
111243
Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content (Mendeley Data)
Gentil A. Collazos-Escobar (Centro Surcolombiano de Investigación en Café (CESURCAFÉ), Departamento de Ingeniería Agrícola, Universidad Surcolombiana, Neiva-Huila 410001, Colombia; Grupo de Análisis y Simulación de Procesos Agroalimentarios (ASPA), Instituto Universitario de Ingeniería de Alimentos–FoodUPV, Universitat Politècnica de València, Camí de Vera s/n, Edificio 3F, València 46022, España; Corresponding author at: Centro Surcolombiano de Investigación en Café (CESURCAFÉ), Departamento de Ingeniería Agrícola, Universidad Surcolombiana, Neiva-Huila 410001, Colombia)
Andrés F. Bahamón-Monje (Centro Surcolombiano de Investigación en Café (CESURCAFÉ), Departamento de Ingeniería Agrícola, Universidad Surcolombiana, Neiva-Huila 410001, Colombia)
Nelson Gutiérrez-Guzmán (Centro Surcolombiano de Investigación en Café (CESURCAFÉ), Departamento de Ingeniería Agrícola, Universidad Surcolombiana, Neiva-Huila 410001, Colombia)
http://www.sciencedirect.com/science/article/pii/S2352340924012058
Functional groups
Cocoa quality monitoring
Multivariate statistical tools
Artificial intelligence
Real-time decision-making
title Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content (Mendeley Data)
topic Functional groups
Cocoa quality monitoring
Multivariate statistical tools
Artificial intelligence
Real-time decision-making
url http://www.sciencedirect.com/science/article/pii/S2352340924012058