Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content (Mendeley Data)
This paper presents a comprehensive dataset of mid-infrared spectra for dried and roasted cocoa beans (Theobroma cacao L.), along with their corresponding theobromine and caffeine content. Infrared data were acquired using Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) spectrosco...
Main Authors: | Gentil A. Collazos-Escobar, Andrés F. Bahamón-Monje, Nelson Gutiérrez-Guzmán |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2025-02-01 |
Series: | Data in Brief |
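Subjects: | Functional groups, Cocoa quality monitoring, Multivariate statistical tools, Artificial intelligence, Real-time decision-making |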
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340924012058 |
_version_ | 1832576499925909504 |
---|---|
author | Gentil A. Collazos-Escobar; Andrés F. Bahamón-Monje; Nelson Gutiérrez-Guzmán |
author_facet | Gentil A. Collazos-Escobar; Andrés F. Bahamón-Monje; Nelson Gutiérrez-Guzmán |
author_sort | Gentil A. Collazos-Escobar |
collection | DOAJ |
description | This paper presents a comprehensive dataset of mid-infrared spectra for dried and roasted cocoa beans (Theobroma cacao L.), along with their corresponding theobromine and caffeine content. Infrared data were acquired using Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) spectroscopy, while High-Performance Liquid Chromatography (HPLC) was employed to accurately quantify theobromine and caffeine in the dried cocoa beans. The theobromine/caffeine relationship served as a robust chemical marker for distinguishing between different cocoa varieties. This dataset provides a basis for further research, enabling the integration of mid-infrared spectral data with HPLC (as a standard) to fine-tune machine learning and deep learning models that could be used to simultaneously predict the theobromine and caffeine content, as well as cocoa variety in both dried and roasted cocoa samples using a non-destructive approach based on spectral data. The tools developed from this dataset could significantly advance automated processes in the cocoa industry and support decision-making on an industrial scale, facilitating real-time quality control of cocoa-based products, improving cocoa variety classification, and optimizing bean selection, blending strategies, and product formulation, while reducing the need for labor-intensive and costly quantification methods. The dataset is organized into Excel sheets and structured according to experimental conditions and replicates, providing a valuable framework for further analysis, model development, and calibration of multivariate statistical models. |
format | Article |
id | doaj-art-af5fda2cba5845019b2f0109acfb176c |
institution | Kabale University |
issn | 2352-3409 |
language | English |
publishDate | 2025-02-01 |
publisher | Elsevier |
record_format | Article |
series | Data in Brief |
spelling | doaj-art-af5fda2cba5845019b2f0109acfb176c; 2025-01-31T05:11:38Z; eng; Elsevier; Data in Brief; 2352-3409; 2025-02-01; 58; 111243; Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content (Mendeley Data); Gentil A. Collazos-Escobar [0], Andrés F. Bahamón-Monje [1], Nelson Gutiérrez-Guzmán [2]; [0] Centro Surcolombiano de Investigación en Café (CESURCAFÉ), Departamento de Ingeniería Agrícola, Universidad Surcolombiana, Neiva-Huila 410001, Colombia; Grupo de Análisis y Simulación de Procesos Agroalimentarios (ASPA), Instituto Universitario de Ingeniería de Alimentos–FoodUPV, Universitat Politècnica de València, Camí de Vera s/n, Edificio 3F, València 46022, España; Corresponding author at: Centro Surcolombiano de Investigación en Café (CESURCAFÉ), Departamento de Ingeniería Agrícola, Universidad Surcolombiana, Neiva-Huila 410001, Colombia. [1] Centro Surcolombiano de Investigación en Café (CESURCAFÉ), Departamento de Ingeniería Agrícola, Universidad Surcolombiana, Neiva-Huila 410001, Colombia. [2] Centro Surcolombiano de Investigación en Café (CESURCAFÉ), Departamento de Ingeniería Agrícola, Universidad Surcolombiana, Neiva-Huila 410001, Colombia. This paper presents a comprehensive dataset of mid-infrared spectra for dried and roasted cocoa beans (Theobroma cacao L.), along with their corresponding theobromine and caffeine content. Infrared data were acquired using Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) spectroscopy, while High-Performance Liquid Chromatography (HPLC) was employed to accurately quantify theobromine and caffeine in the dried cocoa beans. The theobromine/caffeine relationship served as a robust chemical marker for distinguishing between different cocoa varieties. This dataset provides a basis for further research, enabling the integration of mid-infrared spectral data with HPLC (as a standard) to fine-tune machine learning and deep learning models that could be used to simultaneously predict the theobromine and caffeine content, as well as cocoa variety in both dried and roasted cocoa samples using a non-destructive approach based on spectral data. The tools developed from this dataset could significantly advance automated processes in the cocoa industry and support decision-making on an industrial scale, facilitating real-time quality control of cocoa-based products, improving cocoa variety classification, and optimizing bean selection, blending strategies, and product formulation, while reducing the need for labor-intensive and costly quantification methods. The dataset is organized into Excel sheets and structured according to experimental conditions and replicates, providing a valuable framework for further analysis, model development, and calibration of multivariate statistical models. http://www.sciencedirect.com/science/article/pii/S2352340924012058; Functional groups; Cocoa quality monitoring; Multivariate statistical tools; Artificial intelligence; Real-time decision-making |
spellingShingle | Gentil A. Collazos-Escobar; Andrés F. Bahamón-Monje; Nelson Gutiérrez-Guzmán; Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content (Mendeley Data); Data in Brief; Functional groups; Cocoa quality monitoring; Multivariate statistical tools; Artificial intelligence; Real-time decision-making |
title | Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content (Mendeley Data) |
title_full | Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content (Mendeley Data) |
title_fullStr | Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content (Mendeley Data) |
title_full_unstemmed | Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content (Mendeley Data) |
title_short | Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content (Mendeley Data) |
title_sort | mid infrared spectra of dried and roasted cocoa theobroma cacao l a dataset for machine learning based classification of cocoa varieties and prediction of theobromine and caffeine contentmendeley data |
topic | Functional groups; Cocoa quality monitoring; Multivariate statistical tools; Artificial intelligence; Real-time decision-making |
url | http://www.sciencedirect.com/science/article/pii/S2352340924012058 |
work_keys_str_mv | AT gentilacollazosescobar midinfraredspectraofdriedandroastedcocoatheobromacacaoladatasetformachinelearningbasedclassificationofcocoavarietiesandpredictionoftheobromineandcaffeinecontentmendeleydata AT andresfbahamonmonje midinfraredspectraofdriedandroastedcocoatheobromacacaoladatasetformachinelearningbasedclassificationofcocoavarietiesandpredictionoftheobromineandcaffeinecontentmendeleydata AT nelsongutierrezguzman midinfraredspectraofdriedandroastedcocoatheobromacacaoladatasetformachinelearningbasedclassificationofcocoavarietiesandpredictionoftheobromineandcaffeinecontentmendeleydata |