Combination of machine learning and Raman spectroscopy for prediction of drug release in targeted drug delivery formulations
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-10417-z |
| Summary: | This research investigates advanced regression techniques for modeling complex drug-release patterns from a high-dimensional dataset comprising more than 1500 spectrum-based variables plus categorical inputs. The spectral data were collected by Raman spectroscopy to analyze drug release from a solid dosage formulation coated with polysaccharides (155 samples, with drug release measured at 2, 8, and 24 h). The drug considered is 5-aminosalicylic acid (5-ASA) for colonic delivery, and its release was estimated from the Raman data together with other categorical parameters. The models, Kernel Ridge Regression (KRR), Kernel-based Extreme Learning Machine (K-ELM), and Quantile Regression (QR), use the Sailfish Optimizer (SFO) for hyperparameter optimization and K-fold cross-validation to improve predictive accuracy. KRR performed best, achieving an R² of 0.997 on the training set and 0.992 on the test set, with a mean squared error (MSE) of 0.0004. By comparison, K-ELM and QR reached lower test-set R² values of 0.923 and 0.817, respectively. The key innovation lies in combining these non-linear regression models with robust data preprocessing: dimensionality reduction via Principal Component Analysis (PCA), Leave-One-Out (LOO) encoding of categorical features, and outlier detection with Isolation Forest. The study offers a comprehensive framework for handling high-dimensional, heterogeneous datasets and highlights the effectiveness of optimization strategies in predictive modeling. By accurately predicting 5-ASA release from polysaccharide-coated formulations, these models can aid the design of targeted colonic delivery formulations with optimized release kinetics, ultimately improving the efficacy of treatments for colonic diseases. |
| ISSN: | 2045-2322 |
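The preprocessing-plus-regression workflow summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the helper names (`loo_target_encode`, `fit_pca`, `featurize`, `kfold_mse`, `grid_search`), the RBF kernel, the choice of 5 principal components, the categorical input `coating`, and all data values are hypothetical assumptions. A plain grid search is swapped in for the Sailfish Optimizer, and the Isolation Forest, K-ELM, and QR steps are omitted.

```python
import numpy as np

# Illustrative sketch only; all helper names are hypothetical.

def loo_target_encode(cats, y):
    """Leave-one-out target encoding: each row gets the mean target of the
    OTHER rows in its category (global mean for singleton categories)."""
    cats, y = np.asarray(cats), np.asarray(y, dtype=float)
    enc = np.empty_like(y)
    for c in np.unique(cats):
        m = cats == c
        n, total = m.sum(), y[m].sum()
        enc[m] = (total - y[m]) / (n - 1) if n > 1 else y.mean()
    return enc

def fit_pca(X, k):
    """Return the column means and the top-k principal axes (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def featurize(S_tr, c_tr, y_tr, S_te, c_te, k):
    """Fit PCA and the categorical encoding on the training fold only, then
    apply them to both folds; features are standardized with training stats."""
    mean, axes = fit_pca(S_tr, k)
    cat_means = {c: y_tr[c_tr == c].mean() for c in np.unique(c_tr)}
    enc_te = np.array([cat_means.get(c, y_tr.mean()) for c in c_te])
    F_tr = np.column_stack([(S_tr - mean) @ axes.T, loo_target_encode(c_tr, y_tr)])
    F_te = np.column_stack([(S_te - mean) @ axes.T, enc_te])
    mu, sd = F_tr.mean(0), F_tr.std(0) + 1e-12
    return (F_tr - mu) / sd, (F_te - mu) / sd

def kfold_mse(S, c, y, lam, gamma, k_pca=5, n_folds=5, seed=0):
    """Mean K-fold CV error of kernel ridge regression, alpha = (K + lam*I)^-1 y."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    errs = []
    for i, te in enumerate(folds):
        tr = np.concatenate([f for j, f in enumerate(folds) if j != i])
        F_tr, F_te = featurize(S[tr], c[tr], y[tr], S[te], c[te], k_pca)
        alpha = np.linalg.solve(rbf_kernel(F_tr, F_tr, gamma) + lam * np.eye(len(tr)), y[tr])
        pred = rbf_kernel(F_te, F_tr, gamma) @ alpha
        errs.append(np.mean((pred - y[te]) ** 2))
    return float(np.mean(errs))

def grid_search(S, c, y, lams, gammas):
    """Plain grid search, used here in place of the Sailfish Optimizer."""
    scores = {(l, g): kfold_mse(S, c, y, l, g) for l in lams for g in gammas}
    best = min(scores, key=scores.get)
    return best, scores[best]

# Synthetic stand-in data: 155 samples x 1500 "spectral" variables, matching the
# abstract's dimensions, but the values, categories, and release model are invented.
rng = np.random.default_rng(42)
latent = rng.normal(size=(155, 3))
spectra = latent @ rng.normal(size=(3, 1500)) + 0.1 * rng.normal(size=(155, 1500))
coating = rng.choice(["A", "B", "C"], size=155)  # hypothetical categorical input
shift = {"A": 0.0, "B": 0.1, "C": 0.2}
release = (0.5 + 0.1 * np.tanh(latent[:, 0])
           + np.array([shift[c] for c in coating]) + 0.01 * rng.normal(size=155))

best, cv_mse = grid_search(spectra, coating, release,
                           lams=[1e-3, 1e-2, 1e-1], gammas=[0.01, 0.1, 1.0])
print("best (lambda, gamma):", best, "CV MSE:", round(cv_mse, 5))
```

PCA and the LOO encoding are refit inside every fold so that held-out samples never influence their own features; held-out rows are encoded with the training-fold category means because their targets are treated as unknown at prediction time.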