A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data

Over time, many classification systems for voice-related disorders have been developed using machine learning methods, with only limited use of deep learning techniques. These systems were evaluated on accuracy, F1-score, precision, and recall using Mel-Frequency Cepstral Coefficients (MFCC), time...

Bibliographic Details
Main Authors: K. Jayashree Hegde, K. Manjula Shenoy, K. Devaraja
Format: Article
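Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Deep learning; healthcare; imbalanced data; spectrograms; stacked generalization; transfer learning
Online Access: https://ieeexplore.ieee.org/document/10824769/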
author K. Jayashree Hegde
K. Manjula Shenoy
K. Devaraja
author_facet K. Jayashree Hegde
K. Manjula Shenoy
K. Devaraja
author_sort K. Jayashree Hegde
collection DOAJ
description Over time, many classification systems for voice-related disorders have been developed using machine learning methods, with only limited use of deep learning techniques. These systems were evaluated on accuracy, F1-score, precision, and recall using Mel-Frequency Cepstral Coefficients (MFCC), time-domain features, etc. They gave adequate results on datasets of vowel recordings that were either balanced across all classes or balanced by pooling multiple voice pathologies to match the number of healthy subjects. In real-world scenarios, however, voice-disorder data are often imbalanced and scarce. Vocal Cord Paralysis (VCP) is one such voice pathology with limited data. In this paper, the proposed stacked ensemble model, InceptionV3-EfficientNetB0-ViT-B/16, is employed to classify VCP and healthy subjects over the imbalanced dataset at hand, using spectrograms as features. Voice samples for healthy and VCP subjects are selected from the Saarbruecken Voice Database (SVD), covering the vowels /a/, /i/, and /u/ at neutral, high, low, and low-high-low pitch conditions, as well as the phrase. The voice samples are preprocessed using the Short-time Fourier Transform (STFT), and each sample is augmented at various frequencies. The experimental results show that the proposed stacked model achieved an accuracy of 94.11% for the vowel /a/ at normal and low-high-low pitch conditions using an imbalanced dataset. In addition, the model's robustness and trustworthiness are supported by a False Discovery Rate of 0.07142, a Cohen's kappa of 0.82105, a Matthews Correlation Coefficient (MCC) of 0.83452, and an F1-score of 0.91005. The vowels /i/ and /u/ were also evaluated with the proposed model, which obtained 88.23% accuracy over most pitch conditions for these vowels and 90% for the phrase.
Overall, the proposed method demonstrated strong diagnostic capability on an unbalanced dataset without overtly favoring the majority class of healthy individuals, and maintained an adequate balance in precisely recognizing the minority class, VCP.
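The metrics quoted in the description can all be derived from a single binary confusion matrix. The sketch below is only an illustration: the counts (17 test samples, with healthy treated as the positive class) are an assumption chosen because they reproduce the quoted figures, and the abstract itself does not report a confusion matrix or test-set size. The F1-score is computed as a macro average over both classes, which is likewise an assumption.

```python
import math

# HYPOTHETICAL confusion matrix: these counts are an assumption, not taken
# from the abstract. Healthy (majority class) is treated as "positive".
tp, fp, fn, tn = 13, 1, 0, 3
n = tp + fp + fn + tn

# Accuracy: fraction of all samples classified correctly.
accuracy = (tp + tn) / n

# False Discovery Rate: share of positive predictions that are wrong.
fdr = fp / (fp + tp)


def f1(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1-score for one class from its own TP/FP/FN counts."""
    return 2 * true_pos / (2 * true_pos + false_pos + false_neg)


# Macro F1: unweighted mean of the per-class F1 scores.
# For the VCP class, the roles of the counts are swapped.
macro_f1 = (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2

# Cohen's kappa: observed agreement corrected for chance agreement.
p_observed = accuracy
p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
kappa = (p_observed - p_chance) / (1 - p_chance)

# Matthews Correlation Coefficient.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"accuracy={accuracy:.5f} fdr={fdr:.5f} macro_f1={macro_f1:.5f}")
print(f"kappa={kappa:.5f} mcc={mcc:.5f}")
```

With these assumed counts the computed values agree with the quoted accuracy, False Discovery Rate, Cohen's kappa, MCC, and F1-score to the precision given, which suggests the reported figures are mutually consistent; it does not establish that this was the actual test split.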
format Article
id doaj-art-29d0b7dd4e834ac0811bba0d647e3753
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-29d0b7dd4e834ac0811bba0d647e3753 2025-01-21T00:00:55Z eng IEEE IEEE Access 2169-3536 2025-01-01 13 10559 10581 10.1109/ACCESS.2025.3525721 10824769
A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data
K. Jayashree Hegde (https://orcid.org/0009-0000-7135-4576), Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
K. Manjula Shenoy (https://orcid.org/0000-0002-7835-1156), Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
K. Devaraja (https://orcid.org/0000-0001-8171-7393), Department of Head and Neck Surgery, Manipal Academy of Higher Education, Kasturba Medical College, Manipal, Manipal, Karnataka, India
https://ieeexplore.ieee.org/document/10824769/
Deep learning; healthcare; imbalanced data; spectrograms; stacked generalization; transfer learning
spellingShingle K. Jayashree Hegde
K. Manjula Shenoy
K. Devaraja
A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data
IEEE Access
Deep learning
healthcare
imbalanced data
spectrograms
stacked generalization
transfer learning
title A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data
title_full A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data
title_fullStr A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data
title_full_unstemmed A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data
title_short A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data
title_sort novel stacked model for classification of vocal cord paralysis over imbalanced vocal data
topic Deep learning
healthcare
imbalanced data
spectrograms
stacked generalization
transfer learning
url https://ieeexplore.ieee.org/document/10824769/