A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data
Over time, many classification systems have been developed for voice-related disorders using machine learning methods and limited usage of deep learning techniques. These systems were evaluated across accuracy, F1-score, precision, and recall using the Mel-Frequency Cepstral Coefficient (MFCC), time...
Main Authors: | K. Jayashree Hegde, K. Manjula Shenoy, K. Devaraja |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
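Subjects: | Deep learning; healthcare; imbalanced data; spectrograms; stacked generalization; transfer learning |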
Online Access: | https://ieeexplore.ieee.org/document/10824769/ |
_version_ | 1832592913238851584 |
---|---|
author | K. Jayashree Hegde; K. Manjula Shenoy; K. Devaraja |
author_facet | K. Jayashree Hegde; K. Manjula Shenoy; K. Devaraja |
author_sort | K. Jayashree Hegde |
collection | DOAJ |
description | Many classification systems for voice-related disorders have been developed over time, mostly with machine learning methods and only limited use of deep learning. These systems were evaluated on accuracy, F1-score, precision, and recall using features such as Mel-Frequency Cepstral Coefficients (MFCC) and time-domain features. They gave adequate results on vowel recordings, but only on datasets that were either balanced across all classes or balanced by pooling multiple voice pathologies to match the number of healthy subjects. In real-world scenarios, data on voice disorders is often imbalanced and scarce. Vocal Cord Paralysis (VCP) is one such voice pathology with limited data. This paper proposes a stacked ensemble model, InceptionV3-EfficientNetB0-ViT-B/16, to classify VCP and healthy subjects over an imbalanced dataset using spectrograms as features. Voice samples of healthy and VCP subjects are selected from the Saarbruecken Voice Database (SVD): the vowels /a/, /i/, and /u/ at neutral, high, low, and low-high-low pitch conditions, and the phrase. The voice samples are preprocessed using the Short-time Fourier Transform (STFT), and each sample is augmented at various frequencies. In the experiments, the proposed stacked model achieved an accuracy of 94.11% for the vowel /a/ at normal and low-high-low pitch conditions on the imbalanced dataset. The model’s robustness is supported by a False Discovery Rate of 0.07142, a Cohen’s Kappa of 0.82105, a Matthews Correlation Coefficient (MCC) of 0.83452, and an F1-score of 0.91005. The vowels /i/ and /u/ were also evaluated with the proposed model, which reached 88.23% accuracy over most pitch conditions for these vowels and 90% for the phrase. Overall, the proposed method diagnosed VCP effectively on an unbalanced dataset without overtly favoring the majority class of healthy individuals, while still recognizing the minority VCP class accurately. |
format | Article |
id | doaj-art-29d0b7dd4e834ac0811bba0d647e3753 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-29d0b7dd4e834ac0811bba0d647e3753; 2025-01-21T00:00:55Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 10559; 10581; 10.1109/ACCESS.2025.3525721; 10824769; A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data; K. Jayashree Hegde (0), https://orcid.org/0009-0000-7135-4576; K. Manjula Shenoy (1), https://orcid.org/0000-0002-7835-1156; K. Devaraja (2), https://orcid.org/0000-0001-8171-7393; (0) Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India; (1) Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India; (2) Department of Head and Neck Surgery, Manipal Academy of Higher Education, Kasturba Medical College, Manipal, Manipal, Karnataka, India; https://ieeexplore.ieee.org/document/10824769/; Deep learning; healthcare; imbalanced data; spectrograms; stacked generalization; transfer learning |
title | A Novel Stacked Model for Classification of Vocal Cord Paralysis Over Imbalanced Vocal Data |
title_sort | novel stacked model for classification of vocal cord paralysis over imbalanced vocal data |
topic | Deep learning; healthcare; imbalanced data; spectrograms; stacked generalization; transfer learning |
url | https://ieeexplore.ieee.org/document/10824769/ |
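
The description reports accuracy together with False Discovery Rate, Cohen's Kappa, MCC, and F1-score, all of which can be derived from a binary confusion matrix. The sketch below shows those standard formulas in plain Python. The function name `binary_metrics` and the example counts (13 true positives, 1 false positive, 0 false negatives, 3 true negatives) are assumptions made for illustration: the record does not give the actual confusion matrix or state which class is treated as positive. These hypothetical counts were chosen only because they are arithmetically consistent with the reported figures for the vowel /a/, up to rounding or truncation, when F1 is macro-averaged over both classes.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Imbalance-aware metrics from binary confusion-matrix counts.

    Assumes every denominator below is non-zero (no empty class and
    no empty prediction column); add guards before using on real data.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    # False Discovery Rate: share of positive predictions that are wrong.
    fdr = fp / (fp + tp)

    # Per-class F1, then the unweighted (macro) average over both classes.
    f1_pos = 2 * tp / (2 * tp + fp + fn)
    f1_neg = 2 * tn / (2 * tn + fn + fp)
    macro_f1 = (f1_pos + f1_neg) / 2

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    kappa = (accuracy - expected) / (1 - expected)

    # Matthews Correlation Coefficient.
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    return {
        "accuracy": accuracy,
        "fdr": fdr,
        "macro_f1": macro_f1,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from the paper.
metrics = binary_metrics(tp=13, fp=1, fn=0, tn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.5f}")
```

Unlike accuracy, Kappa and MCC use all four cells of the confusion matrix, which is why they are commonly reported alongside accuracy when one class dominates the dataset.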