Speech emotion recognition using long-term average spectrum

Bibliographic Details
Main Authors: Huerta-Hernández Luis David, Meléndez-Acosta Nayeli Joaquinita, Troncoso-Romero David Ernesto, Ramírez-Pacheco Julio César, León-Borges Jose Antonio
Format: Article
Language: English
Published: De Gruyter 2025-04-01
Series: Open Computer Science
Online Access: https://doi.org/10.1515/comp-2025-0023
Description
Summary: Automatic speech emotion recognition has become an important research subject in the area of speech signal processing. The performance of classification algorithms depends on the features extracted from speech. In this work, a new framework for emotion recognition is proposed based on the long-term average spectrum (LTAS). The framework is evaluated through a comparative study using classifiers such as artificial neural network, K-nearest neighbours, logistic regression, Bayesian algorithms, tree-based logistics, and support vector machine. It was tested experimentally on the well-known Toronto Emotional Speech Set database, and the results were compared against state-of-the-art alternatives on the same database that use mel frequency cepstral coefficients, filter bank energies, and chroma coefficient speech coding. The comparative experiments showed that LTAS achieved higher performance, with correct-classification accuracies of 96–99% for speech emotion, compared with a best performance of 97% for the state-of-the-art alternatives. Different sampling frequencies were used to extract LTAS, and the classifiers were tested individually. The main contribution of this work is to demonstrate that the new LTAS-based framework reduces the number of parameters to approximately 87.5 values per second, as opposed to the 1,200 values used in the best-performing state-of-the-art alternatives; the feature extraction process is therefore significantly reduced while the performance in terms of correct classification is improved.
ISSN: 2299-1093
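
The summary names the long-term average spectrum (LTAS) as the feature but does not say how it is computed. The following is a minimal sketch of one common way to obtain an LTAS, assuming it is taken as the mean power spectrum over overlapping Hann-windowed frames, expressed in dB. The function name `ltas`, the frame length, the hop size, and the dB scaling are illustrative assumptions and are not taken from the article.

```python
import numpy as np


def ltas(signal, frame_len=512, hop=256):
    """Return an LTAS estimate: the frame-averaged power spectrum in dB.

    frame_len and hop are illustrative defaults, not values from the article.
    """
    signal = np.asarray(signal, dtype=float)
    # Zero-pad very short signals so that at least one full frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))

    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    power = np.zeros(frame_len // 2 + 1)

    # Accumulate the power spectrum of each windowed frame.
    for i in range(n_frames):
        start = i * hop
        frame = signal[start:start + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2

    power /= n_frames
    # A small offset avoids log(0) for silent input.
    return 10.0 * np.log10(power + 1e-12)


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz, for the demo only
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone
    features = ltas(tone)
    print(features.shape, int(np.argmax(features)))
```

Because the spectrum is averaged over the whole utterance, the result is a single fixed-length vector per recording (here `frame_len // 2 + 1` values), which can be passed directly to any of the classifiers listed in the summary. The per-second parameter count reported in the article depends on the authors' own settings, which this sketch does not reproduce.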