Spectrogram Features-Based Automatic Speaker Identification For Smart Services

Automatic speaker identification (ASI) is an exciting area of research with numerous applications such as surveillance, voice authentication, identity verification, and electronic voice eavesdropping. This study investigates ASI based on features derived from spectrogram images through a convolution...

Full description

Saved in:
Bibliographic Details
Main Authors: Rashid Jahangir, Mohammed Alreshoodi, Fawaz Khaled Alarfaj
Format: Article
Language:English
Published: Taylor & Francis Group 2025-12-01
Series:Applied Artificial Intelligence
Online Access:https://www.tandfonline.com/doi/10.1080/08839514.2025.2459476
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Automatic speaker identification (ASI) is an exciting area of research with numerous applications such as surveillance, voice authentication, identity verification, and electronic voice eavesdropping. This study investigates ASI based on features derived from spectrogram images through a convolution neural network (CNN) with rectangular-shaped kernels. Traditionally, CNN employs square-shaped kernel and max-pooling operations at different layers, a design optimized to handle 2D data. Nevertheless, encoding of information differs slightly to deal with spectrograms. The frequency is displayed along the y-axis, and the x-axis presents the time of the audio. Amplitude is denoted by intensity within the spectrogram image at certain point. The main contributions of this study are 1: To analyze audio signals effectively using spectrograms, this study proposed the utilization of spectrogram features with different sizes and shapes of rectangular kernels to derive distinctive features by improving the recognition accuracy of the speaker identification system. 2. The extracted spectrogram-based features and models are evaluated on the ELSDSR, TSP, and LibriSpeech datasets and achieved the weighted accuracy of 96.0%, 99.2%, and 97.6%, respectively. 3. The proposed rectangular-shaped CNN approach effectively derives suitable features from spectrogram images and outperformed several baseline techniques when performance was assessed on ELSDSR, TSP, and LibriSpeech datasets.
ISSN:0883-9514
1087-6545