Interpreting CNN models for musical instrument recognition using multi-spectrogram heatmap analysis: a preliminary study

Bibliographic Details
Main Authors: Rujia Chen, Akbar Ghobakhlou, Ajit Narayanan
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-12-01
Series: Frontiers in Artificial Intelligence
Online Access:https://www.frontiersin.org/articles/10.3389/frai.2024.1499913/full
Description
Summary:
Introduction: Musical instrument recognition is a critical component of music information retrieval (MIR), aimed at identifying and classifying instruments from audio recordings. This task poses significant challenges due to the complexity and variability of musical signals.
Methods: In this study, we employed convolutional neural networks (CNNs) to analyze the contributions of six spectrogram representations (STFT, Log-Mel, MFCC, Chroma, Spectral Contrast, and Tonnetz) to the classification of ten different musical instruments. The NSynth database was used for training and evaluation. Visual heatmap analysis and statistical metrics, including Difference Mean, KL Divergence, JS Divergence, and Earth Mover’s Distance, were used to assess feature importance and model interpretability.
Results: Our findings highlight the strengths and limitations of each spectrogram type in capturing distinctive features of different instruments. MFCC and Log-Mel spectrograms demonstrated superior performance across most instruments, while the other representations provided insights into specific instrument characteristics.
Discussion: This analysis provides some insights into optimizing spectrogram-based approaches for musical instrument recognition, offering guidance for future model development and for improving interpretability through statistical and visual analyses.
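The summary names four statistics for comparing heatmaps but does not define them in this record. The following is a minimal NumPy sketch of how such comparisons could be computed, not the authors' implementation. It assumes each heatmap is a non-negative 2D array (frequency x time) that is normalized to sum to 1 before comparison. All function names are hypothetical; "Difference Mean" is assumed to mean the mean absolute element-wise difference; and Earth Mover's Distance is simplified to the 1D distance between frequency profiles (time summed out) rather than a full 2D transport problem.

```python
import numpy as np

EPS = 1e-12  # smoothing constant so log() never sees an exact zero


def to_distribution(heatmap):
    """Clip negatives and rescale a 2D heatmap (freq x time) so it sums to 1."""
    h = np.clip(np.asarray(heatmap, dtype=float), 0.0, None)
    total = h.sum()
    if total <= 0.0:
        # An all-zero map carries no attribution; treat it as uniform.
        return np.full(h.shape, 1.0 / h.size)
    return h / total


def _smoothed(heatmap):
    """Flatten, add EPS to every cell, and renormalize."""
    p = to_distribution(heatmap).ravel() + EPS
    return p / p.sum()


def _kl(p, q):
    return float(np.sum(p * np.log(p / q)))


def difference_mean(a, b):
    """Assumed definition: mean absolute element-wise difference of normalized maps."""
    pa, pb = to_distribution(a), to_distribution(b)
    if pa.shape != pb.shape:
        raise ValueError("heatmaps must have the same shape")
    return float(np.mean(np.abs(pa - pb)))


def kl_divergence(a, b):
    """KL(a || b) in nats; note that it is asymmetric."""
    return _kl(_smoothed(a), _smoothed(b))


def js_divergence(a, b):
    """Jensen-Shannon divergence in nats; symmetric and bounded by ln 2."""
    p, q = _smoothed(a), _smoothed(b)
    m = 0.5 * (p + q)
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def emd_frequency_profile(a, b):
    """1D Earth Mover's Distance between frequency profiles, in frequency bins.

    Simplification (assumption): time is summed out, so this measures how far
    attribution mass moves along the frequency axis, not the full 2D layout.
    """
    pa = to_distribution(a).sum(axis=1)
    pb = to_distribution(b).sum(axis=1)
    if pa.shape != pb.shape:
        raise ValueError("heatmaps must have the same number of frequency bins")
    # For unit-spaced bins, W1 equals the L1 distance between the two CDFs.
    return float(np.sum(np.abs(np.cumsum(pa) - np.cumsum(pb))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical saliency maps for two instruments under one spectrogram type.
    heatmap_a = rng.random((20, 50))
    heatmap_b = rng.random((20, 50))
    for name, fn in [("Difference Mean", difference_mean),
                     ("KL", kl_divergence),
                     ("JS", js_divergence),
                     ("EMD (freq profile)", emd_frequency_profile)]:
        print(f"{name}: {fn(heatmap_a, heatmap_b):.4f}")
```

Normalizing first makes the four statistics comparable across maps with different overall intensity. The EPS smoothing keeps KL and JS finite when one map has zeros where the other does not. JS is often preferred for side-by-side comparisons because it is symmetric and bounded, while EMD additionally reflects how far apart in frequency the highlighted regions lie.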
ISSN: 2624-8212