Testing convolutional neural network based deep learning systems: a statistical metamorphic approach
Machine learning technology spans many areas and today plays a significant role in addressing a wide range of problems in critical domains, i.e., healthcare, autonomous driving, finance, manufacturing, cybersecurity, etc. Metamorphic testing (MT) is considered a simple but very powerful approach in...
Saved in:
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
PeerJ Inc.
2025-01-01
|
Series: | PeerJ Computer Science |
Subjects: | |
Online Access: | https://peerj.com/articles/cs-2658.pdf |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Machine learning technology spans many areas and today plays a significant role in addressing a wide range of problems in critical domains, i.e., healthcare, autonomous driving, finance, manufacturing, cybersecurity, etc. Metamorphic testing (MT) is considered a simple but very powerful approach in testing such computationally complex systems for which either an oracle is not available or is available but difficult to apply. Conventional metamorphic testing techniques have certain limitations in verifying deep learning-based models (i.e., convolutional neural networks (CNNs)) that have a stochastic nature (because of randomly initializing the network weights) in their training. In this article, we attempt to address this problem by using a statistical metamorphic testing (SMT) technique that does not require software testers to worry about fixing the random seeds (to get deterministic results) to verify the metamorphic relations (MRs). We propose seven MRs combined with different statistical methods to statistically verify whether the program under test adheres to the relation(s) specified in the MR(s). We further use mutation testing techniques to show the usefulness of the proposed approach in the healthcare space and test two CNN-based deep learning models (used for pneumonia detection among patients). The empirical results show that our proposed approach uncovers 85.71% of the implementation faults in the classifiers under test (CUT). Furthermore, we also propose an MRs minimization algorithm for the CUT, thus saving computational costs and organizational testing resources. |
---|---|
ISSN: | 2376-5992 |