Exploring Consistent Feature Selection for Software Fault Prediction: An XAI-Based Model-Agnostic Approach
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10955380/ |
| Summary: | Numerous feature selection (FS) techniques have been widely applied in Software Engineering (SE) to improve the predictive performance of machine learning (ML) models. However, the consistency of these FS techniques, i.e., their ability to select stable features under various data changes, remains underexplored. While previous studies have examined the stability of traditional FS methods (e.g., Information Gain, Genetic Search), their findings are limited in scope. With the increasing use of eXplainable Artificial Intelligence (XAI) in SE, it is essential to assess the consistency of model-agnostic FS techniques to ensure their reliability in dynamic learning environments. In this study, we evaluate the consistency of two prominent XAI-based techniques, Permutation Feature Importance (PFI) and SHapley Additive exPlanations (SHAP), across five ML models: Linear Regression (LR), Multi-layer Perceptron (MLP), Random Forest (RF), Decision Trees (DT), and Support Vector Machines (SVM). Experiments are conducted on six Software Fault Prediction (SFP) datasets using various validation strategies (e.g., 3-fold cross-validation (CV), bootstrap), normalization, and dataset modifications. The findings reveal that model-agnostic FS techniques exhibit higher consistency than traditional techniques across all scenarios. Under validation-based changes, SHAP with SVM and DT achieves the best average consistency (100%), while MLP records the lowest (74.27%). For PFI, LR, DT, and SVM also reach 100% consistency, with MLP again the lowest (44.03%). In data-modification scenarios, SHAP with MLP shows the highest consistency (76.20%), whereas SVM performs the lowest (70.98%). Using PFI, RF achieves the highest (77.24%) and SVM the lowest (62.84%). Overall, SHAP outperforms PFI across most conditions, particularly under 5-fold CV, bootstrap, and leave-one-out (LOO) CV, while PFI is more stable when new instances are added to the training set. These findings confirm that both SHAP and PFI offer better consistency than traditional FS techniques, underscoring their reliability for real-world SFP tasks. |
|---|---|
| ISSN: | 2169-3536 |
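The consistency evaluation summarized above can be illustrated with a minimal sketch (not the paper's code): rank features by permutation importance on two resampled versions of a dataset, take the top-k feature set from each, and compare the sets with Jaccard similarity. The synthetic dataset, the Random Forest model, the choice of k, and the bootstrap-based "data change" are all assumptions made for illustration.

```python
# Hedged sketch: consistency of a model-agnostic FS technique (PFI)
# under a simulated data change, measured by Jaccard similarity of
# the top-k feature subsets. Dataset/model/k are illustrative choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.utils import resample

def top_k_features(X, y, k=5, seed=0):
    """Fit a model and return the indices of the k most important features."""
    model = RandomForestClassifier(random_state=seed).fit(X, y)
    imp = permutation_importance(model, X, y, n_repeats=5, random_state=seed)
    return set(np.argsort(imp.importances_mean)[-k:])

def jaccard(a, b):
    """Consistency of two feature subsets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=42)

# Two bootstrap resamples simulate a "data change" scenario.
Xa, ya = resample(X, y, random_state=1)
Xb, yb = resample(X, y, random_state=2)

consistency = jaccard(top_k_features(Xa, ya), top_k_features(Xb, yb))
print(f"top-5 feature consistency: {consistency:.2f}")
```

A consistency of 1.0 would mean PFI selected the same top-5 features on both resamples; values near 0 indicate unstable selection. The study's validation-based scenarios (k-fold CV, LOO CV) follow the same pattern, comparing feature subsets across folds instead of bootstrap samples.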