IC-GraF: An Improved Clustering with Graph-Embedding-Based Features for Software Defect Prediction

Bibliographic Details
Main Authors: Xuanye Wang, Lu Lu, Qingyan Tian, Haishan Lin
Format: Article
Language: English
Published: Wiley 2024-01-01
Series: IET Software
Online Access: http://dx.doi.org/10.1049/2024/8027037
collection DOAJ
description Software defect prediction (SDP) has been a prominent area of research in software engineering. Previous SDP methods often struggled in industrial applications, primarily due to the need for sufficient historical data. Thus, clustering-based unsupervised defect prediction (CUDP) and cross-project defect prediction (CPDP) emerged to address this challenge. However, the former exhibited limitations in capturing semantic and structural features, while the latter encountered constraints due to differences in data distribution across projects. Therefore, we introduce a novel framework called improved clustering with graph-embedding-based features (IC-GraF) for SDP without relying on historical data. First, a preprocessing operation is performed to extract program dependence graphs (PDGs) and mark distinct dependency relationships within them. Second, the improved deep graph infomax (IDGI) model, an extension of the DGI model specifically for SDP, is designed to generate graph-level representations of PDGs. Finally, a heuristic-based k-means clustering algorithm is employed to classify the features generated by IDGI. To validate the efficacy of IC-GraF, we conduct experiments based on 24 releases of the PROMISE dataset, using F-measure and G-measure as evaluation criteria. The findings indicate that IC-GraF achieves 5.0%–42.7% higher F-measure, 5%–39.4% higher G-measure, and 2.5%–11.4% higher AUC over existing CUDP methods. Even when compared with eight supervised learning-based SDP methods, IC-GraF maintains a competitive edge.
institution Kabale University
issn 1751-8814
affiliations Xuanye Wang: School of Computer Science and Engineering
Lu Lu: School of Computer Science and Engineering
Qingyan Tian: Guangdong Provincial Key Laboratory of Tunnel Safety and Emergency Support Technology and Equipment
Haishan Lin: Guangdong Provincial Key Laboratory of Tunnel Safety and Emergency Support Technology and Equipment