Integrating Information Gain and Chi-Square for Enhanced Malware Detection Performance

Bibliographic Details
Main Authors: Fauzi Adi Rafrastara, Wildanil Ghozi, Ramadhan Rakhmat Sani, Lekso Budi Handoko, Abdussalam Abdussalam, Elkaf Rahmawan Pramudya, Faizal M. Abdollah
Format: Article
Language: English
Published: UUM Press, 2025-01-01
Series: Journal of ICT
Online Access: https://e-journal.uum.edu.my/index.php/jict/article/view/25567
Collection: DOAJ
Description: Malware is a serious and continuously evolving threat in the modern digital environment. Detecting it is essential to protect devices and systems from risks such as data corruption, data theft, account compromise, and unauthorized access that could lead to a complete system takeover. As malware has progressed from simple monomorphic variants to more sophisticated oligomorphic, polymorphic, and metamorphic forms, machine learning-based detection has become necessary to overcome the limitations of traditional signature-based methods. Recent studies have shown that machine learning algorithms can address this challenge, and some have also applied feature selection methods to make detection more efficient. However, these approaches still produce false positives and false negatives, falling short of the zero-tolerance goal in malware detection.

This study introduces IGCS, a combined feature selection method that integrates Information Gain with Chi-Square (χ²) to improve both the effectiveness and the efficiency of machine learning classifiers. With IGCS, six classifiers (Random Forest, XGBoost, kNN, Decision Tree, Logistic Regression, and Naïve Bayes) achieved higher performance scores than in other scenarios, such as pairing the classifiers with Information Gain, Chi-Square, or PCA, or using no feature selection at all. Random Forest with 30 features selected by IGCS outperformed every other combination of classifier and feature selection method, achieving 99.0% accuracy, recall, precision, and F1-score. This combination was also efficient, reducing training time by 52.5% and testing time by 56.9%.
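The abstract does not state how IGCS merges its two criteria, so the Python sketch below is only one plausible reading, not the authors' implementation. It assumes discrete (for example binary) feature columns, scores each feature by Information Gain, IG(Y; X) = H(Y) - H(Y | X), and by the Pearson χ² statistic of its value-by-class contingency table, min-max normalizes both score lists, averages them, and keeps the top k features. The averaging rule, the toy data, and all function names (`information_gain`, `chi_square`, `combined_selection`) are hypothetical illustrations.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-value x class contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # missing cells count as 0
    x_counts = Counter(feature)
    y_counts = Counter(labels)
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def min_max(scores):
    """Rescale scores to [0, 1]; all-equal scores map to 0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combined_selection(columns, labels, k):
    """Hypothetical IG + chi-square combination: rank features by the mean of
    their normalized IG and chi-square scores and return the top-k indices."""
    ig = min_max([information_gain(col, labels) for col in columns])
    cs = min_max([chi_square(col, labels) for col in columns])
    combined = [(a + b) / 2 for a, b in zip(ig, cs)]
    ranked = sorted(range(len(columns)), key=lambda i: combined[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]  # toy labels: 1 = malware, 0 = benign
    columns = [
        [1, 1, 1, 0, 0, 0],  # perfectly separates the classes
        [0, 1, 0, 1, 0, 1],  # only weakly related to the label
        [1, 1, 1, 1, 1, 1],  # constant, carries no information
    ]
    print(combined_selection(columns, labels, k=2))  # prints [0, 1]
```

Normalizing before averaging keeps the unbounded χ² statistic from dominating the bounded IG score. In the study's best configuration, k would play the role of the 30 retained features; intersecting or uniting the two separate top-k lists are equally plausible alternatives to averaging.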
DOAJ record ID: doaj-art-cddc479de80c420b855d48a029fe22ee
Institution: Kabale University
ISSN: 1675-414X, 2180-3862
Volume/Issue: 24(1)
DOI: 10.32890/jict2025.24.1.4
Affiliations:
Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia (Fauzi Adi Rafrastara, Wildanil Ghozi, Ramadhan Rakhmat Sani, Lekso Budi Handoko, Abdussalam Abdussalam, Elkaf Rahmawan Pramudya)
Fakulti Teknologi Maklumat dan Komunikasi (Faculty of Information and Communication Technology), Universiti Teknikal Malaysia Melaka, Malaysia (Faizal M. Abdollah)
Keywords: Malware detection; IGCS; feature selection; Information Gain; Chi-Square