Enhancing the Sustainability of Machine Learning-Based Malware Detection Techniques for Android Applications

Bibliographic Details
Main Authors: Seyeon Park, Hojun Lee, Daeun Kim, Hyeun Jun Moon, Seong-Je Cho, Youngsup Hwang, Hyoil Han, Kyoungwon Suh
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Android malware; malware detection; Random Forest model
Online Access: https://ieeexplore.ieee.org/document/11023590/
author Seyeon Park
Hojun Lee
Daeun Kim
Hyeun Jun Moon
Seong-Je Cho
Youngsup Hwang
Hyoil Han
Kyoungwon Suh
collection DOAJ
description The rapid increase in smartphone usage has led to a corresponding rise in malicious Android applications, making it important to develop efficient and sustainable malware detection methods that maintain high accuracy. This paper presents a two-stage machine learning approach aimed at improving both detection accuracy and sustainability in Android malware classification. The first stage estimates the release year of an app using its SDK version information, while the second stage classifies apps as benign or malicious through a weighted voting mechanism applied to year-specific malware detection models. This method balances the high accuracy of retraining with reduced computational overhead, delivering robust and scalable malware detection. Using a dataset spanning 2014 to 2023, we evaluate the performance of the proposed method in comparison to retraining-based and incremental learning-based approaches. Experimental results demonstrate that while the retraining-based method achieves the highest accuracy and F1 score, it incurs a significant increase in training time. Conversely, the incremental learning-based method offers lower accuracy but reduced training time. Our two-stage model-based classification method effectively balances these trade-offs, providing accuracy comparable to the retraining-based approach while maintaining stable training times and moderate model sizes, making it a viable option for sustainable malware detection in real-world environments. Future research will explore non-machine-learning-based release year prediction methods to further optimize training efficiency and improve adaptability to the rapidly evolving malware detection landscape.
format Article
id doaj-art-c0dcd2c0c10b48b681f3ca64de4e4e26
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-c0dcd2c0c10b48b681f3ca64de4e4e26
Record timestamp: 2025-08-20T02:39:37Z
Language: eng
Publisher: IEEE
Journal: IEEE Access (ISSN 2169-3536)
Published: 2025-01-01
Volume 13, pages 98876-98887
DOI: 10.1109/ACCESS.2025.3576733
IEEE document number: 11023590
Seyeon Park (https://orcid.org/0009-0008-8784-8181), Department of Software Science, Dankook University, Yongin-si, South Korea
Hojun Lee, Department of Software Science, Dankook University, Yongin-si, South Korea
Daeun Kim, Department of Software Science, Dankook University, Yongin-si, South Korea
Hyeun Jun Moon, Department of Architectural Engineering, Dankook University, Yongin-si, South Korea
Seong-Je Cho (https://orcid.org/0000-0001-9917-0429), Department of Software Science, Dankook University, Yongin-si, South Korea
Youngsup Hwang (https://orcid.org/0000-0002-8713-9253), Division of Computer Science and Engineering, Sun Moon University, Asan-si, South Korea
Hyoil Han, School of Information Technology, Illinois State University, Normal, IL, USA
Kyoungwon Suh (https://orcid.org/0009-0005-1558-7814), School of Information Technology, Illinois State University, Normal, IL, USA
title Enhancing the Sustainability of Machine Learning-Based Malware Detection Techniques for Android Applications
topic Android malware
malware detection
Random Forest model
url https://ieeexplore.ieee.org/document/11023590/
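
The two-stage scheme summarized in the description field can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The names `weighted_score` and `classify`, the 0.5 decision threshold, and the choice to use per-year probabilities from the first stage as voting weights are all assumptions made here; the abstract states only that a release year is estimated from SDK version information and that a weighted vote is taken over year-specific models.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
# Hypothetical interface: a year-specific detector maps a feature
# vector to an estimated probability that the app is malicious.
Detector = Callable[[Features], float]


def weighted_score(
    features: Features,
    year_weights: Dict[int, float],
    detectors: Dict[int, Detector],
) -> float:
    """Weighted mean of the year-specific malicious-probability scores.

    year_weights stands in for the output of the first stage (assumed
    here to be a weight per candidate release year).
    """
    # Only years with a positive weight and a trained detector vote.
    years = [y for y, w in year_weights.items() if w > 0 and y in detectors]
    if not years:
        raise ValueError("no year-specific detector matches the estimated years")
    total = sum(year_weights[y] for y in years)
    return sum(year_weights[y] * detectors[y](features) for y in years) / total


def classify(
    features: Features,
    year_weights: Dict[int, float],
    detectors: Dict[int, Detector],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> bool:
    """Return True when the weighted score marks the app as malicious."""
    return weighted_score(features, year_weights, detectors) >= threshold


if __name__ == "__main__":
    # Toy stand-ins for trained per-year models (constant scores).
    toy_detectors: Dict[int, Detector] = {
        2014: lambda f: 0.9,
        2015: lambda f: 0.2,
    }
    # Made-up first-stage output: the app most likely dates from 2014.
    toy_weights = {2014: 0.6, 2015: 0.4}
    print(classify([0.0, 1.0], toy_weights, toy_detectors))
```

Because each year-specific detector is trained once on its own slice of data, adding a new year under this sketch means training one additional model rather than retraining on the whole history, which is the trade-off the description attributes to the proposed method.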