New Heuristics Method for Malicious URLs Detection Using Machine Learning

Malicious URLs are a prominent and dangerous class of cyber threat because they enable phishing attacks, malware distribution, and other kinds of cyber fraud. Conventional detection techniques rely on blacklisting and heuristic an...

Bibliographic Details
Main Author:	Maher Kassem Hasan
Format:	Article
Language:	English
Published:	College of Computer and Information Technology – University of Wasit, Iraq 2024-09-01
Series:	Wasit Journal of Computer and Mathematics Science
Subjects:	Malicious URLs, Machine Learning, Cybersecurity, Logistic Regression, Random Forest, Support Vector Machines
Online Access:	http://wjcm.uowasit.edu.iq/index.php/wjcm/article/view/267

author	Maher Kassem Hasan
collection	DOAJ
description	Malicious URLs are a prominent and dangerous class of cyber threat because they enable phishing attacks, malware distribution, and other kinds of cyber fraud. Conventional detection techniques rely on blacklisting and heuristic analysis, which are becoming less effective against sophisticated, rapidly evolving threats. This paper applies machine learning to malicious URL detection and examines three models: Logistic Regression, Random Forest, and Support Vector Machines (SVM). The methodology comprises data collection, feature extraction, model training, and evaluation with accuracy, precision, recall, and F1-score. The three models were implemented and optimized on the basis of prior literature indicating their effectiveness: Logistic Regression shows promising results in detecting malicious URLs according to Vanitha and Vinodhini; Random Forest models are found to be robust and accurate according to Cui et al. and Vanhoenshoven et al.; and SVM models are reported to achieve very high accuracy according to Manjeri et al. Further work on deep learning models has emphasized their potential. In this study, the optimized Random Forest model performed best, with a training accuracy of 99% and a validation accuracy of 90.5%; Logistic Regression and SVM achieved a training accuracy of 89.31% and a validation accuracy of 90.5%. The paper discusses the optimization processes, model performance, and the strengths and limitations of each model, emphasizing continuous updating of the models and their integration into real-time cybersecurity infrastructures.
format	Article
id	doaj-art-60f9514ed30c48e98610a32986f39875
institution	Kabale University
issn	2788-5879 2788-5887
language	English
publishDate	2024-09-01
publisher	College of Computer and Information Technology – University of Wasit, Iraq
record_format	Article
series	Wasit Journal of Computer and Mathematics Science
spelling	doaj-art-60f9514ed30c48e98610a32986f39875; 2025-01-30T05:23:46Z; eng; College of Computer and Information Technology – University of Wasit, Iraq; Wasit Journal of Computer and Mathematics Science; 2788-5879; 2788-5887; 2024-09-01; 3; 3; 10.31185/wjcms.267; New Heuristics Method for Malicious URLs Detection Using Machine Learning; Maher Kassem Hasan (Computer Science Department, College of Computer Science and Information Technology, University of Kerbala, Kerbala, Iraq); http://wjcm.uowasit.edu.iq/index.php/wjcm/article/view/267; Malicious URLs, Machine Learning, Cybersecurity, Logistic Regression, Random Forest, Support Vector Machines
title	New Heuristics Method for Malicious URLs Detection Using Machine Learning
topic	Malicious URLs, Machine Learning, Cybersecurity, Logistic Regression, Random Forest, Support Vector Machines
url	http://wjcm.uowasit.edu.iq/index.php/wjcm/article/view/267
work_keys_str_mv	AT maherkassemhasan newheuristicsmethodformaliciousurlsdetectionusingmachinelearning
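
The methodology summarized in the description field (feature extraction from URLs, then evaluation with accuracy, precision, recall, and F1-score) can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the paper's implementation: the feature set and the helper names `extract_features` and `classification_metrics` are hypothetical, since this record does not list the features or code the author actually used.

```python
import re
from urllib.parse import urlparse


def extract_features(url: str) -> dict:
    """Compute a few lexical URL features.

    Assumption: these features are illustrative only; the catalog record
    does not say which features the paper extracts.
    """
    # Add a scheme if it is missing so that urlparse can locate the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        # A dotted-quad host (raw IP address) instead of a domain name.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
    }


def classification_metrics(y_true, y_pred) -> dict:
    """Accuracy, precision, recall, and F1 for binary labels (1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up example URL and labels, used only to exercise the helpers.
    print(extract_features("http://192.168.0.1/login"))
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a fuller pipeline, the feature dictionaries would be converted to numeric vectors and passed to a classifier such as the Logistic Regression, Random Forest, or SVM models named in the description; the gap between the reported training and validation accuracies is the kind of result `classification_metrics` would be used to measure on a held-out set.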

Similar Items