New Heuristics Method for Malicious URLs Detection Using Machine Learning

Bibliographic Details
Main Author: Maher Kassem Hasan
Format: Article
Language: English
Published: College of Computer and Information Technology – University of Wasit, Iraq 2024-09-01
Series: Wasit Journal of Computer and Mathematics Science
Online Access:http://wjcm.uowasit.edu.iq/index.php/wjcm/article/view/267
Collection: DOAJ
Description: Malicious URLs are a prominent and dangerous form of cyber threat because they enable phishing attacks, malware distribution, and other kinds of cyber fraud. Conventional detection techniques rely on blacklisting and heuristic analysis, which are becoming increasingly ineffective against sophisticated, rapidly evolving threats. This paper examines machine learning techniques for malicious URL detection, focusing on three models: Logistic Regression, Random Forest, and Support Vector Machines (SVM). The methodology consists of data collection, feature extraction, model training, and performance evaluation using accuracy, precision, recall, and F1-score. The three models were implemented and optimized because the available literature indicates their effectiveness: Vanitha and Vinodhini report promising results for Logistic Regression in detecting malicious URLs, Cui et al. and Vanhoenshoven et al. find Random Forest models robust and accurate, and Manjeri et al. report very high accuracy for SVM models. Further work on deep learning models has emphasized their potential.
In our study, the optimized Random Forest model showed the best performance, with a training accuracy of 99% and a validation accuracy of 90.5%, while Logistic Regression and SVM each achieved a training accuracy of 89.31% and a validation accuracy of 90.5%. The paper discusses the optimization process, the performance of each model, and each model's strengths, limitations, and challenges when integrated into real-time cybersecurity infrastructures, emphasizing the need to update the models continuously.
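The pipeline summarized in the description (feature extraction, model training, and evaluation with accuracy, precision, recall, and F1-score) can be illustrated with a minimal, dependency-free Python sketch. This is not the author's implementation: the record does not list the paper's features, data, or hyperparameters, so the lexical features, the keyword list `SUSPICIOUS_TOKENS`, the toy URLs in `TOY_URLS`, and the `lr`/`epochs` values are hypothetical. Only Logistic Regression is shown, trained by plain full-batch gradient descent; Random Forest and SVM models would normally come from a library rather than be written by hand.

```python
import math
from urllib.parse import urlparse

# Hypothetical keyword list: the record does not specify the paper's feature set.
SUSPICIOUS_TOKENS = ("login", "verify", "account", "update", "secure", "bank")

# Toy, made-up URLs (1 = malicious, 0 = benign); NOT the paper's dataset.
TOY_URLS = [
    ("https://www.example.com/", 0),
    ("https://docs.python.org/3/library/urllib.parse.html", 0),
    ("https://en.wikipedia.org/wiki/URL", 0),
    ("https://github.com/", 0),
    ("http://192.168.10.5/secure-login/verify.php", 1),
    ("http://secure-login.example-bank.biz/verify/account?id=889123", 1),
    ("http://bank-secure-verify.example.net/@update?id=12345", 1),
    ("http://203.0.113.7/account/verify/login", 1),
]


def extract_features(url: str) -> list[float]:
    """Map a URL to a small vector of hypothetical lexical features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return [
        len(url) / 100.0,                                  # overall length, scaled
        sum(c.isdigit() for c in url) / max(len(url), 1),  # share of digit characters
        float(host.count(".")),                            # dots in the host name
        float(host.replace(".", "").isdigit()),            # bare IPv4 address as host
        float("@" in url),                                 # '@' anywhere in the URL
        float(parsed.scheme != "https"),                   # not served over HTTPS
        float(sum(tok in lowered for tok in SUSPICIOUS_TOKENS)),  # phishing-style words
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.25, epochs=2000):
    """Fit weights and bias by full-batch gradient descent on the mean log-loss."""
    n_features, n_samples = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (malicious) if the predicted probability reaches the threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 with 'malicious' (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    X = [extract_features(url) for url, _ in TOY_URLS]
    y = [label for _, label in TOY_URLS]
    w, b = train_logistic_regression(X, y)
    preds = [predict(w, b, x) for x in X]
    print(evaluate(y, preds))  # in-sample metrics on the toy data
```

Because the toy model is scored on the same URLs it was trained on, the printed figures are in-sample results only. The description instead reports separate training and validation accuracies, which a real evaluation would reproduce by calling `evaluate` on URLs held out from training.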
Record ID: doaj-art-60f9514ed30c48e98610a32986f39875
Institution: Kabale University
ISSN: 2788-5879, 2788-5887
Affiliation: Computer Science Department, College of Computer Science and Information Technology, University of Kerbala, Kerbala, Iraq
DOI: 10.31185/wjcms.267
Keywords: Malicious URLs, Machine Learning, Cybersecurity, Logistic Regression, Random Forest, Support Vector Machines