Manuscript Title:

HYPER PARAMETERS TUNING USING PARTIAL SWARM OPTIMIZATION ALGORITHM BASED ON RANDOM FOREST FOR URLsBASED PHISHING DETECTION

Author:

DUMUA AL AHMARE, SAIMA ANWAR LASHARI, ABDULLAH KHAN, SANA SALAH-UDDIN

DOI Number:

DOI:10.5281/zenodo.11273166

Published : 2024-05-23

About the author(s)

1. DUMUA AL AHMARE - College of Computing and Informatics, Saudi Electronic University, Riyadh, KSA.
2. SAIMA ANWAR LASHARI - College of Computing and Informatics, Saudi Electronic University, Riyadh, KSA.
3. ABDULLAH KHAN - Institute of Computer Sciences and Information Technology, the University of Agriculture Peshawar, Pakistan.
4. SANA SALAH-UDDIN - Institute of Computer Sciences and Information Technology, the University of Agriculture Peshawar, Pakistan.

Abstract

Crafting phishing URLs that appear to belong to legitimate websites is a common deception technique in phishing attacks. Once loaded by a web browser, phishing URLs can expose users to serious threats such as drive-by download and cryptojacking attacks, so it is important to identify and block them at an early stage. Phishing detection is a supervised classification task in which a labeled dataset is used to fit Machine Learning (ML) models that then classify the data. Security researchers have proposed various ML techniques for detecting and classifying phishing websites. However, detecting phishing attacks with high accuracy remains challenging. In this study, the Sequential Forward Feature Selection (SFFS) technique is used to find the optimal set of features, and a Particle Swarm Optimization (PSO) technique is developed to tune the hyperparameters of the Random Forest (RF) classifier, which detects phishing websites using URL-based features from two phishing datasets. The proposed technique achieved the best results, outperforming other ML techniques (RF, LR, KNN, SVM and NBC) in accuracy as well as other classification performance measures.
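The two steps named in the abstract, greedy forward feature selection and PSO-based hyperparameter search, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The feature names, the per-feature weights, the search bounds, the PSO coefficients (w, c1, c2) and the surrogate_error function are all hypothetical stand-ins. In a real setting the scoring and objective functions would be the cross-validated performance of a Random Forest on the phishing datasets.

```python
import random


def sequential_forward_selection(features, score, k):
    """Greedily add the feature that most improves score(subset) until k are chosen."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def pso_minimize(objective, bounds, n_particles=20, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over box-constrained bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm-wide best position
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # clamp the new position to the search bounds
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical per-feature usefulness; stands in for a cross-validated score.
toy_weights = {"url_length": 0.3, "num_dots": 0.1, "has_ip": 0.4, "has_at": 0.2}
selected = sequential_forward_selection(
    list(toy_weights), lambda subset: sum(toy_weights[f] for f in subset), k=2)


def surrogate_error(params):
    """Hypothetical stand-in for the Random Forest validation error."""
    n_estimators, max_depth = params
    return ((n_estimators - 200) / 100) ** 2 + ((max_depth - 12) / 10) ** 2


bounds = [(10, 500), (2, 30)]                        # (n_estimators, max_depth)
best, best_err = pso_minimize(surrogate_error, bounds)
best_params = {"n_estimators": round(best[0]), "max_depth": round(best[1])}
```

The search runs over continuous positions and rounds them to integers only at the end, which is one simple way to handle integer hyperparameters such as the number of trees. Rounding inside the objective is an equally reasonable alternative.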


Keywords

Machine Learning, RF, LR, KNN, SVM, NBC, Sequential Forward Feature Selection.