Detecting Phishing Uniform Resource Locator (URL) using Machine Learning

Authors

  • Delina Mei Yin Beh Universiti Kuala Lumpur
  • Roth Bahuang Universiti Kuala Lumpur

Abstract

Due to the ever-increasing threat of phishing, internet users need a system that can verify in real time whether the websites they visit are safe and legitimate. In this project, we therefore developed a Chrome browser extension that uses a machine learning (ML) model to classify websites. Three ML classifiers, Logistic Regression (LR), Naïve Bayes (NB) and Support Vector Machine (SVM), were selected and trained on a dataset from the UCI Machine Learning Repository containing 30 features and 11055 samples. To improve classification speed, we dropped five features that depend on third-party services: the age of the domain, DNS record, page rank, web traffic and Google index, leaving 25 features. We tuned the hyperparameters using Grid Search and then trained the classifiers with the optimal values. Finally, we evaluated the models on accuracy: SVM achieved the highest accuracy at 95.80%, followed by LR (92.03%) and NB (88.91%). The SVM model was therefore selected as the classification model for this research.
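The abstract describes the training pipeline only at a high level. The following is a minimal sketch of it with scikit-learn, not the authors' code. It makes several assumptions. Synthetic data with values in {-1, 0, 1} stands in for the UCI dataset. The column indices of the five third-party features are assumed. Gaussian NB is assumed as the Naïve Bayes variant. The parameter grids and the 80/20 train/test split are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# ASSUMPTION: 0-based indices of the five third-party features (domain age,
# DNS record, web traffic, page rank, Google index) in the 30-column matrix.
THIRD_PARTY_COLS = [23, 24, 25, 26, 27]

# ASSUMPTION: illustrative hyperparameter grids, not the paper's values.
MODELS = {
    "LR": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "NB": (GaussianNB(), {"var_smoothing": [1e-9, 1e-6, 1e-3]}),
    "SVM": (SVC(), {"C": [1, 10], "kernel": ["linear", "rbf"]}),
}


def make_synthetic_data(n_samples=2000, n_features=30, seed=0):
    """Hypothetical stand-in for the UCI data: features in {-1, 0, 1}."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1, 0, 1], size=(n_samples, n_features))
    # Label depends on three retained features plus noise, so it is learnable.
    score = X[:, 0] + X[:, 7] + X[:, 13] + rng.normal(0, 0.5, n_samples)
    y = np.where(score > 0, 1, -1)
    return X, y


def evaluate(X, y, cv=5, seed=0):
    """Drop third-party features, grid-search each model, report test accuracy."""
    X = np.delete(X, THIRD_PARTY_COLS, axis=1)  # 30 -> 25 features
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    results = {}
    for name, (estimator, grid) in MODELS.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
        # With the default refit=True, the best parameters are refit on X_train.
        search.fit(X_train, y_train)
        acc = accuracy_score(y_test, search.predict(X_test))
        results[name] = (search.best_params_, acc)
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    results = evaluate(X, y)
    for name, (params, acc) in results.items():
        print(f"{name}: best params={params}, test accuracy={acc:.4f}")
    # Select the classifier with the highest test accuracy, as in the abstract.
    print("Selected model:", max(results, key=lambda k: results[k][1]))
```

Accuracies on this synthetic data say nothing about the paper's reported figures. Reproducing those would require loading the real UCI dataset in place of `make_synthetic_data`.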

Keywords—Phishing; Detection; Support Vector Machine; Logistic Regression; Naïve Bayes; Chrome.

Published

2022-12-30