Detecting Phishing Uniform Resource Locator (URL) using Machine Learning
Abstract
Because of the ever-increasing threat from phishing, internet users need a system that can verify in real time whether the websites they visit are safe and legitimate. In this project, we therefore developed a Chrome browser extension powered by a machine learning (ML) model that classifies websites. Three ML classifiers, Logistic Regression (LR), Naïve Bayes (NB) and Support Vector Machine (SVM), were selected and trained on a dataset from UCI containing 30 features and 11,055 samples. Five features that depend on third parties were dropped to improve classification speed: the age of the domain, DNS record, page rank, web traffic and Google index. We performed hyperparameter tuning using Grid Search and then trained the ML classifiers with the optimal values. We then evaluated the ML models on accuracy: SVM achieved the highest accuracy at 95.80%, followed by LR (92.03%) and NB (88.91%). The SVM model was therefore selected as the classification model for this research.
Keywords: Phishing; Detection; Support Vector Machine; Logistic Regression; Naïve Bayes; Chrome.
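The workflow summarised in the abstract (drop the third-party features, grid-search the hyperparameters, score by accuracy) can be illustrated with a minimal sketch. This is not the authors' implementation: the column names, the synthetic data, the 0/1 label encoding, the hyperparameter grid and the hand-written logistic regression are all assumptions chosen so that the example runs with the Python standard library alone. Only one of the three classifiers (LR) is shown.

```python
import itertools
import math
import random

# Hypothetical column names for the five third-party features listed in the abstract.
THIRD_PARTY = {"age_of_domain", "dns_record", "page_rank", "web_traffic", "google_index"}

def drop_features(names, rows, excluded):
    """Return (kept_names, kept_rows) with the excluded columns removed."""
    keep = [i for i, name in enumerate(names) if name not in excluded]
    return [names[i] for i in keep], [[row[i] for i in keep] for row in rows]

def decision(w, x):
    """Linear score w.x + bias (bias stored as the last weight), clamped for exp()."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return max(-30.0, min(30.0, z))

def train_logreg(X, y, lr, l2, epochs=100):
    """Fit L2-regularised logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-decision(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        for j in range(len(w) - 1):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
        w[-1] -= lr * grad[-1] / n  # the bias is not regularised
    return w

def accuracy(w, X, y):
    """Fraction of samples whose predicted class matches the label."""
    hits = sum(int((decision(w, xi) > 0) == (yi == 1)) for xi, yi in zip(X, y))
    return hits / len(y)

def grid_search(X, y, grid, k=3):
    """Try every (lr, l2) pair; score each by mean k-fold validation accuracy."""
    best_params, best_score = None, -1.0
    for lr, l2 in itertools.product(grid["lr"], grid["l2"]):
        scores = []
        for fold in range(k):
            train = [i for i in range(len(X)) if i % k != fold]
            valid = [i for i in range(len(X)) if i % k == fold]
            w = train_logreg([X[i] for i in train], [y[i] for i in train], lr, l2)
            scores.append(accuracy(w, [X[i] for i in valid], [y[i] for i in valid]))
        mean = sum(scores) / k
        if mean > best_score:
            best_params, best_score = {"lr": lr, "l2": l2}, mean
    return best_params, best_score

# Synthetic stand-in data (assumption): ternary feature values, 0/1 labels.
random.seed(0)
names = ["having_ip", "url_length", "https_token"] + sorted(THIRD_PARTY)
rows = [[random.choice([-1, 0, 1]) for _ in names] for _ in range(120)]
labels = [1 if row[0] + row[1] > 0 else 0 for row in rows]

kept_names, X = drop_features(names, rows, THIRD_PARTY)
param_grid = {"lr": [0.1, 1.0], "l2": [0.0, 0.01]}
best_params, best_score = grid_search(X, labels, param_grid)
print(kept_names, best_params, round(best_score, 3))
```

In practice a library routine such as scikit-learn's `GridSearchCV` would likely replace the hand-written loop; the abstract does not say which tooling was used.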
Submission of an original manuscript to the Journal of Computing Technologies and Creative Content (JTeC) will be taken to mean that it represents original work not previously published and that it is not being considered elsewhere for publication. Articles published by JTeC cannot be published elsewhere by the authors without the permission of the JTeC Editors. JTeC reserves the right to the publication of the articles it publishes, and reserves the right to reuse the articles elsewhere for academic purposes, while still retaining the names of the original authors with the original articles.
JTeC takes the stance that the publication of scholarly research is meant to disseminate knowledge and, in a not-for-profit regime, benefits neither publisher nor author financially. It sees itself as having an obligation to its authors and to society to make content available online now that technology makes this possible.