Machine Learning-Based Dangerous URL Identification

Abstract: Risks to network information security are currently high, and attacks are increasing in both number and severity. Attackers' most common tactics today target end-to-end technology and exploit human weaknesses, using approaches such as social engineering, phishing, and pharming. Deceiving users with malicious Uniform Resource Locators (URLs) is one stage in carrying out these attacks. As a result, malicious URL detection is attracting growing interest. Several published studies have applied machine learning and deep learning, demonstrating a range of methods for identifying malicious URLs. In this paper, we present a machine learning-based malicious URL detection technique built on a set of URL behaviors and attributes that we propose. In addition, big data technology is used to improve the identification of malicious URLs based on anomalous behavior. In summary, the proposed detection system consists of a novel set of URL attributes and behaviors, a machine learning algorithm, and big data technology. The test results indicate that the proposed URL attributes and behaviors can significantly improve the ability to detect malicious URLs. The proposed approach can be regarded as an optimized and user-friendly solution for malicious URL detection.

Keywords: URL; malicious URL detection; feature extraction; feature selection; machine learning

DOI: 10.17148/IARJSET.2023.107112