Defending Corporate Cybersecurity NLP-Based Phishing Attack Classification in Email Communication

Allen Isaac J; Dr. H R Divakar

DOI: 10.17148/IARJSET.2024.11811

Abstract: This project presents an approach to phishing detection that combines email scraping, feature extraction, and machine learning with external services such as Seahound and Netcraft. It analyzes mailbox files containing mixed HTML and mail data to identify malicious content within emails. The pipeline extracts and cleanses the data, then applies Natural Language Processing (NLP) to transform the textual content into features. Seahound and Netcraft enrich the feature set for the machine learning model: Seahound analyzes URL legitimacy and reputation, while Netcraft provides historical information on domain trustworthiness. A labeled dataset distinguishes legitimate emails from phishing attempts and is used to train and evaluate machine learning models, notably a Random Forest classifier and a Support Vector Machine (SVM). The SVM model achieves high precision, recall, and F1-score. The project shows how email scraping, NLP, feature extraction, and machine learning work together, and highlights the role of external services in improving phishing detection accuracy and protecting users from email-based cyberattacks.

Keywords: Seahound, Netcraft, Natural Language Processing (NLP), Phishing Detection.

How to Cite:

[1] Allen Isaac J, Dr. H R Divakar, "Defending Corporate Cybersecurity NLP-Based Phishing Attack Classification in Email Communication," International Advanced Research Journal in Science, Engineering and Technology (IARJSET), DOI: 10.17148/IARJSET.2024.11811
