Result: ESD: E-mail Spam Detection using Cybersecurity-Driven Header Analysis and Machine Learning based Content Analysis.

Title:
ESD: E-mail Spam Detection using Cybersecurity-Driven Header Analysis and Machine Learning based Content Analysis.
Authors:
Batra, Harshita; Nelson, Leema (leema.nelson@gmail.com)
Source:
International Journal of Performability Engineering. Apr 2024, Vol. 20, Issue 4, pp. 205-213. 9 pp.
Database:
Supplemental Index

More Information

Background: Spam refers to unwanted commercial or deceptive emails that target specific individuals or businesses to promote products or mislead recipients. With machine learning and natural language processing, computers can be trained to classify such emails as spam or legitimate (ham) messages. Despite considerable efforts in spam filtering, the effective identification and mitigation of spam emails remains an ongoing challenge. Methods: This research focuses on examining email headers and extracting key data, such as hop count and IP address, using a Python script that serves as a forensic tool for analyzing email files. It also assesses vectorization techniques to gauge the efficacy of machine-learning approaches to spam classification. The work covers a range of supervised learning algorithms, including Logistic Regression, Decision Trees, and Naive Bayes, as well as Natural Language Processing (NLP) methods such as Bidirectional Encoder Representations from Transformers (BERT). Two vectorization methods, count vectorization and TF-IDF vectorization, are compared. The evaluation metrics include accuracy, training time, CPU and wall times, precision, recall, F1 score, and support. Conclusion: The performance of the Decision Tree classifier is particularly noteworthy, with a reported accuracy of 100%. The trained model is integrated into both an Android application and a website, enabling real-time spam detection and classification. [ABSTRACT FROM AUTHOR]
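
The header-analysis step described in the abstract (extracting hop count and IP addresses from an email file) might look roughly like the sketch below. It is not the authors' script: the sample message, the function name analyze_headers, and the choice to count Received headers as hops are illustrative assumptions, and only the Python standard library is used.

```python
import re
from email import message_from_string
from ipaddress import ip_address

# Hypothetical sample message; addresses come from documentation-only ranges.
RAW_EMAIL = """\
Received: from mail.example.org (mail.example.org [203.0.113.7])
\tby mx.example.com with ESMTP id abc123; Mon, 1 Apr 2024 10:00:02 +0000
Received: from sender-pc (unknown [198.51.100.23])
\tby mail.example.org with ESMTP id def456; Mon, 1 Apr 2024 10:00:00 +0000
From: Promo <promo@example.org>
To: user@example.com
Subject: Limited offer

Buy now.
"""

# Loose pattern for dotted-quad candidates; each match is validated below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def analyze_headers(raw):
    """Return hop count and IPv4 addresses found in the Received headers.

    Assumption: each Received header represents one relay hop. Mail servers
    prepend these headers, so the last one listed is the earliest hop.
    """
    msg = message_from_string(raw)
    received = msg.get_all("Received", [])
    ips = []
    for header in received:
        for candidate in IPV4_CANDIDATE.findall(header):
            try:
                ip_address(candidate)  # rejects values such as 999.1.1.1
            except ValueError:
                continue
            if candidate not in ips:
                ips.append(candidate)
    return {
        "hop_count": len(received),
        "ip_addresses": ips,
        "origin_ip": ips[-1] if ips else None,
    }


if __name__ == "__main__":
    print(analyze_headers(RAW_EMAIL))
```

Received headers can be forged by a sender, so values extracted this way are best treated as features for a classifier or as leads for investigation, not as proof of origin.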