Title:
Comparisons of machine learning techniques for detecting malicious webpages
Source:
Expert systems with applications. 42(3):1166-1177
Publisher Information:
Amsterdam: Elsevier, 2015.
Publication Year:
2015
Physical Description:
print, 3/4 p
Original Material:
INIST-CNRS
Subject Terms:
Computer science, Informatique, Sciences exactes et technologie, Exact sciences and technology, Sciences appliquees, Applied sciences, Informatique; automatique theorique; systemes, Computer science; control theory; systems, Logiciel, Software, Systèmes informatiques et systèmes répartis. Interface utilisateur, Computer systems and distributed systems. User interface, Organisation des mémoires. Traitement des données, Memory organisation. Data processing, Traitement des données. Listes et chaînes de caractères, Data processing. List processing. Character string processing, Gestion des mémoires et des fichiers (y compris la protection et la sécurité des fichiers), Memory and file management (including protection and security), Intelligence artificielle, Artificial intelligence, Apprentissage et systèmes adaptatifs, Learning and adaptive systems, Affinité, Affinity, Afinidad, Algorithme k moyenne, K means algorithm, Algoritmo k media, Alimentation machine, Machine feed, Alimentación máquina, Analyse amas, Cluster analysis, Analisis cluster, Analyse donnée, Data analysis, Análisis datos, Angle observation, Viewing angle, Angulo observación, Apprentissage probabilités, Probability learning, Aprendizaje probabilidades, Apprentissage supervisé, Supervised learning, Aprendizaje supervisado, Classification non supervisée, Unsupervised classification, Clasificación no supervisada, Classification à vaste marge, Vector support machine, Máquina ejemplo soporte, Estimation Bayes, Bayes estimation, Estimación Bayes, Extensibilité, Scalability, Estensibilidad, Fichier log, Log file, Fichero actividad, Intelligence artificielle, Artificial intelligence, Inteligencia artificial, Internet, Intranet, Modélisation, Modeling, Modelización, Navigation information, Information browsing, Navegacíon informacíon, Plus proche voisin, Nearest neighbour, Vecino más cercano, Réseau web, World wide web, Red WWW, Simulation ordinateur, Computer simulation, Simulación computadora, Site Web, Web site, Sitio Web, Sécurité informatique, Computer security, Seguridad informatica, Apprentissage non supervisé, Unsupervised learning, Aprendizaje no supervisado, Liste noire, Black list, Lista negra, Système détection intrusion, Intrusion detection systems, Sistema de detección de intrusiones, Affinity Propagation, K-Means, K-Nearest Neighbor, Naive Bayes, Supervised and unsupervised learning, Support Vector Machine
Document Type:
Academic journal Article
File Description:
text
Language:
English
Author Affiliations:
Intelligent Systems Research Centre, School of Computing, London Metropolitan University, United Kingdom
ISSN:
0957-4174
Rights:
Copyright 2015 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS
Notes:
Computer science; theoretical automation; systems
Accession Number:
edscal.28928445
Database:
PASCAL Archive

Further Information

This paper compares machine learning techniques for detecting malicious webpages. The conventional method of detection is to check whether a webpage appears on a blacklist, a list of webpages that have been classified as malicious from a user's point of view. Blacklists are created by trusted organizations and volunteers and are used by modern web browsers such as Chrome, Firefox and Internet Explorer. However, blacklists are ineffective for three reasons: webpages change frequently, the growing number of webpages poses scalability issues, and crawlers cannot visit intranet webpages that require computer operators to log in as authenticated users. This paper therefore applies machine learning algorithms as an alternative and novel approach to detecting malicious webpages. Three supervised techniques (K-Nearest Neighbor, Support Vector Machine and Naive Bayes Classifier) and two unsupervised techniques (K-Means and Affinity Propagation) are employed. K-Means and Affinity Propagation have not previously been applied to the detection of malicious webpages by other researchers. All of these techniques were used to build predictive models for analyzing a large number of malicious and safe webpages. The webpages were downloaded by a concurrent crawler built on gevent, then parsed, and features such as the content, URL and screenshot of each webpage were extracted and fed into the machine learning models. Computer simulations produced an accuracy of up to 98% for the supervised techniques and a silhouette coefficient close to 0.96 for the unsupervised techniques. The predictive models have been applied in a practical context in which Google Chrome can harness classifiers that combine the advantages of lightweight and heavyweight classifiers.
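
The abstract reports a silhouette coefficient for the unsupervised techniques without defining it. As a rough illustration only, the sketch below computes the mean silhouette coefficient in plain Python: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the point's score is (b - a) / max(a, b). Values near 1 indicate compact, well-separated clusters. The function name, the Euclidean distance and the toy 2-D points are assumptions made for this example; the record does not state which distance measure or feature representation the authors used.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: a singleton cluster scores 0
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:  # guard against all-identical points
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): two tight groups that are far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good clustering: {good:.3f}, bad clustering: {bad:.3f}")
```

A labelling that matches the two groups scores close to 1, while a labelling that cuts across them scores below 0, which is the sense in which a value near 0.96 suggests well-separated clusters.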