Result: Cost based Random Forest Classifier for Intrusion Detection System in Internet of Things.
The Internet of Things (IoT) is a collection of physical and digital devices interconnected over the Internet to exchange information and deliver services. As an extension of the Internet, it offers services to users in fields such as agriculture, healthcare, education, and smart homes. The significant consequences of intrusion in IoT are network disconnection, network hacking, and data theft from the source. Addressing these security issues is therefore the main challenge for worldwide adoption of IoT, and it is made harder by the feature imbalance across the different types of attacks. The most essential task in addressing them is to predict and classify intrusions in the network. In this paper, the Cost based Random Forest Classifier (CRFC) is proposed for developing an effective Intrusion Detection System (IDS). The CRFC-based classification is improved by incorporating a cost matrix calculated from feature importance, which helps the splitting of features even when there is a feature imbalance. Further, three Python libraries, namely Spark, Kafka, and Scikit-learn, are used in this IDS to improve classification performance: Spark implements the distributed environment, Kafka streams the data, and Scikit-learn implements the CRFC. Two datasets, NSL-KDD and UNSW-NB15, are used to evaluate the performance of the proposed CRFC-IDS method. The method is analyzed on the basis of accuracy, precision, recall, F1-measure, Area Under the Curve (AUC), False Acceptance Rate (FAR), and Matthews Correlation Coefficient (MCC). The existing approaches OCSVM and DBF are used for comparison. The accuracy of CRFC-IDS on the NSL-KDD dataset is found to be 99.957%, which is higher than that of both OCSVM and DBF.
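Most of the evaluation measures named in the abstract can be computed from a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function names `confusion_counts` and `ids_metrics` are hypothetical, the labels are made-up toy data, and FAR is assumed here to be FP / (FP + TN), since the abstract does not give its formula. AUC is left out because it needs predicted scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the attack class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ids_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, F1, FAR, and MCC as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        # Assumption: FAR taken as the false positive rate.
        "far": ratio(fp, fp + tn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Toy labels (1 = attack, 0 = normal), invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = ids_metrics(y_true, y_pred)
print(metrics)
```

MCC is worth reporting alongside accuracy on imbalanced traffic, because it uses all four confusion-matrix cells and stays near zero for a classifier that only predicts the majority class.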
• Effective IDS is developed by proposing an Improved Random Forest Classifier.
• Feature importance based cost matrix is derived for improving the classification.
• The Spark, Kafka and Scikit-learn libraries of Python are used to improve the IDS.
• Maximum variance margin is used to remove the noise values from the input.
[ABSTRACT FROM AUTHOR]
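The abstract does not say how the cost matrix is built from feature importance, so the sketch below does not reproduce CRFC. As a stand-in, it shows the general idea of cost-sensitive random forests in Scikit-learn using the `class_weight` parameter, and reads the fitted model's `feature_importances_` separately. The data is synthetic (not NSL-KDD or UNSW-NB15), and the costs in `class_costs` are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for IDS traffic: about 10% "attack" samples.
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=6,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hypothetical costs: an error on an attack (class 1) weighs 5x more
# than an error on normal traffic (class 0) when trees choose splits.
class_costs = {0: 1.0, 1: 5.0}

clf = RandomForestClassifier(
    n_estimators=100,
    class_weight=class_costs,
    random_state=0,
)
clf.fit(X_train, y_train)

# Impurity-based feature importances of the fitted forest (they sum to 1).
importances = clf.feature_importances_
top_features = np.argsort(importances)[::-1][:5]

print("top feature indices:", top_features.tolist())
print("test accuracy:", clf.score(X_test, y_test))
```

Class weights change the impurity computation at each split, which is one common way to make a forest cost-aware. A method that derives costs from feature importance, as the highlights describe, would need an additional step that is not specified in this record.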