Sparse Backdoor Attack Against Neural Networks.
Recent studies show that neural networks are vulnerable to backdoor attacks, in which compromised networks behave normally on clean inputs but make mistakes when a pre-defined trigger appears. Although prior studies have designed various invisible triggers to avoid causing visual anomalies, these triggers cannot evade some trigger detectors. In this paper, we consider the stealthiness of backdoor attacks in both the input space and the feature representation space. We propose a novel backdoor attack, named the sparse backdoor attack, and investigate the minimal trigger required to induce well-trained networks to produce incorrect outputs. A U-net-based generator is employed to create a trigger for each clean image. To preserve the stealthiness of the trigger, we restrict its elements to the range −1 to 1. In the feature representation space, we adopt an entanglement cost function to minimize the gap between the feature representations of benign and malicious inputs. The inseparability of benign and malicious feature representations contributes to the stealthiness of our attack against various model diagnosis-based defences. We validate the effectiveness and generalization of our method by conducting extensive experiments on multiple datasets and networks.
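The abstract names two concrete mechanisms: bounding every trigger element to [−1, 1], and an entanglement cost that pulls benign and malicious feature representations together. As the abstract gives no implementation details, the following is only a minimal plain-Python sketch of those two ideas; the tanh squashing, the function names, and the mean-squared-distance form of the loss are all illustrative assumptions, not the paper's actual formulation.

```python
import math

def clamp_trigger(raw_outputs):
    """Squash raw generator outputs into [-1, 1] with tanh.
    Hypothetical realization of the paper's bound on trigger elements;
    the actual generator and bounding method may differ."""
    return [math.tanh(v) for v in raw_outputs]

def entanglement_loss(benign_feats, malicious_feats):
    """Mean squared distance between paired benign and malicious feature
    vectors. Minimizing it reduces the gap between the two representations,
    which is the stated goal of the entanglement cost function; the exact
    distance measure here is an assumption."""
    assert len(benign_feats) == len(malicious_feats)
    total = 0.0
    for b, m in zip(benign_feats, malicious_feats):
        total += sum((bi - mi) ** 2 for bi, mi in zip(b, m)) / len(b)
    return total / len(benign_feats)

# Usage sketch: arbitrary raw values end up inside the allowed range,
# and identical feature vectors yield zero entanglement loss.
trigger = clamp_trigger([3.0, -5.0, 0.2])
loss = entanglement_loss([[1.0, 2.0]], [[1.0, 2.0]])
```

In a full attack pipeline, `clamp_trigger` would sit at the output of the U-net generator and `entanglement_loss` would be added to the classification objective during training, so that poisoned inputs remain close to clean ones in feature space.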
Copyright of Computer Journal is the property of Oxford University Press / USA.