March 27, 2023

Dedicated Forum to help removing adware, malware, spyware, ransomware, trojans, viruses and more!

Interpretations Cannot Be Trusted: Stealthy and Effective Adversarial Perturbations against Interpretable Deep Learning. (arXiv:2211.15926v1 [cs.CR])

Deep learning methods have gained increased attention in various applications
due to their outstanding performance. For exploring how this high performance
relates to the proper use of data artifacts and the accurate problem
formulation of a given task, interpretation models have become a crucial
component in developing deep learning-based systems. Interpretation models
enable the understanding of the inner workings of deep learning models and
offer a sense of security in detecting the misuse of artifacts in the input
data. Similar to prediction models, interpretation models are also susceptible
to adversarial inputs. This work introduces two attacks, AdvEdge and
AdvEdge$^{+}$, that deceive both the target deep learning model and the coupled
interpretation model. We assess the effectiveness of proposed attacks against
two deep learning model architectures coupled with four interpretation models
that represent different categories of interpretation models. Our experiments
include the attack implementation using various attack frameworks. We also
explore the potential countermeasures against such attacks. Our analysis shows
the effectiveness of our attacks in terms of deceiving the deep learning models
and their interpreters, and highlights insights to improve and circumvent the