

Neural networks are used in more and more applications, often without the end user even being aware of it. However, because of the number of parameters and the amount of training data, they effectively become a black box: we have no way of knowing why a given decision was produced. When neural networks are used to make high-impact decisions, this becomes dangerous. In this thesis, we address this need for explainability with a focus on application. In particular, we note the gap between the amount of research into interpretability and its application in practice, and we contribute to closing this gap. To this end, we contribute a chapter with a historical perspective on prior approaches to explainability. Furthermore, we introduce a novel approach to evaluating explanation methods for a given use case, enabling practitioners to identify the best method for their task. Using this approach, we confirm that there are inherent differences in the interpretability of several popular network architectures. This insight implies that it may be favorable to use a less accurate but more interpretable architecture for a task where the neural network classification is secondary. We show that an ensemble of explanation methods not only explains a neural network better but is also much more robust to adversarial manipulation. Making explanations robust to malicious manipulation is vital, especially when they are a factor in the final decision or are used to determine whether a given decision was aligned with ethical requirements. Lastly, we contribute a method to infuse prior knowledge into a neural network via explanations. The end-to-end nature of neural networks prevents the use of domain knowledge in the algorithm and requires unbiased, independent, and identically distributed data for training. Our method gives practitioners the ability to restrict the learning of unintentionally informative features that ‘give away’ the label, enabling the use of neural networks in the more realistic scenario of qualitatively worse data.
Original language: English
Publisher: Technical University of Denmark
Number of pages: 141
Publication status: Published - 2020


Title: Explainability for neural networks
