Beyond Feature Attribution: Quantifying Neural Unit Contributions using Multidimensional Shapley Analysis
Fachbereich Informatik
Artificial Intelligence models, such as ChatGPT, have gained immense popularity and are extensively utilized across various domains. Despite their widespread use, these models largely remain black boxes, with their internal workings obscure to users and developers alike. Understanding these models is crucial not only for improving their performance and reliability but also for ensuring they operate within ethical boundaries. Therefore, there is a pressing need for a unified, model-agnostic approach to explainable AI (XAI) that is effective across all data types. To address this need, we introduce a novel framework for Multi-dimensional Shapley Value Analysis, encapsulated in an open-source Python package. This framework advances beyond traditional feature attribution methods like SHAP, enabling the calculation of unit contributions towards multi-dimensional outputs. We demonstrate this framework through applications on three distinct types of neural networks: Multi-layer Perceptrons (MLP), Large Language Models (LLM), and Deep Convolutional Generative Adversarial Networks (DCGAN). Our investigation begins with the most fundamental neural unit, the neuron in an MLP, where we explore the impact of different regularization techniques on neuron functionality and computation distribution. Contrary to popular belief, we find that in networks without regularization, the importance of a neuron shows no correlation with its weights. To demonstrate scalability, we then apply the framework to Mixtral-8x7B, a state-of-the-art LLM with 56 billion parameters. This analysis uncovers task-specific neural units, revealing that removing certain units can hinder the LLM's ability to produce a specific language without affecting its comprehension of that language. Finally, we apply our framework to analyze the contributions of neural units in a DCGAN. Our findings suggest that, unlike in traditional classification networks, GANs process features in reverse ...
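To make the central idea concrete, the following minimal sketch illustrates how per-unit Shapley values with a multi-dimensional output can be estimated via permutation sampling. It is an illustrative example only, not the package's actual API: the function name shapley_unit_contributions and the user-supplied value_fn (which evaluates the network with a chosen subset of units active and returns a multi-dimensional output) are assumptions introduced here for exposition.

import numpy as np

def shapley_unit_contributions(value_fn, n_units, n_samples=200, seed=None):
    """Monte Carlo (permutation-sampling) Shapley estimate for neural units.

    value_fn : callable taking a boolean mask of active units (shape [n_units])
               and returning a multi-dimensional output vector.
    Returns an array of shape [n_units, output_dim]: one attribution vector per unit.
    """
    rng = np.random.default_rng(seed)
    out_dim = value_fn(np.ones(n_units, dtype=bool)).shape[0]
    contrib = np.zeros((n_units, out_dim))
    for _ in range(n_samples):
        order = rng.permutation(n_units)          # random coalition ordering
        mask = np.zeros(n_units, dtype=bool)
        prev = value_fn(mask)                     # value of the empty coalition
        for unit in order:
            mask[unit] = True                     # add this unit to the coalition
            curr = value_fn(mask)
            contrib[unit] += curr - prev          # marginal contribution of the unit
            prev = curr
    return contrib / n_samples

# Toy usage: three "units" whose contributions sum into a 2-D output.
weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]])
value_fn = lambda m: weights[m].sum(axis=0) if m.any() else np.zeros(2)
print(shapley_unit_contributions(value_fn, n_units=3, n_samples=500))

Because the toy value function is additive, the estimated Shapley vectors recover each unit's own weight row exactly; in the framework described above, value_fn would instead wrap a network evaluation with selected neurons, experts, or filters ablated, yielding a vector-valued attribution per unit rather than a single scalar as in standard SHAP.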