Quantifying Logical Consistency in Transformers via Query-Key Alignment
HAL: hal-05401210
License: http://creativecommons.org/licenses/by/
Large language models (LLMs) excel at many NLP tasks, yet their multi-step logical reasoning remains unreliable. Existing solutions such as Chain-of-Thought prompting generate intermediate steps but provide no internal check of their logical coherence. In this paper, we use the "QK-score", a lightweight metric based on query-key alignments within transformer attention heads, to evaluate the logical reasoning capabilities of LLMs. Our method automatically identifies attention heads that play a key role in distinguishing valid from invalid logical inferences, enabling efficient inference-time evaluation via a single forward pass. It reveals latent reasoning structure in LLMs and provides a scalable mechanistic alternative to ablation-based analysis. Across three benchmarks (ProntoQA-OOD, PARARULE-Plus, and MultiLogicEval) and models ranging from 1.5B to 70B parameters, the selected heads predict logical validity up to 14% more accurately than the model's own output probabilities, and they remain robust under distractors and at reasoning depths d ≤ 6.
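To make the idea concrete, here is a minimal sketch (not the authors' released code) of how a QK-score and the head-selection step could look: the hidden state at the answer position is projected through a head's query matrix, the hidden state at a candidate answer token through its key matrix, and their scaled dot product serves as that head's vote on validity. All tensor shapes, the margin-based selection criterion, and the random stand-in data are illustrative assumptions.

```python
import torch

def qk_score(h_q: torch.Tensor, h_k: torch.Tensor,
             W_Q: torch.Tensor, W_K: torch.Tensor) -> torch.Tensor:
    """Scaled query-key alignment for one attention head.

    h_q : hidden state at the query position, shape (d_model,)
          (e.g. the last prompt token, where the answer is produced).
    h_k : hidden state at a candidate answer token, shape (d_model,)
          (e.g. the "True" option token in the prompt).
    W_Q, W_K : this head's projection matrices, shape (d_model, d_head).
    """
    q = h_q @ W_Q                         # query vector for this head
    k = h_k @ W_K                         # key vector for this head
    return (q @ k) / W_Q.shape[1] ** 0.5  # pre-softmax attention logit

# Toy head selection: keep the head whose score margin between the
# "True" and "False" option tokens best predicts validity on labelled data.
d_model, d_head, n_heads, n_examples = 64, 16, 8, 100
torch.manual_seed(0)
heads = [(torch.randn(d_model, d_head), torch.randn(d_model, d_head))
         for _ in range(n_heads)]
# Stand-ins for cached hidden states: (query pos, "True" pos, "False" pos).
H = torch.randn(n_examples, 3, d_model)
labels = torch.randint(0, 2, (n_examples,))  # 1 = the inference is valid

best_acc, best_head = 0.0, None
for i, (W_Q, W_K) in enumerate(heads):
    # Predict "valid" when the query aligns more with "True" than "False".
    preds = torch.tensor([
        int(qk_score(h[0], h[1], W_Q, W_K) > qk_score(h[0], h[2], W_Q, W_K))
        for h in H])
    acc = (preds == labels).float().mean().item()
    if acc > best_acc:
        best_acc, best_head = acc, i
print(f"best head: {best_head} (accuracy {best_acc:.2f})")
```

In an actual run, the hidden states would come from a single forward pass of the model under study and the head would be chosen once on a validation split; with the random tensors above, accuracies naturally hover around chance.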