Result: Quantifying Logical Consistency in Transformers via Query-Key Alignment

Title:
Quantifying Logical Consistency in Transformers via Query-Key Alignment
Contributors:
Skolkovo Institute of Science and Technology [Moscow] (Skoltech), Huawei Noah’s Ark lab, Huawei Technologies Co., Ltd [Moscow], Moscow Institute of Physics and Technology [Moscow] (MIPT), Artificial Intelligence Research Institute (AIRI), Université Paris Cité (UPCité), Institut de Mathématiques de Jussieu - Paris Rive Gauche (IMJ-PRG (UMR_7586)), Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Cité (UPCité), Centre National de la Recherche Scientifique (CNRS)
Source:
ACL Anthology, pp. 35184-35199
Publisher Information:
CCSD, 2025.
Publication Year:
2025
Collection:
collection:CNRS
collection:INSMI
collection:IMJ
collection:SORBONNE-UNIVERSITE
collection:SORBONNE-UNIV
collection:SU-SCIENCES
collection:UNIV-PARIS
collection:UNIVERSITE-PARIS
collection:UP-SCIENCES
collection:SU-TI
collection:ALLIANCE-SU
collection:SUPRA_MATHS_INFO
Original Identifier:
ARXIV: 2502.17017
HAL: hal-05401210
Document Type:
Journal article
Language:
English
Relation:
info:eu-repo/semantics/altIdentifier/arxiv/2502.17017; info:eu-repo/semantics/altIdentifier/doi/10.18653/v1/2025.emnlp-main.1785
DOI:
10.18653/v1/2025.emnlp-main.1785
Rights:
info:eu-repo/semantics/OpenAccess
URL: http://creativecommons.org/licenses/by/
Accession Number:
edshal.hal.05401210v1
Database:
HAL

Additional Information

Large language models (LLMs) excel at many NLP tasks, yet their multi-step logical reasoning remains unreliable. Existing solutions such as Chain-of-Thought prompting generate intermediate steps but provide no internal check of their logical coherence. In this paper, we use the "QK-score", a lightweight metric based on query-key alignments within transformer attention heads, to evaluate the logical reasoning capabilities of LLMs. Our method automatically identifies attention heads that play a key role in distinguishing valid from invalid logical inferences, enabling efficient inference-time evaluation via a single forward pass. It reveals latent reasoning structure in LLMs and provides a scalable mechanistic alternative to ablation-based analysis. Across three benchmarks (ProntoQA-OOD, PARARULE-Plus, and Multi-LogiEval) and models ranging from 1.5B to 70B parameters, the selected heads predict logical validity up to 14% better than the models' output probabilities, and they remain robust to distractors and to reasoning depths of d ≤ 6.
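
To make the metric concrete, below is a minimal sketch, not the authors' released code, of how a QK-score of this kind can be computed, assuming access to one attention head's query vector at the final prompt token and its key vectors at the candidate-answer tokens (e.g. "valid" vs. "invalid"). All names here (qk_score, head_accuracy, q_last, k_candidates) are illustrative, and the exact formulation in the paper may differ.

import torch

def qk_score(q_last: torch.Tensor, k_candidates: torch.Tensor) -> torch.Tensor:
    """Scaled query-key alignment for one attention head.

    q_last: (head_dim,) query vector of the final prompt token.
    k_candidates: (num_candidates, head_dim) key vectors of the
        candidate-answer tokens.
    Returns one alignment score per candidate (the pre-softmax
    attention logit, scaled by sqrt(head_dim)).
    """
    d = q_last.shape[-1]
    return (k_candidates @ q_last) / d ** 0.5

def head_accuracy(scores: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of examples where the head ranks the correct candidate
    highest; used to select the best head on a labeled validation set.

    scores: (num_examples, num_candidates), labels: (num_examples,).
    """
    return (scores.argmax(dim=-1) == labels).float().mean().item()

if __name__ == "__main__":
    # Toy demo with random tensors standing in for real Q/K activations.
    torch.manual_seed(0)
    head_dim, n_examples, n_candidates = 64, 8, 2
    q = torch.randn(n_examples, head_dim)
    k = torch.randn(n_examples, n_candidates, head_dim)
    scores = torch.stack([qk_score(q[i], k[i]) for i in range(n_examples)])
    labels = torch.randint(0, n_candidates, (n_examples,))
    print(f"toy head accuracy: {head_accuracy(scores, labels):.2f}")

Since the score reuses query and key activations already produced during generation, scoring every head this way adds no extra forward passes, which is consistent with the single-forward-pass evaluation described in the abstract.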