When LLMs Know They Don't: Probing Latent
Representations for Logical Insufficiency

Matt Wang, Prabhav Singh, Tom Wang
*These authors contributed equally to this work
Center for Language and Speech Processing
Johns Hopkins University, Baltimore, MD
📄 Paper (Coming Soon) 💻 Code

Overview

Large language models confidently hallucinate on logically insufficient questions—problems lacking necessary information for deterministic solutions. We investigate whether this failure reflects the LLMs' inability to recognize insufficiency or merely their inability to verbalize this internal knowledge.

Through systematic probing experiments across three mathematical reasoning benchmarks and four LLMs, we demonstrate that logical insufficiency is robustly encoded as a linearly separable property in frozen LLM representations. Simple linear probes achieve 80–91% F1 on binary insufficiency detection, despite models' verbal self-assessment accuracy remaining near chance.
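As a rough illustration of the probing setup, the sketch below trains a logistic-regression probe on last-token hidden states from a frozen model. The model name, layer choice, and data handling are placeholder assumptions, not the paper's exact configuration.

```python
# Minimal sketch of linear probing on frozen hidden states.
# Model name, layer, and data pipeline are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

MODEL_NAME = "meta-llama/Llama-3.1-8B"  # placeholder; any decoder-only LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, torch_dtype=torch.float16, device_map="auto")
model.eval()

def last_token_embedding(question: str, layer: int = -1) -> torch.Tensor:
    """Hidden state of the final token at a chosen layer, with the LM frozen."""
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0, -1].float().cpu()

def train_probe(train_qs, train_ys, test_qs, test_ys, layer: int = -1):
    """Fit a linear probe (1 = insufficient, 0 = sufficient) and report test F1."""
    X_train = torch.stack([last_token_embedding(q, layer) for q in train_qs]).numpy()
    X_test = torch.stack([last_token_embedding(q, layer) for q in test_qs]).numpy()
    probe = LogisticRegression(max_iter=1000).fit(X_train, train_ys)
    return probe, f1_score(test_ys, probe.predict(X_test))
```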

This representation-language gap, averaging +25.6 percentage points, reveals that models already "know" when they cannot know; the knowledge simply remains latent. We exploit these internal signals in lightweight probe-guided intervention systems that detect insufficient queries before generation (sketched below).

Probe-Guided Clarification Pipeline
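
The snippet below is a minimal sketch of how such a pipeline could be wired together, assuming the `last_token_embedding` helper and trained `probe` from the snippet above; the threshold, prompt wording, and `generate_fn` hook are illustrative assumptions rather than the paper's implementation.

```python
# Hedged sketch: gate generation on the probe's insufficiency score.
# Assumes `probe` and `last_token_embedding` from the probing sketch above;
# the 0.5 threshold and clarification prompt are illustrative choices.
def answer_or_clarify(question: str, generate_fn, threshold: float = 0.5) -> str:
    """Route probe-flagged questions to a clarification prompt instead of answering."""
    features = last_token_embedding(question).numpy().reshape(1, -1)
    p_insufficient = probe.predict_proba(features)[0, 1]
    if p_insufficient >= threshold:
        # Intervene before generation: ask for the missing information
        # instead of producing a confident but unsupported answer.
        prompt = (
            "The question below may be missing information needed for a unique "
            "answer. Reply with one clarifying question that asks for it.\n\n"
            f"Question: {question}"
        )
        return generate_fn(prompt)
    return generate_fn(question)
```

In use, something like `answer_or_clarify(q, lambda p: llm.generate(p))` would answer directly only when the probe judges the question sufficient, and otherwise asks for the missing detail first.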

Research Poster