Large language models confidently hallucinate answers to logically insufficient questions, i.e., problems that lack the information needed for a deterministic solution. We investigate whether this failure reflects an inability to recognize insufficiency or merely an inability to verbalize knowledge the models already encode internally.
Through systematic probing experiments across three mathematical reasoning benchmarks and four LLMs, we demonstrate that logical insufficiency is robustly encoded as a linearly separable property in frozen LLM representations. Simple linear probes achieve 80–91% F1 on binary insufficiency detection, even though the models' verbal self-assessment accuracy remains near chance.
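The probing setup is simple enough to sketch in a few lines. The sketch below trains a logistic-regression probe on frozen last-token hidden states to classify questions as sufficient or insufficient; the model name, layer choice, pooling strategy, and toy data are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "gpt2"  # assumption: a small stand-in for the frozen LLMs studied in the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()  # the LLM stays frozen; only the probe is trained

@torch.no_grad()
def hidden_features(questions, layer=-1):
    """One feature vector per question: the last-token hidden state at the chosen layer."""
    feats = []
    for q in questions:
        inputs = tokenizer(q, return_tensors="pt")
        out = model(**inputs, output_hidden_states=True)
        # out.hidden_states is a tuple of (1, seq_len, hidden_dim) tensors, one per layer
        feats.append(out.hidden_states[layer][0, -1].numpy())
    return feats

# Toy training data for illustration; 1 = logically insufficient, 0 = sufficient.
train_questions = [
    "Alice has 3 apples and buys 2 more. How many apples does she have?",
    "A train travels for 2 hours. How far does it go?",
    "A rectangle is 4 cm by 5 cm. What is its area?",
    "Bob spent some money on lunch. How much does he have left?",
]
train_labels = [0, 1, 0, 1]

probe = LogisticRegression(max_iter=1000)  # the "simple linear probe"
probe.fit(hidden_features(train_questions), train_labels)
# In the paper's setting, F1 would be computed on held-out questions, e.g. with
# sklearn.metrics.f1_score(test_labels, probe.predict(hidden_features(test_questions))).
```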
This representation-language gap, which averages 25.6 percentage points, reveals that models already "know" when they cannot know, but that this knowledge remains latent. We exploit these internal signals in lightweight probe-guided intervention systems that detect insufficient queries before generation begins.
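A minimal sketch of such an intervention, reusing the `probe`, `tokenizer`, `MODEL_NAME`, and `hidden_features` from the previous example, gates generation on the probe's prediction and asks for clarification instead of answering; the threshold and refusal message are hypothetical choices rather than the paper's.

```python
from transformers import AutoModelForCausalLM

gen_model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
gen_model.eval()

def answer_or_abstain(question, threshold=0.5, max_new_tokens=64):
    """Generate an answer only if the probe judges the question sufficient."""
    # Probability that the question is logically insufficient, per the trained probe.
    p_insufficient = probe.predict_proba(hidden_features([question]))[0, 1]
    if p_insufficient >= threshold:
        return "The question does not provide enough information for a definite answer."
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = gen_model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(answer_or_abstain("A train travels for 2 hours. How far does it go?"))
```

Because the probe reads frozen hidden states, this gate adds only a single linear classification on top of one forward pass, so it can run before any answer tokens are produced.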