I might be missing something here as a non-expert, but isn’t chain-of-thought essentially asking the model to narrate what it’s “thinking,” and then monitoring that narration?
That feels closer to injecting a self-report step than observing internal reasoning.
As far as I understand it, it’s a generated narration conditioned on the prompt, not direct access to internal reasoning.
Source: all of mechinterp
Implement hooks in codex then.
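To make the contrast concrete: for an open-weights model you can read internal state directly rather than a generated narration. A minimal sketch below, using GPT-2 via Hugging Face transformers as an illustrative stand-in (Codex itself is API-only, so the model choice and setup here are assumptions for demonstration, not anyone's actual tooling):

```python
# Minimal sketch: observing internal activations with forward hooks,
# using GPT-2 as an open-weights stand-in (illustrative only).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        # For a GPT2Block, output[0] holds the block's hidden states.
        captured[name] = output[0].detach()
    return hook

# Register a hook on each transformer block to record its hidden states.
handles = [
    block.register_forward_hook(make_hook(f"block_{i}"))
    for i, block in enumerate(model.transformer.h)
]

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    model(**inputs)

for handle in handles:
    handle.remove()

# These tensors are the model's actual internal states, not a self-report.
for name, acts in captured.items():
    print(name, tuple(acts.shape))
```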
Similar performance using roughly 7% of the tokens of chain of thought:
https://arxiv.org/abs/2502.18600