Evaluating chain-of-thought monitorability

(openai.com)

34 points | by mfiguiere 2 days ago

3 comments

  • ursAxZA 2 hours ago
    I might be missing something here as a non-expert, but isn’t chain-of-thought essentially asking the model to narrate what it’s “thinking,” and then monitoring that narration?

    That feels closer to injecting a self-report step than observing internal reasoning.

    • crthpl 1 hour ago
      the chain of thought is what it is thinking
      • ursAxZA 1 hour ago
        Chain-of-thought is a technical term in LLMs — not literally “what it’s thinking.”

        As far as I understand it, it’s a generated narration conditioned by the prompt, not direct access to internal reasoning.

      • arthurcolle 11 minutes ago
        Wrong to the point of being misleading. This is a goal, not an assumption

        Source: all of mechinterp

      • Bjartr 36 minutes ago
        It is text that describes a plausible/likely thought process and that conditions future generation by its presence in the context.
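
        A minimal sketch of that view, assuming a stubbed generate() helper in place of a real model API (the helper and the keyword monitor are illustrative, not anything from the article):

          # Illustrative only: generate() stands in for a real model call.
          def generate(prompt: str) -> str:
              # Pretend the model produced this text for the given prompt.
              return "Step 1: the user asked X. Step 2: do Y. Answer: Z."

          def answer_with_cot(question: str) -> tuple[str, str]:
              # 1. Ask for step-by-step reasoning; the result is just more text.
              cot = generate(f"Question: {question}\nThink step by step.")
              # 2. That text shapes the final answer only by sitting in the context.
              final = generate(f"Question: {question}\nReasoning: {cot}\nFinal answer:")
              return cot, final

          def monitor(cot: str) -> bool:
              # 3. A CoT monitor reads the generated narration, not hidden state.
              suspicious = ("ignore the instructions", "hide this from the user")
              return any(phrase in cot.lower() for phrase in suspicious)

          cot, final = answer_with_cot("What is 2 + 2?")
          print("flagged:", monitor(cot), "| answer:", final)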
  • ramoz 3 hours ago
    > Our expectation is that combining multiple approaches—a defense-in-depth strategy—can help cover gaps that any single method leaves exposed.

    Implement hooks in codex then.
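
    For illustration, a defense-in-depth layering might look like the sketch below; the monitor names and the step-record shape are hypothetical, not an actual codex or OpenAI hook API.

      # Several independent monitors, each watching a different surface;
      # flag the step if any one of them trips.
      from typing import Callable

      Monitor = Callable[[dict], bool]  # takes a step record, returns True if flagged

      def cot_monitor(step: dict) -> bool:
          return "hide this" in step.get("chain_of_thought", "").lower()

      def action_monitor(step: dict) -> bool:
          return step.get("tool_call", "").startswith("rm -rf")

      def output_monitor(step: dict) -> bool:
          return "password" in step.get("final_output", "").lower()

      MONITORS: list[Monitor] = [cot_monitor, action_monitor, output_monitor]

      def review(step: dict) -> list[str]:
          # Any single monitor can miss; the union of their flags covers more gaps.
          return [m.__name__ for m in MONITORS if m(step)]

      print(review({"chain_of_thought": "I'll hide this from the user",
                    "tool_call": "ls", "final_output": "done"}))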

  • leetrout 2 hours ago
    Related: check out chain of draft if you haven't.

    Similar performance with roughly 7% of the tokens used by chain of thought.

    https://arxiv.org/abs/2502.18600
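
    A rough sketch of the prompting difference (the instruction text paraphrases the chain-of-draft idea rather than quoting the paper, and generate() is a stand-in for a real model call):

      def generate(prompt: str) -> str:
          return "..."  # placeholder for a model response

      COT_INSTRUCTION = "Think step by step, explaining your reasoning in full."
      COD_INSTRUCTION = ("Think step by step, but keep only a minimal draft for "
                         "each step, a few words at most, then give the answer.")

      question = "A train travels 60 km in 45 minutes. What is its speed in km/h?"

      verbose = generate(f"{COT_INSTRUCTION}\n\n{question}")  # long reasoning trace
      draft = generate(f"{COD_INSTRUCTION}\n\n{question}")    # e.g. "60/0.75=80 -> 80 km/h"

      # The paper's claim is that the draft-style trace reaches similar accuracy
      # while emitting a small fraction of the reasoning tokens.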