Thesis. An LLM is neither a database lookup nor a reasoner in the classical sense. It is an observer whose coherence B(O, C) is low-d, high-Λ: low-dimensional with respect to grounded interaction, but extremely rich in contextual data. Multi-agent systems compose these observers, and the same B-formula predicts when the composition will be more coherent than its parts, when it will be less coherent, and when it will degenerate into shared hallucination.
A single LLM in ODTOE coordinates
Map the four components onto a language model:
- F (informational fidelity): how accurately the model's parameters track the actual statistics of the world. High at the training-distribution sweet spot, low at its edges and out of distribution.
- E (internal coherence): how consistent the model's outputs are across rephrasings, framings, and reasoning chains. The "self-consistency" benchmarks measure exactly this.
- σ (contextual noise): how much of the input prompt is irrelevant, contradictory, adversarial, or prompt-injection garbage.
- Λ (contextual data quality): how rich and clean the in-context information is — retrieval-augmented context, well-formed tool outputs, structured grounding.
Now apply the multiplicative rule: an LLM with great F (well trained) and great Λ (great RAG) can still produce nonsense if E is low (it contradicts itself across the chain of thought) or σ is high (the prompt is full of distracting instructions). That is exactly what practitioners observe. It is not folk wisdom but the structural prediction: in a product, a single weak component drags the whole B down, no matter how strong the others are.
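To make the multiplicative rule concrete, here is a minimal sketch. The post does not spell out the exact B-formula, so this assumes, purely for illustration, that F, E and Λ are normalized to [0, 1] and that noise enters as a factor (1 − σ), giving B = F · E · Λ · (1 − σ). The `ObserverState` type, the `coherence` function and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObserverState:
    """Hypothetical per-agent snapshot; every component normalized to [0, 1]."""
    F: float      # informational fidelity
    E: float      # internal coherence (self-consistency)
    sigma: float  # contextual noise σ (higher is worse)
    lam: float    # contextual data quality Λ


def coherence(s: ObserverState) -> float:
    """Illustrative B, ASSUMING B = F * E * Λ * (1 - σ).

    This is a stand-in form for the sketch, not the published ODTOE formula.
    """
    for name in ("F", "E", "sigma", "lam"):
        value = getattr(s, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0, 1]")
    return s.F * s.E * s.lam * (1.0 - s.sigma)


# Well trained (high F) with excellent retrieval (high Λ) ...
healthy = ObserverState(F=0.9, E=0.9, sigma=0.1, lam=0.9)
# ... the same model contradicting itself across its chain of thought ...
self_contradicting = ObserverState(F=0.9, E=0.1, sigma=0.1, lam=0.9)
# ... or the same model buried in distracting instructions.
noisy_prompt = ObserverState(F=0.9, E=0.9, sigma=0.9, lam=0.9)

for label, state in [("healthy", healthy),
                     ("low E", self_contradicting),
                     ("high sigma", noisy_prompt)]:
    print(f"{label:>10}: B = {coherence(state):.3f}")
```

Under this assumed form, dropping either E or (1 − σ) from 0.9 to 0.1 cuts B by a factor of nine (0.6561 to 0.0729), however good F and Λ are. A simple average of the four components would fall only from 0.9 to 0.7, which is why an additive mental model underestimates these failures.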
Why multi-agent systems sometimes help and sometimes don't
When you compose N agents into a system, what happens to the joint B? Naively, you might hope for averaging: bad agents and good agents cancel out, and the ensemble is better than any individual. The multi-agent coherence paper shows the actual rule:
The joint coherence of a multi-agent system is not the average of individual coherences. It is bounded above by the coherence of the protocol that connects them, and bounded below by the lowest E in the network.
In other words: a chain of brilliant agents communicating through a broken protocol is dumber than any single agent. A chain of mediocre agents communicating through an excellent protocol can outperform any single brilliant one. The protocol — the way they exchange state and resolve disagreement — is the variable that dominates.
This is exactly why "just spawn five sub-agents" so often fails to improve over a single well-prompted agent. The five sub-agents share their outputs through a low-Λ, high-σ channel (raw text in a planner loop), and the joint E collapses below the individual E.
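A toy simulation can show the loss side of this argument. Everything in it is an assumption made for illustration: it takes B = F · E · Λ · (1 − σ) as a stand-in form, and it models each handoff as multiplying the next agent's Λ by a channel `fidelity` and adding `noise_per_hop` to σ. A raw-text planner loop is modeled as a lossy channel and a schema-based one as a nearly lossless channel; the function names and loss numbers are hypothetical.

```python
def b(F: float, E: float, lam: float, sigma: float) -> float:
    """Stand-in multiplicative coherence (ASSUMED form, not the ODTOE formula)."""
    return F * E * lam * (1.0 - sigma)


def chain_coherence(n_agents: int, F: float, E: float, lam: float, sigma: float,
                    fidelity: float, noise_per_hop: float) -> float:
    """B seen by the last agent of a linear chain of otherwise identical agents.

    Hypothetical channel model: every handoff multiplies Λ by `fidelity`
    and adds `noise_per_hop` to σ, capped at 1.0. The agents themselves
    never change; only the channel between them differs.
    """
    if n_agents < 1:
        raise ValueError("need at least one agent")
    for _ in range(n_agents - 1):
        lam *= fidelity
        sigma = min(1.0, sigma + noise_per_hop)
    return b(F, E, lam, sigma)


agent = dict(F=0.9, E=0.9, lam=0.9, sigma=0.1)
single = chain_coherence(1, **agent, fidelity=1.0, noise_per_hop=0.0)
raw_text = chain_coherence(5, **agent, fidelity=0.8, noise_per_hop=0.1)
schema = chain_coherence(5, **agent, fidelity=0.99, noise_per_hop=0.01)
print(f"single agent:           B = {single:.3f}")
print(f"5 agents, raw text:     B = {raw_text:.3f}")
print(f"5 agents, schema-based: B = {schema:.3f}")
```

Because this toy gives the agents no way to catch each other's errors, it can only show how much the channel loses: the same five agents stay close to the single-agent B over the near-lossless channel and fall far below it over the lossy one. Modeling the gain side would need assumptions the post does not state.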
Three design patterns that follow
- Coherence-aware termination. Stop a multi-agent loop not when N steps have passed, but when the joint B stops increasing. The collective observer paper gives a measurement procedure.
- Structured handoffs. Agents should communicate through schemas (high Λ, low σ), not free text. Every conversion from structured data to free text acts as an entropy pump that lowers Λ and raises σ for the next agent.
- Diversity sized by E. If agents have very high individual E (highly consistent worldviews), adding more identical ones just amplifies the same blind spots — the joint E plateaus. If agents have lower individual E but different kinds of weakness, the composition can be genuinely additive. Coherence diversity is the relevant kind, not output diversity.
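The first two patterns can be sketched together. Assumptions, all hypothetical: the `Handoff` fields stand in for whatever schema your agents actually exchange, `measure_joint_b` is a caller-supplied estimate of joint B (the collective observer paper's measurement procedure is not reproduced here), and the `tolerance`, `patience` and `max_steps` defaults are illustrative.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Handoff:
    """Hypothetical structured message passed between agents (a schema, not free text)."""
    claim: str
    evidence: Tuple[str, ...] = ()
    confidence: float = 0.5

    def __post_init__(self) -> None:
        # Reject malformed handoffs at the boundary instead of passing noise downstream.
        if not self.claim.strip():
            raise ValueError("claim must be non-empty")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


def run_until_plateau(
    step: Callable[[Handoff], Handoff],
    measure_joint_b: Callable[[Handoff], float],
    start: Handoff,
    tolerance: float = 1e-3,
    patience: int = 2,
    max_steps: int = 50,
) -> Tuple[Handoff, List[float]]:
    """Run the agent loop until the joint B stops increasing.

    Stops after `patience` consecutive steps that fail to beat the best B so far
    by more than `tolerance`; `max_steps` is only a safety cap, not the stop rule.
    Returns the best state seen and the full B history.
    """
    state = best_state = start
    best_b = measure_joint_b(start)
    history = [best_b]
    stalled = 0
    for _ in range(max_steps):
        state = step(state)
        b = measure_joint_b(state)
        history.append(b)
        if b > best_b + tolerance:
            best_state, best_b, stalled = state, b, 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return best_state, history


# Toy usage: each step adds one piece of evidence, and a made-up joint-B proxy
# saturates as evidence accumulates, so the loop stops on its own.
def add_evidence(h: Handoff) -> Handoff:
    return replace(h, evidence=h.evidence + (f"source-{len(h.evidence) + 1}",))


def toy_joint_b(h: Handoff) -> float:
    return 1.0 - 0.5 ** len(h.evidence)


best, history = run_until_plateau(add_evidence, toy_joint_b, Handoff(claim="X causes Y"))
print(len(best.evidence), len(history))
```

With these toy inputs the loop halts after 13 steps, well short of the 50-step cap, and keeps the state from step 11, the last one that improved B by more than the tolerance. Comparing against the best B so far, rather than the previous step, keeps a slowly oscillating loop from running forever.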
Shared hallucination as a coherence pathology
When a multi-agent system goes into shared hallucination — all agents confidently agreeing on something false — the diagnosis in ODTOE terms is: F has collapsed (no grounded check on reality), E has saturated (all agents in lockstep), σ is low (no noise to perturb the consensus), and Λ is high but corrupted (the in-context data has accumulated the hallucination). Notice that three of the four components look healthy. This is why these failures are so hard to detect by ordinary metrics — and why a coherence-aware monitor that specifically catches F-collapse with E-saturation is the right alarm.
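Such an alarm can be sketched in a few lines, under stated assumptions: F is approximated per round as the fraction of the agents' claims that survive an external grounded check (a tool call, a retrieval hit, a test run), and E-saturation as the fraction of agents giving the same answer. The function names, inputs and the thresholds `f_floor` and `agreement_ceiling` are hypothetical, not values from the post.

```python
from collections import Counter
from typing import Sequence


def agreement(answers: Sequence[str]) -> float:
    """Fraction of agents that gave the most common answer (1.0 = lockstep)."""
    if not answers:
        raise ValueError("need at least one answer")
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


def grounded_fraction(checks: Sequence[bool]) -> float:
    """Fraction of claims that passed an external grounded check (proxy for F)."""
    if not checks:
        return 0.0  # no grounded checks at all is treated as F-collapse
    return sum(checks) / len(checks)


def shared_hallucination_alarm(
    answers: Sequence[str],
    grounded_checks: Sequence[bool],
    f_floor: float = 0.5,
    agreement_ceiling: float = 0.9,
) -> bool:
    """Fire only on the combination: E saturated AND F collapsed.

    High agreement alone is fine (the agents may simply be right), and weak
    grounding alone is ordinary uncertainty; the pathology is both at once.
    """
    e_saturated = agreement(answers) >= agreement_ceiling
    f_collapsed = grounded_fraction(grounded_checks) < f_floor
    return e_saturated and f_collapsed


# Five agents in lockstep, but only 1 of 4 claims survives grounding.
print(shared_hallucination_alarm(["claim-A"] * 5, [True, False, False, False]))  # True
# Same lockstep, well grounded.
print(shared_hallucination_alarm(["claim-A"] * 5, [True, True, True, False]))    # False
```

The sketch covers only the F/E pair the post names as the alarm; the low σ and corrupted Λ that accompany it would need their own, separately defined measurements.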
The AI 3-6-9 → AGI paper develops the broader picture: AGI in ODTOE terms is not "smarter LLM" but "an observer whose B(O, C) remains high across novel C." That is a different optimization target, and the difference matters.
What this gives the practitioner
If you build multi-agent systems in 2026, the ODTOE summary is:
- Measure all four components of B per agent and for the joint system.
- Optimize the protocol (Λ, σ), not just the agents (F, E).
- Watch for E-saturation with F-collapse — that is your shared-hallucination siren.
- Pick diversity along E, not just along output.
Cite this post
Pankratov, A. (2026). How AI Coherence Mirrors Observer Coherence: Multi-Agent Systems in ODTOE. ODTOE Blog. https://odtoe.org/blog/ai-coherence-mirrors-observer-coherence-multi-agent-odtoe