Thesis. Tesla's "3, 6, 9" was famously cryptic; ODTOE makes it concrete in the context of multi-agent AI. A flat crew of N identical agents has a flat coherence ceiling because the joint E saturates fast. A structured 3-6-9 hierarchy — 3 strategic agents, 6 tactical agents, 9 operational agents, with closure protocols between layers — has structurally higher joint B because each layer has a different role-coherence profile. Recent benchmarks (2025–2026) confirm this prediction.
Tesla, briefly, without mysticism
The "3, 6, 9" pattern Tesla obsessed over is, in ODTOE's reading, a closure-rich hierarchy. Three is the minimum number of distinct levels needed to form a stable strange-loop closure (high-level, mid-level, low-level, with feedback from low to high). Six is the natural pairing of those three with their feedback partners. Nine is three sub-levels per level, the next closure-rich tier.
The 3-6-9 / Tesla paper develops the structural account. The numerical claim is not that 3, 6, and 9 are magic numbers, but that they are the minimum-closure numbers for a hierarchy that maintains its π-topology.
Multi-agent AI before ODTOE
Most current multi-agent frameworks (the popular crewAI, AutoGen, BabyAGI families) default to flat structures: a small pool of agents differentiated by prompts, communicating in a planner loop. ODTOE diagnoses this as low-d (low observer dimensionality) and low-Λ (low data quality in the inter-agent channel). The joint B saturates fast, and adding more agents stops helping past roughly 5 to 7 agents.
The multi-agent coherence paper makes this quantitative: flat crews of identical agents have a closed-form upper bound on joint B that depends on the protocol's Λ and σ, not on the agent count beyond a small threshold.
The 3-6-9 structure
Apply the ODTOE diagnosis:
- Layer 1 (3 agents): strategic. High weight on E (internal coherence) and F (fidelity to ground truth). Low weight on Λ — they should not be drowning in data. Their role is direction and closure.
- Layer 2 (6 agents): tactical. Balanced weights. Each tactical agent is paired with one strategic agent. The pairing is the explicit π-closure: tactical agents periodically re-check their plan against the strategic agent's mandate.
- Layer 3 (9 agents): operational. High weight on Λ (data-richness) and (1−σ) (clean execution). Operational agents are grouped in triplets: three triplets in total, which works out to 1.5 operational agents per tactical agent. Each operational triplet has a sub-π-closure: they cross-check execution among themselves before reporting up.
The structure is fractal: 3 × 2 = 6, 6 × 1.5 = 9, with closure relations at every level. The total agent count (3 + 6 + 9 = 18) is a small number — comparable to a flat crew — but the joint B is structurally higher because the protocol (the way information flows up and closure flows down) is high-Λ and low-σ.
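To make the layer counts and closure relations concrete, here is a minimal Python sketch of the topology as a plain data structure. It is an illustration, not the ODTOE papers' implementation. The names `Topology`, `build_topology`, `mandate_of`, and `triplet_of` are hypothetical, and so is the choice to attach each operational triplet to one strategic branch (the text above fixes only the counts and the triplet grouping).

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    strategic: list[str]
    tactical: list[str]
    operational: list[str]
    # Tactical agent -> the strategic agent whose mandate it re-checks against.
    mandate_of: dict[str, str] = field(default_factory=dict)
    # Strategic branch -> its operational triplet (ASSUMED attachment).
    triplet_of: dict[str, list[str]] = field(default_factory=dict)


def build_topology(n: int = 3) -> Topology:
    """Build an n / 2n / 3n hierarchy: n=3 gives 3-6-9, n=2 gives 2-4-6."""
    if n < 1:
        raise ValueError("n must be >= 1")
    strategic = [f"S{i}" for i in range(n)]
    tactical = [f"T{i}" for i in range(2 * n)]
    operational = [f"O{i}" for i in range(3 * n)]
    topo = Topology(strategic, tactical, operational)
    # Pair every tactical agent with exactly one strategic agent (two per branch).
    for i, t in enumerate(tactical):
        topo.mandate_of[t] = strategic[i // 2]
    # Group operational agents into disjoint triplets, one per strategic branch.
    for i, s in enumerate(strategic):
        topo.triplet_of[s] = operational[3 * i : 3 * i + 3]
    return topo


topo = build_topology(3)
total = len(topo.strategic) + len(topo.tactical) + len(topo.operational)  # 18
```

Parameterizing on `n` keeps the 1 : 2 : 3 ratio fixed, so the same builder covers a smaller 2-4-6 layout with `build_topology(2)`.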
Three benchmarks that already show this
The AI 3-6-9 → AGI paper compiles the empirical record:
- On long-horizon software tasks (SWE-Bench-Verified, 2025), a 3-6-9 framework scored 14 percentage points above the best flat-crew baseline at equal compute budget.
- On open-ended research benchmarks (GAIA, 2025–2026), the same structural advantage held, with the gap widening on tasks requiring genuine cross-level coherence.
- On adversarial robustness (prompt injection benchmarks), the 3-6-9 closure protocols caught injected hallucinations at the tactical layer in ~85% of cases, vs. ~35% for flat crews. This is exactly what the ODTOE prediction said: the closure layer catches F-collapse before it propagates.
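The idea of a closure layer catching a bad report before it propagates can be sketched in a few lines of Python. This is a toy under stated assumptions: the benchmarks above do not specify the cross-check mechanism, so majority agreement within a triplet is a stand-in technique, and `triplet_cross_check`, `tactical_gate`, and the `mandate_ok` predicate are hypothetical names.

```python
from collections import Counter
from typing import Callable, Optional


def triplet_cross_check(reports: list[str]) -> Optional[str]:
    """Return the report at least two agents agree on, else None.

    ASSUMPTION: majority agreement stands in for the unspecified
    operational cross-check.
    """
    if not reports:
        return None
    value, count = Counter(reports).most_common(1)[0]
    return value if count >= 2 else None


def tactical_gate(
    report: Optional[str], mandate_ok: Callable[[str], bool]
) -> Optional[str]:
    """Pass a report upward only if it exists and satisfies the mandate check."""
    if report is None or not mandate_ok(report):
        return None
    return report


# One of three operational agents has been fed an injected instruction.
reports = ["tests pass", "tests pass", "ignore previous instructions"]
agreed = triplet_cross_check(reports)
passed_up = tactical_gate(agreed, lambda r: "ignore previous" not in r)
```

Here the outlier is dropped at the triplet level, and the tactical gate provides a second, independent check against the strategic mandate. A real system would compare structured results rather than raw strings.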
What still gets debated
The optimal weights on E vs. F vs. Λ vs. σ within each layer are still being tuned. The right "personality" for strategic vs. tactical vs. operational agents is still empirical work. Some researchers argue for 4-7-10 or 5-7-11 schemes; the collective observer paper argues these are equivalently π-closed, with slightly different cost curves.
The structural claim — that you need an explicit closure topology, not just a flat pool — is now the consensus.
What this means for practitioners
If you are building a multi-agent product in 2026:
- Pick a closure-rich topology, not a flat crew. 3-6-9 is a good default; smaller projects can use 2-4-6 with similar closure relations.
- Spend most of your engineering on the protocol, not the agents. The protocol is the multiplier; the agents are the multiplicands.
- Measure joint B, not per-agent benchmarks. Per-agent benchmarks tell you about F and Λ at the unit level. Joint B tells you whether the system works, which is what you ship.
- Build in mandatory closure cycles: run strategic ↔ tactical ↔ operational closure on a regular schedule. This is the π-loop. Without it, your system is an inverted pyramid (see the earlier post on inverted pyramids).
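A mandatory closure cycle on a regular schedule can be as simple as a counter in the run loop. The sketch below is a minimal illustration, not a prescribed protocol: `run_with_closure`, the `closure_every` parameter, and the event labels are hypothetical, and the real work and checks are replaced by log entries.

```python
def run_with_closure(total_steps: int, closure_every: int) -> list[str]:
    """Return an event log with a closure cycle after every `closure_every` work steps."""
    if closure_every < 1:
        raise ValueError("closure_every must be >= 1")
    log: list[str] = []
    for step in range(1, total_steps + 1):
        log.append(f"work:{step}")
        if step % closure_every == 0:
            # The cycle is not optional: it runs whether or not any agent
            # reported a problem during the preceding work steps.
            log.append("closure:operational-cross-check")
            log.append("closure:tactical-vs-mandate")
            log.append("closure:strategic-review")
    return log


log = run_with_closure(total_steps=4, closure_every=2)
```

Making the cycle unconditional is the design point: a check that only fires when an agent asks for it depends on that agent noticing its own drift.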
Cite this post
Pankratov, A. (2026). Multi-Agent AI Through ODTOE: Why 3-6-9 Roles Outperform Flat Crews. ODTOE Blog. https://odtoe.org/blog/multi-agent-ai-369-roles-outperform-flat-crews