ADR-0005: BehaviorTree.CPP v4 XML + typed LLM tool palette for S2 planning, not LangGraph
- Status: Accepted
- Date: 2026-05-24 (retroactive — documents a Week-1 planning-layer decision; the reasoner code itself lands later)
- Amended: 2026-05-24 (see Amendments below)
Context
CLAUDE.md §6.2 mandates a dual-system pattern for every robot agent: S1 (fast policy, 30–200 Hz, action-chunked VLA) and S2 (slow reasoner, 5–10 Hz, planner). The reasoner has to produce something that:
- The S1 skill executor can consume without a Python-only runtime (BT leaves call into ROS 2 actions; the BT executor itself is C++ for determinism and hot-reload).
- Is inspectable — a tree in the trace, not a closure graph in memory.
- Is hot-reloadable — when the failure detector fires, the replanning ladder needs to swap a subtree without restarting the process.
- Has a bounded vocabulary — every leaf is a registered Skill, and the LLM cannot hallucinate a leaf that isn't installed and capable.
The candidate planning substrates in 2026:
| Substrate | XML/IR format | C++ executor | Hot-reload | LLM-native | Replanning ladder |
|---|---|---|---|---|---|
| BehaviorTree.CPP v4 | XML (BT v4 grammar) | Yes (BT::Tree) |
Yes (registerNodeXML) |
No — needs glue | Subtree-level halt + reload |
| LangGraph | Python StateGraph |
No | No (Python-resident) | Yes (built for LLM agents) | Needs custom routing |
| LangChain "Agents" | Python | No | No | Yes | None — recursion control via the LLM |
| AutoGen / CrewAI | Python | No | No | Yes | None |
| pddl-stream / TAMP | PDDL+ | C++ available | No | No — symbolic | Native (replanning is the loop) |
The literature (Helix, π0.6, Gemini Robotics, GR00T N2) converges on
BTs for the executor, with the LLM as the author of the BT
XML. Two patterns are common: (a) prompt the LLM to emit BT XML
directly; (b) prompt the LLM to emit a typed Plan object that the
reasoner converts to BT XML deterministically.
The C++ BT executor is mandatory regardless of the choice in this ADR — ros2_control + lifecycle-node ergonomics + real-time guarantees rule out a Python executor for S2. The question is what the LLM emits.
Decision
The S2 planning layer is typed-LLM-output → deterministic Plan construction → BT.CPP v4 XML emission:
- The LLM emits a Pydantic
Planobject, not raw BT XML. LLMClientis a typed Protocol (OpenAI, Anthropic providers behind one interface).- The structured-output schema is
Planfromopenral_reasoner.plan(a Pydantic v2 model — per ADR-0003). - The provider's structured-output mode enforces the schema at the
wire level (OpenAI
response_format=Plan, Anthropic tools). - The reasoner converts
Plan→ BT XML deterministically. No LLM hallucination of the executable artifact; the XML is generated by code from a validatedPlan. - The leaf vocabulary is the local skill registry.
~/.config/openral/skills.tomllists installed rSkills + theirRSkillManifest.capabilities.- The LLM tool palette is generated from this list at planning
time. A leaf the LLM tries to emit that isn't installed-and-
capable raises
ROSPlanningError.ROSReasonerInvalidPlan(seeopenral_core.exceptions). - The C++ executor is BT.CPP v4 (
packages/openral_skill/bt_runner.cpp, planned). Leaves areRosActionNodes callingExecuteRskill.actionon the corresponding skill lifecycle node. Hot-reload onFailureTrigger. - Replanning ladder is BT-native. Retry decorators handle local retry; subtree halt + LLM re-invocation handles param-tweak and substitute-skill. Goal-replan re-prompts the LLM with the failure context. Human-handoff is a leaf that publishes a UI event and blocks.
- No LangGraph, no LangChain Agent, no AutoGen. These tools live
one layer above where we need; their value is orchestrating LLM
calls, not orchestrating robot skills under real-time and safety
constraints. They can be used inside an
LLMClientimplementation (e.g., as a routing layer in front of multiple providers) but not as the S2 substrate itself.
Consequences
- Pros
- The executable artifact (BT XML) is deterministic given the
Plan— replays are bit-identical from the trace, satisfying CLAUDE.md §8 (reproducibility over speed). - The C++ executor gives us real-time guarantees and matches the ros2_control / BT.CPP idiom the rest of the robotics ecosystem already uses.
- The typed
Planis testable in isolation without any LLM —NullReasonerreturns a hand-builtPlanso plumbing tests don't need API credentials. - The leaf-vocabulary closure (LLM can only emit installed skills) eliminates a class of hallucination failures up-front.
-
Switching LLM providers is a one-line
LLMClientswap, not a framework migration. -
Cons
- We own the
Planschema + the deterministic XML emitter. Both are small (~200 LoC each) but they're not bought. - We do not get LangGraph's free
StateGraphvisualisation; the BT XML is the visualisation, but tooling for "render this tree as a diagram" is on us. - When a multi-step LLM dialogue is genuinely needed (e.g., the
LLM asks the user a clarifying question), we implement it inside
LLMClient— we don't get LangChain's chat-loop helpers for free. - The C++ BT executor is more work than a Python
StateGraphrunner. Mitigated by the existence of BT.CPP v4 as a mature upstream library.
Alternatives considered
- Pure LangGraph, Python S2. Rejected — Python S2 can't hit the determinism and real-time bar for replanning under failure. Also collapses the S1/S2 separation: a Python S2 calling Python S1 has no enforcement of the cadence boundary CLAUDE.md §6.2 declares.
- LLM emits BT XML directly. Rejected — string-mode XML generation
is the kind of hallucination surface the structured-output discipline
exists to avoid. The typed
Planlets the provider's structured-output mode catch errors at the wire. - PDDL + a symbolic planner. Considered. Strengths: provable optimality, no hallucination. Weaknesses: writing PDDL domains by hand for every skill set is more work than the LLM tool-palette approach; LLMs are demonstrably good at picking the next skill from a typed menu but currently bad at writing well-formed PDDL.
- No S2 at all (S1-only, like SmolVLA stand-alone). Adequate for single-skill demos; insufficient for multi-step manipulation, which is the v0.2+ target. Documented as out-of-scope in CLAUDE.md §6.2.
Why this ADR is retroactive
The decision is encoded in CLAUDE.md §6.2, §7.6, in the explicit
non-goals (CANCELLED block in docs/architecture/repo-state-map.html),
and in the roadmap's Week-4 "Reasoner stub (LLM → BT)" entry. The
reasoner code itself is the next layer scaffold to land
(python/reasoner/ skeleton + Protocols). This ADR records the
substrate choice so that scaffold doesn't have to re-litigate it.
References
- CLAUDE.md §6.2 (dual-system pattern), §7.6 (working with the planner),
§10 (exception hierarchy —
ROSPlanningError). docs/roadmap/index.md— Week-4 Reasoner stub line.docs/architecture/repo-state-map.html— Layer 4 (Reasoning), with the BT-not-LangGraph stance encoded in CANCELLED.- BehaviorTree.CPP v4 — https://www.behaviortree.dev/.
- ADR-0003 — Pydantic v2 substrate the
Planschema rides on.