ADR-0031: Explicit simulation / real HAL separation with deterministic command routing
- Status: Proposed
- Date: 2026-06-01
- Related: ADR-0023 (`MujocoArmHAL.from_description`); ADR-0029 (one `robot.yaml`-driven lifecycle node); ADR-0025 (panda_mobile lifecycle node, `SimAttachedHAL`); CLAUDE.md §1.3 (types are the contract), §1.4 (explicit beats implicit), §3 (HAL is layer 0).
Context
A robot's simulation HAL and real-hardware HAL are not declared consistently, and the choice of HAL type leaks out of the manifest into environment config and runtime parameters.
- `RobotDescription.sdk_entry` is overloaded. A single `str | None` field names a sim HAL for some robots (`so100`/`so101` → `SO100DigitalTwin`, `rizon4` → `Rizon4MujocoHAL`) and a real HAL for others (`franka` → `FrankaPandaRealHAL`, `ur5e` → `UR5eRealHAL`, `aloha` → `AlohaHAL`, `sawyer` → `SawyerRealHAL`). It is never imported at runtime today, so it is dead metadata, and for `so100`/`so101` it points at `SO100DigitalTwin`, which is not even a HAL (it is a lerobot `Robot` device that plugs into `SO100FollowerHAL`).
- `deploy sim` decides sim-vs-real at runtime. The so100 lifecycle node branches on an injected `sim_robot_yaml` param; the panda_mobile node branches on `sim_env_yaml`. The CLI registry (`_ROBOT_HAL_REGISTRY`) carries per-robot `supports_sim_env_yaml` / `supports_sim_robot_yaml` flags that inject those params.
- `deploy run` decides sim-vs-real from env config. The in-process `HardwareRunner` (so100-only) selects a digital twin vs real serial from a `hal.transport.digital_twin` boolean in the `RobotEnvironment` YAML.
The net effect: the same command can boot a different HAL class depending on YAML, violating "types are the contract" and "explicit beats implicit".
Decision
- Schema. Replace `RobotDescription.sdk_entry` with a structured submodel `hal: HalEntrypoints { sim: str | None, real: str | None }`. `sdk_kind` (license posture) is orthogonal and kept. Each field is a `"module:Class"` import string or `None`.
- Sim derivation. When `hal.sim is None` and a `sim:` block is present, the sim HAL is `MujocoArmHAL.from_description(description)` (ADR-0023). So every plain arm leaves `hal.sim` null and the derivation provides the twin; only non-generic sim HALs (e.g. `panda_mobile` → `PandaMobileHAL`, which has no `sim:` block) name `hal.sim` explicitly.
- One resolver. `openral_hal.build_hal(description, *, mode: Literal["sim","real"], transport=None) -> HAL` is the sole HAL-construction seam. The ROS lifecycle nodes, the `deploy run` runner factory, and `deploy sim` all route through it.
- Deterministic command → mode.
  - `openral deploy sim` ⇒ `mode="sim"`.
  - `openral deploy run` ⇒ `mode="real"`, and it only works against connected hardware (the real HAL's `connect()` fails otherwise). No env-config flag selects HAL type.
  - `openral sim run` / `openral benchmark run` are unchanged: scene backends own the robot, no HAL is constructed.
- Missing HAL is a typed error. `build_hal(mode)` raises `ROSCapabilityMismatch` when a robot lacks the HAL for the requested mode (e.g. `sawyer` for sim, `rizon4` for real). A robot may legitimately declare neither HAL (scene-only robots: `gr1`, `widowx`, `google_robot`, `pusht_2d`); the error fires only when `build_hal` is called for the missing mode.
- The "sim device behind a real HAL" (`SO100DigitalTwin` in `SO100FollowerHAL`) is not a HAL type; it is a fake transport. Because `deploy run` now requires real hardware, that path is removed from `deploy run`; its no-hardware CI coverage migrates to `deploy sim` (or a clearly-labelled test-only twin harness), never an env-config branch.
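The selection rules in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the `openral_hal` implementation: it uses stdlib dataclasses rather than the project's actual schema classes, `has_sim_block`, `GENERIC_SIM_HAL`, `COMMAND_TO_MODE`, and `select_hal_entry` are hypothetical names, and the function only picks the entry point (it does not import or construct anything, which the real `build_hal` would do).

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

Mode = Literal["sim", "real"]

# Hypothetical marker for the derived generic twin,
# i.e. MujocoArmHAL.from_description(description) from ADR-0023.
GENERIC_SIM_HAL = "derived:MujocoArmHAL.from_description"


class ROSCapabilityMismatch(Exception):
    """Stand-in for the typed error named in this ADR."""


@dataclass(frozen=True)
class HalEntrypoints:
    sim: Optional[str] = None   # "module:Class" import string, or None
    real: Optional[str] = None  # "module:Class" import string, or None


@dataclass(frozen=True)
class RobotDescription:
    robot_id: str
    hal: HalEntrypoints = field(default_factory=HalEntrypoints)
    has_sim_block: bool = False  # assumption: models "a sim: block is present"


# The command alone fixes the mode; no env-config flag takes part.
COMMAND_TO_MODE = {"deploy sim": "sim", "deploy run": "real"}


def select_hal_entry(description: RobotDescription, mode: Mode) -> str:
    """Return the entry point for `mode`, or raise if the robot has none."""
    if mode not in ("sim", "real"):
        raise ValueError(f"unknown mode: {mode!r}")
    entry = description.hal.sim if mode == "sim" else description.hal.real
    # Sim derivation: a plain arm leaves hal.sim null and gets the generic twin.
    if entry is None and mode == "sim" and description.has_sim_block:
        entry = GENERIC_SIM_HAL
    if entry is None:
        raise ROSCapabilityMismatch(
            f"{description.robot_id} declares no {mode} HAL"
        )
    return entry
```

Under these assumptions, a scene-only robot such as `gr1` can be described with both fields left at `None`; nothing fails until `select_hal_entry` is called for a mode it lacks.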
Alternatives considered
- Two flat fields `sim_hal_entry` + `real_hal_entry`: rejected; the submodel groups them and extends to a future `mock`/`replay` mode.
- Keep `transport.digital_twin`, gate by command: rejected; leaves the type-deciding flag in env config, the exact entanglement being removed.
- Make real HALs `from_description`-constructible like sim HALs: rejected; real HALs legitimately take transport-specific kwargs (`robot_ip`, `fci_ip`, serial `port`) and embed their own safety-envelope constants. The resolver bridges the two conventions instead.
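One way the resolver could bridge the two construction conventions is sketched below. It is an assumption-laden illustration, not the project's code: `load_entry` and `construct_hal` are hypothetical helpers, the two `Fake*` classes stand in for real HAL classes, and passing `transport` as constructor kwargs is a guess at how transport-specific arguments might be forwarded. Only `importlib.import_module` and `getattr` are real library calls.

```python
import importlib
from typing import Any, Mapping, Optional


def load_entry(entry: str) -> type:
    """Resolve a "module:Class" import string to the class object."""
    module_name, sep, class_name = entry.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:Class', got {entry!r}")
    return getattr(importlib.import_module(module_name), class_name)


def construct_hal(cls: type, description: Any, *, mode: str,
                  transport: Optional[Mapping[str, Any]] = None) -> Any:
    """Sim HALs are built from the description; real HALs take transport kwargs."""
    if mode == "sim":
        return cls.from_description(description)
    return cls(**dict(transport or {}))


class FakeSimHAL:
    """Hypothetical sim HAL following the from_description convention."""

    def __init__(self, robot_id: str) -> None:
        self.robot_id = robot_id

    @classmethod
    def from_description(cls, description: Mapping[str, Any]) -> "FakeSimHAL":
        return cls(description["robot_id"])


class FakeRealHAL:
    """Hypothetical real HAL that takes a transport-specific kwarg."""

    def __init__(self, *, robot_ip: str) -> None:
        self.robot_ip = robot_ip


# Import-string resolution, demonstrated on a stdlib class.
ordered_dict_cls = load_entry("collections:OrderedDict")

sim_hal = construct_hal(FakeSimHAL, {"robot_id": "franka"}, mode="sim")
real_hal = construct_hal(FakeRealHAL, {"robot_id": "franka"}, mode="real",
                         transport={"robot_ip": "192.0.2.10"})
```

Keeping the branch inside one function means callers never need to know which convention a given HAL class follows.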
Consequences
- `deploy run` against a digital twin is no longer possible (use `deploy sim`).
- Sim-only (`rizon4`/`openarm`/`g1`/`h1`/`panda_mobile`) and real-only (`sawyer`) robots are enforced by `ROSCapabilityMismatch`; several real HALs remain skeletons (Franka/UR M3) and will raise "real HAL not implemented" until their driver lands, which is the honest state.
- Adding a new robot needs only a manifest declaring both HALs; no per-command routing code is required.
- Layer touch: layer 0 (HAL) + the `openral_core` schema. `schema_version` stays `"0.1"` (no migrators, CLAUDE.md §6); every `robots/*/robot.yaml` is updated in the same change.
Roadmap (out of scope here — separate ADRs/PRs)
- `deploy run` → ROS graph: converge `deploy run` onto the `deploy sim` launch graph (kernel / sensors / dashboard / nav) with `hal_mode:=real`; retire the in-process `HardwareRunner` deploy path. Subsumes ADR-0029's unification.
- `sim run` HAL-driven native harness: run native MuJoCo scenes over the sim HAL (no ROS) for fast robot + rSkill iteration; generalize `SimAttachedHAL` so a scene takes a `robot_id` and attaches `build_hal(mode="sim")`. Benchmark/external scenes stay scene-backend.
- Config reorg: `examples/sim` → `scenes/`, `examples/robot` → `deployments/`.