ADR-0034: Deploy-sim scene-attach and sim-sensor bridge for manifest-driven arms
- Status: Accepted
- Date: 2026-06-03
- Related: ADR-0029 (unified HAL lifecycle node);
ADR-0031 (
build_halsim/real seam); ADR-0032 (deploy-run ROS graph); ADR-0033 (robot-parameterised native scenes). Supersedes the ADR-0033 §Decision-4 parenthetical (corrects an overstatement).
Context
openral deploy sim --config <scene>.yaml for manifest-driven arms (franka_panda, ur5e, ur10e,
aloha, g1, h1, rizon4) builds a headless bare-arm digital twin and publishes only
/joint_states. No MuJoCo window opens and no cameras render, so:
- Camera-conditioned rSkills (MolmoAct2, π0.5, …) abort with
ROSConfigError: got no camera frames; expected ('agentview','wrist'), saw []. - The two
deploy simpaths — puresim run(which drives aSimRolloutdirectly) anddeploy sim(which wraps aSimRolloutin a ROS lifecycle node) — produce different sensor output for the same scene YAML.
ADR-0033 §Decision-4 contained an overstated parenthetical —
"deploy sim already wraps scenes behind SimAttachedHAL for the ROS path — that stays"
— which was only true for the bespoke panda_mobile package. Every other manifest-driven
arm used the generic _ManifestHALLifecycleNode (ADR-0029), which called
build_hal(mode="sim") and received the bare twin. The ADR-0033 verification (tabletop_push
scenes for SO-101/Franka/UR5e) ran through sim run, not deploy sim.
Decision
Generalise scene-attach and sensor publishing to all manifest-driven arms via four
coordinated changes, preserving the ADR-0031 build_hal seam:
1. build_hal gains a sim_env_yaml parameter — openral_hal/resolver.py
def build_hal(
description: RobotDescription,
*,
mode: Literal["sim", "real"],
transport: dict[str, object] | None = None,
sim_env_yaml: str | None = None,
) -> HAL
mode="sim" + sim_env_yaml set → calls build_sim_env_from_yaml(sim_env_yaml,
robot_id_fallback=description.name), wraps the result in
SimAttachedHAL(env, description, env_reset_seed=seed), and returns it. The bare-twin /
hal.sim class is bypassed entirely — the scene owns physics and pixels. mode="real" +
sim_env_yaml → ROSConfigError (a real-hardware HAL never attaches a sim scene). The
HAL type is still decided in one place (ADR-0031 seam preserved).
2. Shared SimSensorBridge — openral_hal/sim_sensor_bridge.py
A stateful helper (rclpy imported lazily) that owns all manifest-gated sim sensor publishers and the viewer:
class SimSensorBridge:
def __init__(
self,
node: Any,
hal: Any,
description: RobotDescription,
*,
viewer_enabled: bool = True,
camera_rate_hz: float = 10.0,
viewer_sync_rate_hz: float = 30.0,
) -> None: ...
def setup(self) -> None # called from on_activate_post_subs
def teardown(self) -> None # called from on_deactivate / on_cleanup
setup() wires two streams, each gated on manifest + HAL capability:
| Stream | Topic | Gate | Producer |
|---|---|---|---|
| RGB | /openral/cameras/<n>/image |
hasattr(hal,"read_images") + RGB SensorSpec in manifest |
_publish_images |
| Viewer | — | viewer_enabled + mujoco_handles() not None |
mujoco.viewer.launch_passive; headless → warn + continue |
Phase 2 (ADR-0034 §Safety posture) adds /scan + depth PointCloud2 streams.
3. _ManifestHALLifecycleNode adopts the bridge — openral_hal/lifecycle.py
The node declares sim_env_yaml (default ""), viewer_enabled (default True), and
camera_publish_rate_hz (default 10.0) ROS parameters. _create_hal passes sim_env_yaml
to build_hal. on_activate_post_subs calls SimSensorBridge.setup(); the deactivate and
cleanup hooks call teardown(). Every manifest-driven arm gains scene + cameras + viewer
under deploy sim with zero per-package wiring (honoring ADR-0029).
4. CLI injects the scene path — openral_cli/deploy_sim.py
In resolve_launch_invocation, the manifest_driven branch, when hal_mode == "sim" and a
--config path is present: hal_params.setdefault("sim_env_yaml", str(config.resolve())).
The scene path is already resolved to derive robot_id; this step forwards it to the node.
5. Consumer-side camera-slot realignment — openral_rskill_ros/rskill_runner_node.py
The bridge publishes each frame on /openral/cameras/<sensor.name>/image, so the WorldState
node keys WorldState.image_frames by the manifest sensor name (agentview, wrist).
VLA adapters, however, look up obs["images"] by the VLA slot (camera1, camera2, …)
— the LIBERO convention openral sim run and the rldx adapter already use, and the key the
checkpoint's cam_alias maps (camera1 → image). A manifest whose RGB sensors are
descriptively named (franka) therefore handed pi0.5 obs["images"]["agentview"] while it
looked up camera1 → no frames → got no camera frames abort.
rskill_runner_node realigns the two namespaces from a single source of truth — the manifest's
vla_feature_key suffix, mirroring the bridge's _obs_key_for_sensor (§2):
_sensor_name_to_vla_slot(description)maps each RGBSensorSpec.name→ its slot (observation.images.camera1→camera1); sensors with novla_feature_keyfall back to their own name (robocasa real-name keys)._build_runtime_skill_from_manifestderives the adapter'sscene_camerasfrom_vla_camera_slots(description)(the slots, in manifest order) instead of the sensor-namecamera_namesruntime_nodeforwards — soresolve_camera_keys → _camera_keyslands on the slots. Falls back to the passedscene_cameraswhen the manifest declares no RGB sensors._PolicyAdapterSkill._step_implrekeysobs["images"]through_decode_image_frames(sensor name → slot) so the decoded frames match the adapter's_camera_keys.
The realignment is keyed off description.sensors vla_feature_key everywhere, so the obs
keys and the adapter's camera keys agree by construction. It only affects the deploy sim
(ROS) path; openral sim run already receives camera1/camera2 directly from the LIBERO env.
A Layer-3 skill package must not import the Layer-0 HAL (CLAUDE.md §3), so the slot-resolution
rule is duplicated (3 lines) rather than shared with sim_sensor_bridge.
Joint-name unification — openral_hal/sim_attached.py §3.6
SimAttachedHAL.read_state resolves joint names from the scene's MJCF by name
(mj_name2id), unlike the bare MujocoArmHAL which is index-based and name-agnostic.
MJCF joint naming is heterogeneous across backends:
- Native MJCF (franka
panda.xml): joints arejoint1..7+finger_joint1. - Robosuite scenes: the same arm's joints are prefixed
robot0_joint1..7. - Canonical manifest
name:panda_joint1..7(the safety-envelope / real-HAL contract).
Resolution (implemented in normalized_joint_index + manifest sim_joint_name):
sim_joint_namein the manifest carries the robot's native MJCF joint name wherever it differs from the canonicalname(franka:joint1..7+finger_joint1; so100/so101:Rotation…Jaw). Robots whose canonical names already match the MJCF (ur5e, ur10e, rizon4, g1, h1) need none. The canonicalnameis never changed.normalized_joint_indexbuilds a lookup that maps both the exact MJCF joint name and a robosuite-prefix-stripped form (^[a-z]+[0-9]+_strip:robot0_joint1→joint1) to the MuJoCo joint index. Exact names always win; stripped names are added only when they neither shadow an exact name nor produce an ambiguous collision (bimanualrobot0_/robot1_→ keep un-normalized, require explicitsim_joint_name).read_statetriessim_joint_name or namefirst; the normalized fallback catches robosuite prefixes. One manifest entry serves both native MjSpec and robosuite scenes.robot0_never appears in a manifest.
Safety posture (Phase 2)
Phase 2 of the bridge adds /scan lidar and depth PointCloud2 streams. The
PointCloud2 feeds octomap_server → the C++ safety kernel's capsule-vs-voxel check
(ADR-0030). panda_mobile now delegates to the shared bridge, lifting
synthesize_depth_pointcloud / robot_self_body_ids / camera_optical_tf_to_base
unchanged (the refactor is at-least-as-conservative — the ray-cast, self-body exclusion,
and point filtering are byte-identical to the pre-refactor in-node code).
Evidence + remaining gates:
- Regression test
packages/openral_hal_panda_mobile/test/test_sensor_bridge_regression.pybrings up the refactoredpanda_mobileagainstrobocasa/NavigateKitchenand asserts/scan(finite ranges),/openral/cameras/<depth>/points(non-empty — the octomap input must not collapse), and/odomall still publish. It fails loudly if the cloud is empty. - It is env-gated: robocasa needs
robosuite>=1.5.2; the workspace venv currently has1.5.1, so the test skips with that reason untiljust sync --group robocasa. The live run on a robocasa-provisioned host/CI is a required pre-merge gate. - Safety-WG reviewer approval + a hazard-log entry referencing this ADR are required before the Phase-2 commits merge (CLAUDE.md §3). NOT yet obtained — Phase 2 must not merge without them.
Alternatives considered
- Per-robot lifecycle node subclasses — each arm gets its own node that wires scene-attach. Rejected: that is exactly the ADR-0029 anti-pattern (per-package boilerplate).
- Scene-attach only on
sim run, notdeploy sim— rejected: both paths must produce identical sensor output for the same scene YAML (the spec's stated goal). build_halreturns the bare twin; lifecycle node post-processes it — rejected: the HAL type must be decided in one place (ADR-0031). Lifting theSimAttachedHALconstruction into the node would scatter the seam.
Consequences
- All scenes run identically on both
sim runanddeploy simpaths. - Manifest-driven arms (franka, ur5e, ur10e, aloha, g1, h1, rizon4) now receive scene +
camera publishing + MuJoCo viewer under
deploy simwith zero per-package changes. panda_mobilewill dedup onto the shared bridge in Phase 2; until then it retains its own per-robot sensor wiring.- ADR-0033 §Decision-4 parenthetical is corrected: scene-attach under
deploy simwas only true forpanda_mobile; it is now true for all manifest-driven arms. build_hal's signature gains one keyword-only parameter (sim_env_yaml). All callers that do not pass it are unaffected (Nonedefault, same behavior as before).- The
schema_versionstays"0.1"(no migrators; CLAUDE.md §6).
Amendment 2026-06-04 — sim-only free-running idle stepper
Problem
Under deploy sim the MuJoCo env lives only in the HAL node via SimAttachedHAL,
and env.step() is called only from SimAttachedHAL.send_action — reached only on
/openral/safe_action receipt, which only flows while a skill is executing. When
the deploy-sim graph is idle (no skill running), the env never steps: physics is frozen,
the rendered camera frames cached in _last_obs go stale, and the ADR-0035
perception / object-detector bus sees a dead scene (the detector runs over a single
frozen frame forever, motion/occupancy never updates). The cameras only "came alive"
once a skill happened to start stepping the env.
Decision
Add a sim-only free-running idle stepper that advances the env one tick with a zero/HOLD action whenever the scene is idle, so cameras keep rendering and the perception bus sees a live scene with no skill running.
SimAttachedHAL.idle_step() -> boolsteps the wrappedSimRolloutwithnp.zeros(env_action_dim, dtype=np.float32)— exactly the proven zero-action idiomrobocasa.refresh_obsalready uses — re-caching_last_obsfrom theStepResult. It mirrorssend_action's ADR-0036 deferred-reset branch (a terminated episodic backend is reset before stepping; robocasa'signore_donenever latches, so that branch is dead there) and re-latches_episode_done. It does not touch the commanded-slot merge state (_last_env_action) or the latched base twist (_last_body_twist) — an idle HOLD is orthogonal to whatever a skill last commanded.SimSensorBridgecreates an idle-step timer at the existingcamera_rate_hz(default 10 Hz — step-then-publish stays matched to the camera publisher, no new rate param) with a quiet windowidle_hold_ms(default 200 ms). Its callback callsidle_step()only whenshould_idle_step(monotonic_ns(), hal.last_action_ns, idle_hold_ns)isTrue— i.e. no real action arrived within the hold window.send_actionstampslast_action_nsat its top (the single choke point both_on_safe_actionand_on_cmd_velreach), so an active skill always wins the env: the idle tick yields. The single-threaded rclpy executor guarantees the idle timer and_on_safe_actionnever run concurrently, so the timestamp check alone is a sufficient hand-off — no lock.
Real-hardware exclusion (3 layers; safety-critical)
A zero action is a HOLD for the sim's velocity / OSC-delta / robosuite composite controllers, but on a real absolute-position arm (Franka FCI, lerobot follower) a zero joint-target vector commands "drive every joint to 0 rad" — a violent motion. "Zero is harmless" is therefore false on real hardware and is explicitly NOT the guarantee. The guarantee is structural, in three layers:
Honest caveat — zero is not even a literal HOLD on every sim backend. It is a true
hold only for velocity / OSC-delta / robosuite composite controllers. A
position-controlled native backend (e.g. so101_box, whose step consumes
joint-position targets) reads a zero vector as "go toward 0 rad", not "stay put". That is
acceptable for the idle stepper — the goal is to keep the scene physically live so cameras
render, not to freeze the arm — but the idle pose on such backends is the zero-rad pose,
not the last commanded one. (Separately: those native backends now each expose
action_dim so _probe_env_action_dim resolves their true step width — see
"Probe-gap fix" below.)
- Primary — method-only-on-
SimAttachedHAL.idle_stepis defined only onSimAttachedHAL. Real HALs (FrankaPandaRealHAL, ros2_control bridges, lerobot followers) do not define it.SimSensorBridgegates the idle timer oncallable(getattr(hal, "idle_step", None)), so against a real HAL the timer is never created. This is the real guarantee. - Secondary —
hal_mode.SimAttachedHALis only ever constructed underbuild_hal(..., mode="sim", sim_env_yaml=...).build_halraisesROSConfigErrorwhensim_env_yamlis supplied withmode="real", so a sim scene can never attach to a real-hardware HAL. - Tertiary — live MuJoCo handles.
idle_stepreturnsFalse(and the bridge also gates) whenmujoco_handles()isNone— a non-MJCF backend is never stepped this way.
idle_step also honors the estop contract: it returns False while _estop_latched
is set (an estopped HAL freezes — that is the correct, safe behaviour).
Contention / hand-off rule
The idle stepper is a fallback ticker, never a co-driver. It steps only during the
quiet window between real actions (now - last_action_ns >= idle_hold_ms). The moment a
skill streams actions, send_action updates last_action_ns every tick and the idle
stepper yields for the whole burst. There is exactly one writer to env.step at a time.
Probe-gap fix (resolved)
Originally SimAttachedHAL._probe_env_action_dim fell back to 11 (the robosuite BASIC
composite width) for backends that didn't expose action_dim. A native backend whose
step required a different width (so101_box → 6, tabletop_push → actuator count,
openarm_tabletop_pnp → state_dim) then raised a width mismatch on the first env.step.
That gap hit send_action too, but idle_step made it fire autonomously on the bridge
timer (with last_action_ns == 0 an idle scene begins stepping immediately, before any
skill runs), gating only on mujoco_handles() (which those backends do expose).
Fix — single source of truth. Every native MuJoCo rollout now exposes an action_dim
property reporting its true step width (mirroring the robosuite/robocasa/LIBERO backends
that carry it natively), so _probe_env_action_dim resolves the authoritative width for
all backends. As a safety net, the probe no longer guesses: when it genuinely cannot
introspect a width and no env_action_dim override was supplied, it raises
ROSConfigError naming the backend at connect time — a loud boot-time failure beats a
wrong-width mid-run E-stop. SimSensorBridge._idle_step_tick keeps its catch-once-and-
disable guard around idle_step() as defence in depth (e.g. an override that disagrees
with the backend), but the probe gap itself can no longer turn into a per-tick crash-loop.
Tests
python/hal/tests/test_sim_attached_idle_step.py — (a) idle → idle_step() advances the
rendered frame (the frozen-scene regression), proven against the native-MuJoCo so101_box
backend (and the LIBERO twin where installed); (b) a terminated episode → idle_step does
reset-then-zero-step with the latch cleared (LIBERO); (c) estop latched → idle_step()
returns False and the frame is unchanged; (d) safety: a real HAL has no idle_step,
and build_hal(..., mode="real", sim_env_yaml=...) raises ROSConfigError; (e) the pure
should_idle_step predicate yields within the hold and engages after.
Consequences
deploy simcameras + the ADR-0035 perception bus stay live when idle; no behaviour change while a skill is active (the idle tick yields).SimSensorBridge.__init__gains one keyword-only parameter (idle_hold_ms, default 200 ms). No new ROS param is introduced — the default is used and the step rate reusescamera_rate_hz.- No real-hardware path is touched; the
schema_versionstays"0.1".
Amendment 2026-06-10 — backend-agnostic joint-state + idle-step (non-MuJoCo sims)
Problem
SimAttachedHAL was MuJoCo-coupled in two places that left a non-MuJoCo
SimRollout (the Isaac Sim sidecar of ADR-0045; also ManiSkill3 / SimplerEnv,
SAPIEN-backed) half-functional under openral deploy sim:
read_state()reads joint angles from the env's MJCFqposviamujoco_handles(). With no handle it returned all-zeros —/joint_states(and the world-state aggregator, dashboard, and the geometric collision checker that reads it) saw a frozen-at-zero arm.idle_step()andSimSensorBridge._setup_idle_stepperwere gated onmujoco_handles() is not None, so cameras only refreshed while a skill was stepping; an idle non-MuJoCo scene went stale.
Neither is Isaac-specific — every non-MuJoCo backend hit both.
Decision
Generalize both to source from the SimRollout, touching only the
non-MuJoCo path (the MJCF path is byte-for-byte unchanged):
read_state()— whenmujoco_handles()isNone, build theJointStatefromobs["joint_positions"](a 1-D vector indescription.jointsorder) when the backend provides it; pad/truncate to the manifest joint count; fall back to the prior all-zeros only when absent. The Isaac sidecar scenes emit it via a newIsaacSceneBase._joint_positions()hook (the Franka's 9 DOF mapped to the manifest's 8 joints — 7 arm + a mean-finger gripper).idle_step()+_setup_idle_stepper— drop themujoco_handles()gate. Idle-stepping is valid for any wrappedSimRollout; the method-only-on-SimAttachedHALexclusion (real HALs never defineidle_step) remains the real safety guarantee, and_idle_step_tick's catch-once-and-disable guard still contains a per-tick fault. A zero action is a HOLD for the sim's velocity / OSC-delta controllers.
Safety posture
No real-hardware path is touched and no safety check is weakened — the change
improves safety (real joint angles instead of zeros feed the collision
checker), and the idle stepper stays gated on the sim-only idle_step method.
schema_version stays "0.1".
Tests
tests/unit/test_sim_attached_non_mujoco.py — against a fake non-MuJoCo
SimRollout (+ the real franka_panda manifest, no GPU/ROS): read_state uses
obs["joint_positions"]; falls back to zeros without it; tolerates a
length-mismatched vector; idle_step() advances the env with no MuJoCo handle.
tests/sim/test_franka_isaac_deploy_hal.py asserts real (non-zero) joint values
live against the Isaac sidecar.
Deferred
Joint velocities still report zero for non-MuJoCo backends (positions only).
Driving a LIBERO camera-VLA via deploy needs action-contract alignment
(bowl_plate 7-D EE-delta vs the franka manifest's JOINT_POSITION; ADR-0036) —
the lift_cube deploy scene sidesteps it.