ADR-0052: Cross-frame object-lift (RGB-camera optical TF + octomap/kernel decoupling)
- Status: Proposed
- Date: 2026-06-12
- Related: ADR-0030 (octomap world-voxel leg + kernel capsule-vs-voxel check); ADR-0035 (2-D detection → 3-D scene graph); ADR-0037 (detector rSkill); ADR-0043; ADR-0050.
Context
In deploy-sim robocasa the autonomous grab never fired: recall_object always returned empty,
because world_state.detected_objects stayed empty even though the detector published detections.
Three compounding causes:
- No world voxel map. ObjectLifter projects the occupied world voxels (/openral/world_voxels) into each detection camera and intersects them with the 2-D box. The demo ran with --no-enable-octomap because the ADR-0030 kernel capsule-vs-voxel check false-positives in the dense kitchen (the arm starts ~3 mm inside a counter voxel, which triggers an E-stop at step 1). So there were no voxels to lift against.
- The RGB camera had no optical frame. The agentview cameras declared frame_id: world (the global origin, not the camera pose), and no *_optical_frame TF was broadcast. Only depth cameras got a live base_link → <camera>_optical_frame TF. So even with voxels, the lifter could not place the camera that produced the box.
- The detection was mis-stamped. The launch hard-coded the detector's sensor_id=front_depth while the detector ran on the agentview_left RGB camera, so the lifter resolved the wrong camera's intrinsics/extrinsics.
The voxel-grid lift is the correct model for real hardware: one body-mounted depth sensor builds the map, and RGB detections from separately mounted cameras are lifted against it across frames. The fix is to make this lift actually work, generically across robots and cameras.
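The sketch below makes the cross-frame lift concrete. It projects world voxels into one RGB camera and intersects them with a 2-D box. It is not ObjectLifter's code. The function name lift_bbox, its arguments, and the nearest-surface depth_margin filter are all assumptions made for illustration.

The sketch takes two inputs:
- a 4x4 camera pose in the world frame, using the ROS optical convention (+Z forward, +X right, +Y down);
- pinhole intrinsics K for that same camera.

These are the two inputs that the missing optical TF and the mis-stamped sensor_id broke.

```python
import numpy as np


def lift_bbox(voxels_world, T_world_cam, K, bbox, depth_margin=0.1):
    """Hypothetical sketch: lift a 2-D detection box to a 3-D point via occupied voxels.

    voxels_world: (N, 3) occupied voxel centres in the world frame [m].
    T_world_cam:  4x4 pose of the camera optical frame in the world frame.
    K:            3x3 pinhole intrinsics of that same camera.
    bbox:         (u_min, v_min, u_max, v_max) in pixels.
    depth_margin: assumed slab [m] behind the nearest hit that counts as the object.
    Returns the (3,) world-frame centroid, or None if no voxel projects into the box.
    """
    voxels_world = np.asarray(voxels_world, dtype=float).reshape(-1, 3)
    # World -> camera optical frame: invert the camera pose.
    T_cam_world = np.linalg.inv(T_world_cam)
    pts_h = np.hstack([voxels_world, np.ones((len(voxels_world), 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]

    # Only voxels in front of the camera (+Z in the optical frame) can appear in the image.
    front = pts_cam[:, 2] > 1e-6
    pts_cam, pts_world = pts_cam[front], voxels_world[front]

    # Pinhole projection to pixel coordinates.
    uvw = (K @ pts_cam.T).T
    u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]

    u_min, v_min, u_max, v_max = bbox
    inside = (u >= u_min) & (u <= u_max) & (v >= v_min) & (v <= v_max)
    if not inside.any():
        return None

    # The box's frustum also contains background (walls, counters) behind the object;
    # keep only voxels near the closest surface along the viewing direction.
    depth = pts_cam[inside, 2]
    near = depth <= depth.min() + depth_margin
    return pts_world[inside][near].mean(axis=0)
```

Suppose the function is fed the wrong K or T_world_cam, as the front_depth stamp effectively did. The voxels are then projected through the wrong camera, so the box intersects the wrong voxels or none at all.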
Decision
- Broadcast a generic RGB-camera optical-frame TF. SimSensorBridge broadcasts base_frame → <camera>_optical_frame for every RGB camera whose frame_id is a dedicated *_optical_frame, using live MuJoCo poses. It reuses camera_optical_tf_to_base + mjcf_camera_name, the same mechanism depth cameras already use. This is generic over all robots and camera names: the MJCF camera is read from each SensorSpec.metadata.mjcf_camera. Cameras whose frame_id is a robot link (e.g. an eye-in-hand camera at panda_hand) are skipped. They already get TF from robot_state_publisher and must not be clobbered.
- Per-robot camera config. Each liftable RGB camera in robot.yaml gets a dedicated *_optical_frame frame_id plus metadata.mjcf_camera (done for panda_mobile's agentview L/R).
- Stamp the detection with its real camera. The launch derives the detection camera from the robot's first liftable RGB camera (an *_optical_frame RGB sensor) and sets both image_topic and sensor_id from it. This is generic: there is no hard-coded front_depth/agentview_left.
- Decouple octomap perception from the kernel check. A new flag, --no-enable-octomap-kernel-check (launch argument enable_octomap_kernel_check, default True), is added. With --enable-octomap --no-enable-octomap-kernel-check, /openral/world_voxels is published for the object-lift while the kernel's capsule-vs-voxel check stays off.
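The sketch below covers two things: the camera-selection rule behind the first three decisions above, and the axis convention an RGB optical TF has to encode. It makes several assumptions:
- SensorSpec is a hypothetical, simplified stand-in for the robot.yaml sensor entry, and its kind field is assumed.
- is_liftable_rgb, detection_camera and optical_pose_in_base are illustrative names. They are not the existing camera_optical_tf_to_base or mjcf_camera_name helpers.

The rotation relies on two documented conventions. MuJoCo cameras look along their frame's -Z axis, with +X right and +Y up. ROS optical frames (REP 103) use +Z forward, +X right and +Y down.

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class SensorSpec:
    """Hypothetical, simplified stand-in for a robot.yaml sensor entry."""
    name: str
    kind: str                      # assumed field: "rgb" or "depth"
    frame_id: str
    metadata: dict = field(default_factory=dict)


def is_liftable_rgb(spec: SensorSpec) -> bool:
    """True if the bridge should broadcast base -> <frame_id> for this camera.

    Requires a dedicated *_optical_frame and an MJCF camera name. A camera framed
    at a robot link (e.g. panda_hand) fails the suffix test and is skipped, so the
    TF published by robot_state_publisher is never clobbered.
    """
    return (
        spec.kind == "rgb"
        and spec.frame_id.endswith("_optical_frame")
        and "mjcf_camera" in spec.metadata
    )


def detection_camera(specs: list[SensorSpec]) -> Optional[SensorSpec]:
    """First liftable RGB camera; the launch would take sensor_id and image_topic from it."""
    return next((s for s in specs if is_liftable_rgb(s)), None)


# MuJoCo camera axes -> ROS optical axes: keep X, flip Y (up -> down) and Z (-view -> +view).
# diag(1, -1, -1) is a 180-degree rotation about X, so the product stays a proper rotation.
MJ_TO_OPTICAL = np.diag([1.0, -1.0, -1.0])


def optical_pose_in_base(R_base_cam: np.ndarray, p_base_cam: np.ndarray) -> np.ndarray:
    """4x4 base -> optical transform from a MuJoCo camera pose expressed in the base frame."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R_base_cam) @ MJ_TO_OPTICAL
    T[:3, 3] = p_base_cam
    return T
```

Under this rule, the old agentview config (frame_id: world) is not liftable either. That is why the per-robot config gives those cameras a dedicated *_optical_frame and an mjcf_camera entry.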
The lift itself (ObjectLifter) is unchanged — it already does the cross-frame projection.
Safety (CLAUDE.md §3)
Decision 4 touches the kernel voxel-check gating, so it carries a safety-WG note and a hazard-log
entry. It never weakens the kernel below the existing --no-enable-octomap baseline. With
enable_octomap_kernel_check False, the kernel posture is identical to --no-enable-octomap: envelope
and self-collision checks on, capsule-vs-voxel check off. The only addition is a perception topic.
The default True preserves the bundled ADR-0030 behaviour, where --enable-octomap turns on both the
voxel map and the kernel check. No code path lets the new flag make the kernel less conservative
than the --no-enable-octomap baseline.
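The gating argument above reduces to one boolean expression. The sketch below assumes the launch maps its two arguments this way. OctomapPosture and octomap_posture are hypothetical names, and the always-on envelope and self-collision checks are not modelled. Because the kernel check is the AND of both flags, the new flag can only remove the capsule-vs-voxel check, never add it.

```python
from typing import NamedTuple


class OctomapPosture(NamedTuple):
    publish_world_voxels: bool  # /openral/world_voxels available to the object-lift
    kernel_voxel_check: bool    # kernel capsule-vs-voxel check (world_voxel_enabled)


def octomap_posture(enable_octomap: bool, enable_octomap_kernel_check: bool = True) -> OctomapPosture:
    """Hypothetical mapping from the two launch arguments to what actually runs."""
    return OctomapPosture(
        publish_world_voxels=enable_octomap,
        # AND: the new flag can switch the voxel check off, but never on without octomap.
        kernel_voxel_check=enable_octomap and enable_octomap_kernel_check,
    )


# Safety invariant: with the new flag False, the kernel's voxel check matches the
# --no-enable-octomap baseline for either value of --enable-octomap.
baseline = octomap_posture(enable_octomap=False).kernel_voxel_check
for octomap_on in (False, True):
    assert octomap_posture(octomap_on, enable_octomap_kernel_check=False).kernel_voxel_check == baseline
```

The two settings that matter for this ADR behave as follows:
- octomap_posture(True), the ADR-0030 default, yields both the voxel topic and the kernel check.
- octomap_posture(True, False) yields the voxel topic only, which is the posture the robocasa demo needs.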
Consequences
- Positive: the autonomous detect → recall → navigate → grab loop works on robocasa; generic over robots / camera names / depth names; reuses existing lift + TF mechanisms; real-HW correct (separate depth sensor + RGB cameras).
- Negative / costs: publishing world_voxels adds octomap_server and bridge load. Each per-robot robot.yaml camera also needs an *_optical_frame frame_id and metadata.mjcf_camera to be liftable.
Testing
- Unit: SimSensorBridge broadcasts base → <cam>_optical_frame only for RGB cameras with an *_optical_frame (a link-framed eye-in-hand camera is skipped). The test is generic, using a fake 2-camera description.
- Integration/sim: with --enable-octomap --no-enable-octomap-kernel-check, the agentview optical TF resolves, /openral/world_voxels is published, and a detected object appears in world_state.detected_objects, so recall_object resolves (the deploy-sim robocasa repro).
- Safety: a test pinning that enable_octomap_kernel_check=False leaves the kernel's world_voxel_enabled False (no less conservative than --no-enable-octomap).