ADR-0040: Geometric collision checking for every control mode (not just joint-position)
- Status: Proposed
- Date: 2026-06-06
- Deciders: safety-WG (required), TSC
- Related: ADR-0030 (self/world/voxel
collision in the kernel — this ADR completes the non-position-mode phase it
explicitly deferred); ADR-0020 (the C++ kernel +
allocation-free
validate()contract); ADR-0018 §5 (theActionChunk//openral/candidate_actiontopic contract); ADR-0036 (OSC/Cartesian action contracts on the deploy path — the modes this ADR must cover); ADR-0024 (wrapped MoveIt/Nav2 planning-time checks this does not replace); CLAUDE.md §1.1 (safety beats helpfulness), §1.4 (explicit / reject-not-clamp), §1.5 (Python proposes, C++ disposes; hot path is C++ and bounded), §3 Layer 6 Safety (safety-WG review + hazard-log update + at-least-as-conservative tests).
Context
ADR-0030 added an allocation-free geometric collision check (self + world-primitive
+ world-voxel) to the C++ safety kernel: for each row of a candidate ActionChunk
the kernel runs forward kinematics, poses the per-link capsules, and rejects +
E-stops the chunk if any capsule penetrates another link, a world primitive, or an
occupied voxel within the configured margin.
That check, as shipped, runs only for ControlMode::kJointPosition chunks.
cpp/openral_safety_kernel/src/lifecycle_kernel.cpp gates the entire geometric
block on the control mode:
// Runs only for absolute joint-position chunks (the rows are full joint
// configs FK can place).
if (geom_enabled &&
view.control_mode == static_cast<std::uint8_t>(ControlMode::kJointPosition) &&
view.n_dof >= collision_required_dof_ && view.flat_data != nullptr) {
// ... self / world / voxel checks ...
}
ADR-0030 §"Forward kinematics" anticipated this limit:
JOINT_POSITION/JOINT_TRAJECTORYrows are joint configurations directly.CARTESIAN_*/*_TWISTmodes require seeding from the latest/joint_statesand integrating forward (a later phase that adds a kernel joint-state subscription).
That later phase never landed. The result is a silent safety hole that affects most rSkills, because joint-position is the least common actuation mode in the fleet:
| rSkill / source | Control mode it commands | Geometric collision today |
|---|---|---|
pi05 / pi0.5 robocasa policies |
JOINT_VELOCITY (OSC-lowered) |
none |
smolvla, act, dp/dp3 (delta-EE) |
CARTESIAN_DELTA |
none |
rldx, composite robocasa |
COMPOSITE_MODE (twist + delta + gripper) |
none |
| teleop / jog streams | CARTESIAN_TWIST / BODY_TWIST |
none |
| MoveIt-planned waypoints (ADR-0024) | JOINT_TRAJECTORY |
partial (planner-side; kernel only if rows are absolute joint configs) |
| direct joint goals | JOINT_POSITION |
✅ covered |
This was observed concretely: in openral deploy sim
(pi05_robocasa_pnp_nf4), with the world-voxel map correctly populated
(/openral/world_voxels ~985 occupied cells after the octomap clock fix, PR #268)
and world_voxel_enabled=true, the arm still drove into the kitchen counter
uncaught — every candidate chunk was JOINT_VELOCITY, so the geometric block
was skipped. A hand-injected JOINT_POSITION chunk at the same configuration was
rejected + E-stopped, confirming the only difference is the mode gate.
The fix belongs in the kernel and applies to every embodiment and every rSkill, so it is an ADR, not a patch.
Non-goals
- This ADR does not change the scalar envelope check, the world-model ingestion contract, or the fail-closed degradation policy of ADR-0030 — those are reused verbatim.
- It does not add collision geometry for the gripper/fingers. The collision model deliberately excludes the end-effector contact surfaces (so the gripper can reach targets); intentional gripper-surface contact remains out of scope and is governed by force/torque limits in the envelope, not geometry.
- It does not replace planning-time collision checking for wrapped MoveIt/Nav2 skills (ADR-0024); the kernel remains the last-line reactive gate.
Decision
Make the kernel's geometric collision check mode-complete: for every control mode an rSkill can emit, reconstruct the candidate future configuration(s) the chunk would produce, seeded from the latest measured joint state, and run the existing self/world/voxel checks over them. No control mode is silently exempt.
1. Kernel subscribes to the measured joint state
Add a double-buffered /joint_states (or world_state_fast) subscription to the
kernel (the subscription ADR-0030 deferred). It maintains the latest measured joint
configuration q_meas in a lock-free latest-wins slot, plus its stamp for a
freshness gate. q_meas is only read on the hot path; ingestion never allocates.
Freshness is fail-closed: if q_meas is older than collision_state_deadline_ms
when a mode that requires seeding arrives, the chunk is rejected as
state_unavailable (drop, no latch — motion resumes when a fresh state lands),
mirroring the existing world_unavailable / voxel_unavailable semantics.
2. Per-mode reconstruction of the checked configuration(s)
For each chunk the kernel derives the sequence of joint configurations to FK + check.
dt is the chunk period (1 / rate_hz); substeps S keep the swept check
continuous as in ADR-0030.
| Control mode | Configuration(s) checked | Seed |
|---|---|---|
JOINT_POSITION / JOINT_TRAJECTORY |
rows verbatim (current behavior) | none |
JOINT_VELOCITY |
q_k = q_meas + Σ_{i≤k} v_i · dt integrated over rows, with S substeps between configs |
q_meas |
JOINT_TORQUE |
conservative: check q_meas plus a bounded reachable shell q_meas ± v_max · dt (torque→config needs full dynamics; we do not integrate it on the hot path) |
q_meas |
CARTESIAN_POSE |
IK-free: FK q_meas, then check the swept tool/wrist capsule from the current EE pose to the commanded pose (capsule-cast), not a re-solved q |
q_meas + EE FK |
CARTESIAN_DELTA |
same as CARTESIAN_POSE with target = current EE pose ∘ delta |
q_meas + EE FK |
CARTESIAN_TWIST / BODY_TWIST |
swept capsule along twist · dt from the current EE / base pose |
q_meas (+ base pose from TF) |
COMPOSITE_MODE |
decompose into its sub-actions; apply this table to each and reject if any sub-check trips | per sub-action |
GRIPPER_* |
no arm-link motion → geometric check skipped (envelope force limits apply) | — |
The Cartesian/twist rows are checked with a capsule-cast (swept EE/wrist capsule between the seeded pose and the predicted pose) rather than a re-solved IK configuration: it is allocation-free, requires no IK in the hot path, and is conservative (the cast volume is a superset of the true link path for the distal links that actually approach obstacles). Whole-arm Jacobian prediction is a possible refinement (Phasing) but is not required for the conservative guarantee.
3. Conservatism is the contract, not accuracy
Every per-mode reconstruction must be at least as conservative as reality: it may reject a chunk that would not have collided, but it must never pass a chunk that would. Where the reconstruction is uncertain (torque mode, IK-ambiguous Cartesian), the kernel widens the checked volume (reachable shell / capsule-cast) rather than narrowing it. This is the invariant the safety-WG and the test suite verify.
4. Degradation and observability (reused from ADR-0030, extended)
- Fail-closed default unchanged: no collision model, stale/over-capacity world,
stale measured state for a seed-requiring mode, or per-chunk budget overrun →
reject + E-stop (or
state_unavailabledrop without latch for the freshness gate). safety.checkspans gainsafety.collision_mode(the control mode that was reconstructed) so the dashboard and traces show which path rejected.- The
warn-only sim/bring-up mode (explicit param + loud log +severity=warn) extends to the new modes; it never disables the check and is never the default.
5. No mode left silently unchecked
The mode gate is inverted: instead of "check only if JOINT_POSITION", the kernel
checks every geometry-relevant mode and maintains an explicit allow-list of
modes that legitimately need no arm-geometry check (GRIPPER_*, FOOT_PLACEMENT
pending its own ADR). An unknown / unhandled mode with geometry enabled is
rejected (fail-closed), not passed.
Alternatives considered
- Require all rSkills to emit
JOINT_POSITION. Rejected: VLA policies emit velocity/delta chunks by construction (ADR-0036); forcing position output would require re-integrating in every adapter, duplicating logic and losing the policies' native action space. Safety must adapt to the fleet's modes, not the reverse. - Planning-time collision only (MoveIt/Nav2). Rejected: VLAs and teleop do not plan; the kernel is the only gate they cross (ADR-0030 §Context).
- Check only the current measured configuration each tick (reactive, no seeding). Partially adopted as the torque fallback, but rejected as the general design: it catches a collision one tick after the command that caused it (the arm is already penetrating), whereas seeding + integration is predictive and rejects the colliding command before it is applied. The predictive form is strictly more conservative for non-zero-latency actuation.
- Full IK per Cartesian row. Rejected for the hot path: IK is iterative, allocates, and is ambiguous (multiple solutions). The conservative swept capsule-cast gives the safety guarantee without IK.
Consequences
- Every rSkill gains world/self collision coverage, not just joint-position ones — closing the hole that let the pi05 robocasa arm hit the table.
- The kernel gains a
/joint_statessubscription and per-mode reconstruction code; both are bounded and allocation-free per ADR-0020. The reconstruction adds a small, bounded cost to the per-chunk budget (measured against the deadline in the HIL/sim suite). - Safety-WG gating (CLAUDE.md §3): this touches
cpp/openral_safety_kernel/and changes safety behavior. It requires (a) a safety-WG reviewer, (b) a hazard-log entry (the velocity-mode-uncaught-collision hazard this closes), and (c) tests proving the new behavior is at least as conservative as the position-mode path for the same geometry — including a regression that reproduces the deploy-sim table collision and asserts the kernel now rejects + E-stops the velocity chunk. - Tests: per-mode reconstruction unit tests (each mode vs a known-colliding and a
known-clear geometry), a property test that the reconstructed swept volume is a
superset of the integrated link path, and a sim test
(
tests/sim/test_panda_mobile_pi05_robocasa.py) asserting a counter approach is vetoed. - Latency: the swept/substep count is the tunable; the budget is enforced on the reference host as in ADR-0030.
Phasing
- Contracts + joint-state subscription — add the kernel
/joint_statesseed + freshness gate + thesafety.collision_modespan attribute; no behavior change yet (position mode still the only checked mode), so it is a safe, reviewable base. - Joint-velocity coverage — integrate
v·dtand run the existing checks; ship the deploy-sim regression. This alone closes the observed pi05 hole. - Cartesian / twist coverage — swept capsule-cast for
CARTESIAN_*/*_TWIST, andCOMPOSITE_MODEdecomposition. - Torque coverage + refinements — conservative reachable-shell for
JOINT_TORQUE; optional Jacobian whole-arm prediction for tighter (still conservative) Cartesian bounds.
Each phase is independently safety-WG-reviewed and ships its own tests; the fail-closed default means a not-yet-implemented mode is rejected, never silently passed, from Phase 1 onward.
Fleet control-mode audit (what is actually emitted today)
Before implementing, the in-tree rSkills + robots were audited for the
ControlMode they emit on /openral/candidate_action:
| ControlMode | Emitted by | Through kernel? | Build now? |
|---|---|---|---|
JOINT_POSITION |
act-aloha, pi05-so101, molmoact2-so101, smolvla-maniskill/metaworld, pi05-openarm, rldx1-gr1, ACT/diffusion legacy | yes | already (ADR-0030) |
CARTESIAN_DELTA (+GRIPPER_POSITION) |
12 rSkills — all LIBERO (act/smolvla/pi05/xvla/rldx), SIMPLER (widowx/google), DROID, + the robocasa arm chunk (rldx-rc365/robocasa, pi05-robocasa365) | yes | YES — Phase 3 (reactive floor + Jacobian predictive look-ahead) |
JOINT_VELOCITY |
robocasa base chunk (rldx-rc365/robocasa, pi05-robocasa365 slot 7:9) | yes | YES — Phase 2 |
COMPOSITE_MODE |
robocasa mux flag (same 3 skills, slot 11, n_dof=1 scalar) | yes | no geom — scalar flag, no arm geometry; passes the n_dof≥required gate; the arm is the companion CARTESIAN_DELTA chunk |
GRIPPER_POSITION/GRIPPER_BINARY |
LIBERO/robocasa gripper slot (n_dof=1) | yes | no geom — opening/closing does not move arm-link capsules |
BODY_TWIST |
nav2 cmd_vel bridge |
NO — bypasses the kernel (ADR-0024) | n/a |
JOINT_TORQUE, JOINT_TRAJECTORY, CARTESIAN_POSE, CARTESIAN_TWIST, FOOT_PLACEMENT, DEX_HAND_JOINT |
none | — | defer (unused) |
Verdict applied: (a) Cartesian — built, both the reactive floor and the Jacobian predictive look-ahead (largest footprint, and the actual robocasa-arm path that hit the table); (c) COMPOSITE_MODE — no work needed (scalar passthrough; arm covered by its CARTESIAN_DELTA companion); (d) JOINT_VELOCITY predictive-dt — moot for the only emitter (the robocasa base velocity dofs are zeroed by the base-relative frame fix, so integrating them is a no-op; reactive covers it); (b) JOINT_TORQUE shell — skipped (no robot/rSkill emits it).
Implementation status (this branch)
- Phase 1 (joint-state seed + fail-closed gate) — done.
- Phase 2 (joint-velocity: reactive measured config + predictive
v·dt, dt-gated) — done. - Phase 3 (Cartesian) — done, in two layers:
- Reactive measured-config check for
CARTESIAN_*/BODY_TWIST(rejects an arm already in/at an obstacle) — the guaranteed floor. - Predictive look-ahead for
CARTESIAN_DELTA(the robocasa arm + all LIBERO/SIMPLER/DROID arms): the kernel reconstructs where the chunk's EE deltas drive the whole arm via a damped-least-squares Jacobian (jacobian_dls_step,dq = Jᵀ(JJᵀ+λ²I)⁻¹·dx), integrates the trajectory forward, and runs the full capsule self/world/voxel check at each step. The last step is always verified; earlier steps up tocollision_predict_max_steps(0 = all). Per-step margin inflation (collision_predict_margin_growth_m) bounds the linearization/DLS residual. On a mobile base the base dofs are blocked from the arm Jacobian (the EE delta is realised by arm joints; the base is checked in its own frame). The EE control link is the deepest collision link (ee_link_index_from_collision_params);collision_ee_link_index<0disables predictive (reactive only). Safety invariant: predictive is purely additive early warning on top of the reactive floor — an imperfect Jacobian/frame reconstruction can only add rejections (conservative) or degrade to reactive (catch at contact), never make the kernel less safe. Allocation-free (fixed 6×6 DLS solve + stack Jacobian scratch, capacitykMaxJacobianDof); proven byNoAlloc.JacobianDlsStepIsAllocationFree. - Assumption:
CARTESIAN_DELTArows are EE twists in the base frame (robosuite OSC convention). A future non-baseframe_idwould fall back to reactive (the floor), not mis-predict unsafely. - Still deferred: predictive
CARTESIAN_TWIST/CARTESIAN_POSE(no in-tree emitter) stay reactive/fail-safe. - Mobile-base frame fix — the base-relative FK zeroes the manifest
base_jointsso a mobile manipulator's arm is checked in thebase_linkframe the voxel grid lives in (also repairs ADR-0030's position path for mobile bases). - Deferred (unused today):
JOINT_TORQUEreachable-shell, predictive Cartesian capsule-cast,CARTESIAN_TWIST. These remain fail-closed when a future rSkill emits them with the seed plumbed.
Hazard log
HZ-0040-1 — mobile-base dof zeroing silently disabled (dangling reference)
- Severity: high. On a mobile manipulator (
panda_mobile/robocasa) the base-relative FK frame fix was silently a no-op, so the arm capsules were placed at the base's world pose (metres outside the base_link-relative world/voxel grid) and no world/voxel collision could ever be caught — the exact "robot drove into the table and the kernel did not stop it" symptom. - Root cause:
on_configureiterated a range-basedfordirectly overget_parameter("collision_base_dofs").as_integer_array().as_integer_array()returns a reference into the temporaryrclcpp::Parameter; the range-based for does not lifetime-extend that temporary, so the loop read freed stack memory.collision_base_dofs_ended up empty (or garbage) depending on stack reuse — i.e. it appeared to work under some builds/logging and failed under others. (ASAN:stack-use-after-scope,lifecycle_kernel.cpp:246.) - Fix: bind the parameter array to a named local before iterating
(
const std::vector<int64_t> base_dofs = ...; for (d : base_dofs)). - Detection / regression: found via AddressSanitizer; locked down by
LifecycleKernelTest.MobileBaseArmCaughtAgainstVoxelWall— a deterministic, geometry-verified test that seeds the base far out in the world, requires the base dof to be zeroed for the arm capsule to land in an occupied voxel wall, and asserts e-stop. Verified deterministic (10/10 runs) and ASAN-clean. - Conservativeness: strictly more protective — the fix enables a check that was previously dead; no path is made less conservative.
HZ-0040-2 — predictive Cartesian reconstruction is an approximation
- Concern: the predictive
CARTESIAN_DELTAcheck reconstructs the future arm configuration from EE deltas using a damped-least-squares Jacobian (a local linearization) and assumes deltas are in the base frame. Both are approximations: DLS can undershoot near singularities; a wrong frame would predict the wrong trajectory. - Why it is safe anyway (design invariant): prediction is purely additive
on top of the reactive measured-config check, which always runs first and
e-stops if the current config collides. The predictive pass can therefore
only (a) add an earlier rejection, or (b) fail to predict and fall back to
the reactive floor (catch at contact). It can never suppress the reactive
check or pass a config the reactive check would reject. Per-step margin
inflation (
collision_predict_margin_growth_m) further biases prediction toward early rejection to bound the linearization/DLS residual. Worst case = reactive-only behavior, which is the pre-Phase-3 contract. - Mitigations in code: base dofs blocked from the arm Jacobian (mobile base);
EE-link mis-id bounded by the whole-arm capsule check;
collision_ee_link_index<0and non-CARTESIAN_DELTAmodes fall back to reactive; allocation-free (no hot-path surprises). - Tests:
CartesianDeltaPredictiveCatchesChunkDrivingEeIntoWall(start clear, predicted trajectory enters a wall → estop) andCartesianDeltaPredictivePassesWhenTrajectoryStaysClear(no false positive), plusJacobianDls.*unit coverage (twist realization, fail-safe on bad EE link, damping bound, blocked-dof, allocation-free).
Out of scope — base navigation is not kernel-protected
Mobile-base navigation via the rskill-nav2-navigate-to-pose rSkill does not
flow through this kernel at all, and the geometric collision check here does not
cover it:
- The skill is result-only (ADR-0024): it forwards a
nav2_msgs/NavigateToPosegoal and emits noActionChunkon/openral/candidate_action. - Nav2's own behaviour tree publishes
cmd_veldirectly to the base controller, bypassing the safety supervisor. Base collision avoidance therefore relies entirely on Nav2's costmap, not on our occupancy/voxel check. - No in-tree HAL advertises
body_twistinsupported_control_modestoday, so there is no velocity stream for the kernel to gate even if we wanted to.
Follow-up (tracked, unscheduled): when a mobile-base HAL exposes body_twist
under the supervisor, route the base velocity stream through the kernel and reuse
the same base-relative voxel check (the base capsules, not the arm, against the
local occupancy grid). Until then, the kernel's protection is the manipulator,
and the costmap's is the base — this split is intentional and documented so it
is not mistaken for full base-collision coverage.