Layer 5–8 — Reasoning, WAM, Safety, Observability
Part of the OpenRAL public-symbol inventory. Hand-curated; (LNN) markers are refreshed by tools/refresh_methods_linenos.py.
Layer 8 (Observability) is fully shipped — traces + metrics + structlog→OTLP log bridge, with W3C TraceContext propagation helpers for cross-process correlation (Python ↔ ROS 2 ↔ C++ safety kernel). Layers 5–7 ship their Protocol surface (landed 2026-05-18) so the rest of the runtime can compose against locked signatures; concrete implementations (LLM clients, generative WAM adapters, the C++ safety kernel) are still planned.
python/reasoner/src/openral_reasoner/protocol.py
S2 reasoner Protocols (ADR-0005).
- class LLMClient(Protocol) — Wire-level Protocol for an LLM provider that supports structured output. Attribute: model_id. Method: complete_structured(prompt, schema) -> Plan. (L29)
- class Reasoner(Protocol) — Planning-layer Protocol every S2 reasoner satisfies. Attributes: plan_rate_hz, client: LLMClient | None. Method: plan(world_state, goal) -> Plan — emits a validated Plan for the goal given the current WorldState; raises ROSReasonerInvalidPlan / ROSPlanningError. (L64)
python/reasoner/src/openral_reasoner/plan.py
Pydantic v2 structured-output schemas the LLM emits (ADR-0003 + ADR-0005).
- class ToolCall(BaseModel) — One leaf of a Plan — a single skill invocation. Fields: rskill_id: str (min_length=1), params: dict[str, Any], rationale: str | None. extra="forbid". (L24)
- class Plan(BaseModel) — The structured LLM output the reasoner emits per planning tick. Fields: goal: str (min_length=1), tool_calls: list[ToolCall] (min_length=1), confidence: float ∈ [0.0, 1.0], bt_xml: str | None. extra="forbid". (L66)
python/reasoner/src/openral_reasoner/null_reasoner.py
No-LLM stub satisfying the Reasoner Protocol (for plumbing tests; not a production fallback).
- class NullReasoner — Emits a single-leaf Plan calling default_skill_id with confidence 1.0. Attributes: plan_rate_hz, client = None, default_skill_id. (L30)
- __init__(default_skill_id="noop", *, plan_rate_hz=5.0) -> None — Initialise. (L63)
- plan(world_state, goal) -> Plan — Return a context-free single-leaf Plan. (L74)
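The Protocol-plus-stub arrangement described above can be sketched with the standard library alone. This is a minimal illustration, not the shipped code: plain dataclasses stand in for the Pydantic v2 models, the validation only mirrors the documented field constraints, and everything beyond the names listed in the inventory is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass(frozen=True)
class ToolCall:
    """Stand-in for the Pydantic ToolCall: a single skill invocation."""
    rskill_id: str
    params: dict[str, Any] = field(default_factory=dict)
    rationale: str | None = None

    def __post_init__(self) -> None:
        if not self.rskill_id:  # mirrors min_length=1
            raise ValueError("rskill_id must be non-empty")


@dataclass(frozen=True)
class Plan:
    """Stand-in for the Pydantic Plan emitted once per planning tick."""
    goal: str
    tool_calls: list[ToolCall]
    confidence: float
    bt_xml: str | None = None

    def __post_init__(self) -> None:
        if not self.goal or not self.tool_calls:  # mirrors min_length=1 on both
            raise ValueError("goal and tool_calls must be non-empty")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0.0, 1.0]")


class Reasoner(Protocol):
    """Structural type: any object with these members satisfies it."""
    plan_rate_hz: float
    client: object | None

    def plan(self, world_state: object, goal: str) -> Plan: ...


class NullReasoner:
    """No-LLM stub: always returns a one-leaf plan for default_skill_id."""

    def __init__(self, default_skill_id: str = "noop", *, plan_rate_hz: float = 5.0) -> None:
        self.default_skill_id = default_skill_id
        self.plan_rate_hz = plan_rate_hz
        self.client = None

    def plan(self, world_state: object, goal: str) -> Plan:
        # The world state is ignored on purpose: the stub is context-free.
        leaf = ToolCall(rskill_id=self.default_skill_id)
        return Plan(goal=goal, tool_calls=[leaf], confidence=1.0)


reasoner: Reasoner = NullReasoner()
print(reasoner.plan(world_state=None, goal="pick up the cup"))
```

Because `Reasoner` is a structural Protocol, `NullReasoner` needs no inheritance to be usable wherever a `Reasoner` is expected, which is what lets plumbing tests run without an LLM client.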
python/reasoner/src/openral_reasoner/tool_use.py
ADR-0018 F4 — typed LLM tool-use clients (direct-dispatch surface). CLAUDE.md §6.2 / §7.6 amended in the same PR. The direct typed ReasonerToolCall surface is the sole planner output.
- module constant DEFAULT_SYSTEM_PROMPT: str — Factual system prompt for the S2 reasoner: one-tool-per-tick semantics, goal fidelity, robot/scene-matched skill selection, the ADR-0044 go-see-then-act ladder — recall (recall_object, honouring Phase-4a 'approach BLOCKED') → navigate-to-approach (resolve_place / Nav2) → aim (the camera-aiming/look-at skill) → verify (locate_in_view, live vs remembered) → manipulate, each rung gated on its tool/skill being in the palette — progress evaluation, observe-but-never-bypass safety/e-stop handling, and exact-field-name discipline. Concrete deployments may override. (L74)
- module constant OPENROUTER_BASE_URL: str — https://openrouter.ai/api/v1; pre-filled when PROVIDER=openrouter so users don't have to memorise it. (L443)
- module constant
SYSTEM_PROMPT_ENV_VAR: str = "OPENRAL_REASONER_SYSTEM_PROMPT"(L339) — env var that overrides the base operating brief; honoured byresolve_reasoner_system_prompt. render_robot_context_prompt(capabilities: RobotCapabilities | None, *, base_prompt=DEFAULT_SYSTEM_PROMPT) -> str(L228) — Option B: append a deterministic## THIS ROBOTbody-awareness block (embodiment tags, locomotion + navigate/no-navigate guidance, manipulation/sensing hardware, payload, control modes) to the system prompt.Nonereturnsbase_promptunchanged.resolve_reasoner_system_prompt(capabilities: RobotCapabilities | None, *, env=None) -> str(L342) — Compose the reasoner system prompt: base brief (OPENRAL_REASONER_SYSTEM_PROMPToverride if non-empty, elseDEFAULT_SYSTEM_PROMPT) + the## THIS ROBOTblock.envis injectable for tests. Called byReasonerNode.on_configure.class ToolUseClient(Protocol)(L389) — Attributemodel_id; methodselect_tool(*, context_text, palette, system_prompt=DEFAULT_SYSTEM_PROMPT) -> ReasonerToolCall. RaisesROSReasonerInvalidPlanon bad discriminator / palette mismatch,ROSPlanningErroron transport failure.build_tool_use_client_from_env() -> ToolUseClient(L461) — Factory readingOPENRAL_REASONER_LLM_{PROVIDER, MODEL, API_KEY, BASE_URL}. PROVIDER ∈ {anthropic,openai-compatible,openrouter}; theopenroutervalue is a shortcut on top ofOpenAICompatibleToolUseClientthat pre-fillsOPENROUTER_BASE_URL. No cloud lock-in: open-core has no default.class AnthropicToolUseClient(L922) — Anthropic SDK-backed client.__init__(*, model_id, api_key, max_tokens=1024, timeout_s=10.0). Lazy-importsanthropic.class OpenAICompatibleToolUseClient(L1001) — OpenAI SDK-backed client pointed at any OpenAI-protocol endpoint (cloud OpenAI, local vLLM, Ollama-OpenAI).__init__(*, model_id, api_key=None, base_url=None, timeout_s=10.0). Lazy-importsopenai._tool_palette_to_anthropic_tools(palette) -> list[dict]— Render the palette as AnthropictoolsJSON Schema fragments. 
ADR-0022: whenpalette.skillsis non-empty, emits oneexecute_rskill__<slug>tool per skill carrying a real NL description + the structured action/object/scene tags (therskill_idfield is dropped from each per-skillinput_schemabecause the tool name is the authority). Empty / legacy palettes (execute_rskill_idsonly) fall back to the original single-execute_skilltool with the ids in the description._decode_tool_payload(*, tool_name, arguments, palette) -> ReasonerToolCall— Validate provider output against the union + palette. ADR-0022: per-skill tool names (execute_rskill__<slug>) are resolved back to the canonicalexecute_skilldiscriminator viapalette.skillslookup, and the resolvedrskill_idoverrides anything the LLM provides.- module constant
_PER_SKILL_TOOL_PREFIX: str = "execute_rskill__" — prefix the decoder matches on to identify per-skill tool calls. (L561)
- module constant _LLM_TOOL_NAME_MAX_LEN: int = 64 — Anthropic + OpenAI tool-name regex limit; long HF Hub ids are sha1-suffix-truncated to fit. (L564)
- _skill_id_to_tool_name(rskill_id: str) -> str — Slugify a <owner>/<repo> HF Hub id into a 64-char-max LLM tool name. Long ids get an 8-char sha1 suffix to stay unique post-truncation. (L567)
- _format_skill_tool_description(entry: RSkillToolEntry) -> str — Render the skill's id + description + actions + objects + scenes into the NL string the LLM scores. (L590)
- _drop_property(schema: dict, name: str) -> dict — Return a copy of a JSON Schema dict with name stripped from both properties and required. Used to drop rskill_id from per-skill ExecuteRskillTool schemas. (L831)
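The slug-and-truncate rule for per-skill tool names might look roughly like the sketch below. Only the prefix, the 64-character limit, and the 8-character sha1 suffix come from the inventory; the exact character class, the separator, and whether the prefix counts toward the limit are assumptions made for illustration.

```python
import hashlib
import re

PER_SKILL_TOOL_PREFIX = "execute_rskill__"
LLM_TOOL_NAME_MAX_LEN = 64
HASH_SUFFIX_LEN = 8


def skill_id_to_tool_name(rskill_id: str) -> str:
    """Turn an '<owner>/<repo>' id into a tool name of at most 64 characters."""
    # Assumed slug rule: anything outside [A-Za-z0-9_-] becomes an underscore.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", rskill_id).strip("_")
    name = PER_SKILL_TOOL_PREFIX + slug
    if len(name) <= LLM_TOOL_NAME_MAX_LEN:
        return name
    # Too long: keep a head of the name and append a hash of the full id so
    # two ids that share a long common prefix still map to distinct names.
    digest = hashlib.sha1(rskill_id.encode("utf-8")).hexdigest()[:HASH_SUFFIX_LEN]
    head_len = LLM_TOOL_NAME_MAX_LEN - HASH_SUFFIX_LEN - 1  # 1 for the separator
    return f"{name[:head_len]}_{digest}"


print(skill_id_to_tool_name("OpenRAL/rskill-pick"))
print(skill_id_to_tool_name("some-org/" + "very-long-skill-name-" * 5 + "a"))
```

Hashing the full id rather than the truncated head is what keeps the suffix meaningful: the head alone is identical for any two ids that diverge only after the cut.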
python/reasoner/src/openral_reasoner/palette.py
ADR-0018 F4 / ADR-0022 — closed-set ToolPalette + builder. Three tool variants (reload_gst_pipeline / lifecycle_transition / emit_prompt) are always available; execute_skill is gated by the installed-rSkill registry filtered by RobotCapabilities + license posture. ADR-0022: palette carries per-skill metadata (RSkillToolEntry), not just opaque ids — the LLM gets one tool per skill with description + action verbs + object/scene tags.
class RSkillToolEntry(BaseModel)(L135) — Frozen per-skill record surfaced to the LLM as one tool. Fields:rskill_id: str,description: str,actions: tuple[RSkillAction, ...],objects: tuple[str, ...] = (),scenes: tuple[str, ...] = (). Mirrored from the matchingRSkillManifestfields at palette-build time.class ContinuousDetectorEntry(BaseModel)(L44) — ADR-0051. Frozen coverage record for amode: continuousdetector — surfaced to the LLM as coverage (not a tool) so it can read world state for tracked objects and reservelocate_in_viewfor the long tail. Fields:rskill_id: str,description: str,objects: tuple[str, ...] = (),scenes: tuple[str, ...] = (),num_labels: int = 0(compact characterisation, not the full label list).class OnDemandDetectorEntry(BaseModel)— ADR-0056. Frozen record for amode: on_demandopen-vocab locator surfaced as a selectablelocate_in_viewoption (a prompt-able read-only tool, never an ExecuteSkill policy). Fields:rskill_id: str,alias: str(short selector the LLM passes asLocateInViewTool.detector),description: str(capability hint).detector_alias(rskill_name) -> str— ADR-0056. Short LLM-/operator-facing detector id: strips theOpenRAL/org +rskill-kind prefixes ("OpenRAL/rskill-omdet-turbo-locator"→"omdet-turbo-locator"). Single source of truth for the alias the reasoner routes on.detector_service_segment(alias) -> str— ADR-0056. ROS-safe service-namespace segment for an alias (hyphens → underscores), so the locate service lives at/openral/perception/<segment>/locate_in_view.locate_in_view_service(detector, *, default="") -> str— ADR-0056. Resolves thelocate_in_viewservice for a (possibly empty) selector: emptydetector→default; empty resolved alias → the legacy/openral/perception/locate_in_view(single-detector back-compat); else the namespaced service. Shared by the reasoner dispatch and the deploy launch (which names each locator node's service).class ToolPalette(BaseModel)(L174) — Frozen palette presented to the LLM each tick. 
Fields:skills: tuple[RSkillToolEntry, ...] = ()(ADR-0022 primary surface),execute_rskill_ids: frozenset[str] = frozenset()(back-compat — auto-derived fromskillsvia the_derive_execute_rskill_idsmodel-validator),sensor_ids: frozenset[str] = frozenset(),node_ids: frozenset[str] = frozenset(),continuous_detectors: tuple[ContinuousDetectorEntry, ...] = ()(ADR-0051 —mode: continuousdetectors for the active robot; coverage, not tools),spatial_memory_available: bool = False(ADR-0039 — gates the two read-onlyrecall_object/resolve_placequery tools; off unless the reasoner_node has a SpatialMemory backend wired),detector_available: bool = False(ADR-0043 — gateslocate_in_view),on_demand_detectors: tuple[OnDemandDetectorEntry, ...] = ()(ADR-0056 — selectable locator options forlocate_in_view),scene_query_available: bool = False(ADR-0047 — gatesquery_scene; independent ofdetector_available). Cross-validator_check_skills_match_idsrejects callers that pass bothskillsandexecute_rskill_idswith disagreeing ids.build_tool_palette(*, installed_skills, robot_capabilities, sensor_ids=(), node_ids=(), commercial_deployment=False, spatial_memory_available=False, detector_available=False, scene_query_available=False) -> ToolPalette— A skill is included iff role=s1, kind≠detector (detector rSkills are perception producers, not ExecuteRskill-dispatchable — ADR-0035/0037), capability flags satisfied, embodiment tags intersect, and (when commercial) license allows commercial use. Amode: continuousdetector (ADR-0051) is instead collected intocontinuous_detectors(coverage for the LLM, never an ExecuteSkill tool); amode: on_demanddetector (ADR-0056) is collected intoon_demand_detectors(selectablelocate_in_viewoptions viadetector_alias, never an ExecuteSkill tool). 
EmitsRSkillToolEntryrecords (manifestdescription/actions/objects/scenesmirrored in) in stable id-sorted order so the LLM tool schema is deterministic.spatial_memory_availableforwards the read-onlyrecall_object/resolve_placetools (ADR-0039);detector_availableforwards the read-onlylocate_in_viewtool (ADR-0043);scene_query_availableforwards the read-onlyquery_scenetool (ADR-0047). All areToolPalettefields gated intool_useso the LLM only sees a query tool when its dispatcher is wired;detector_availableandscene_query_availableare independent (localization vs scene-state reasoning). The reasoner_node dispatchesquery_scenevia_dispatch_query_scene→/openral/perception/query_sceneand re-prompts with the answer (frame_idscene_vlm).
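The three detector-routing helpers in palette.py (detector_alias, detector_service_segment, locate_in_view_service) are pure string transforms, so their documented behaviour can be sketched directly. This is an illustration under assumptions: in particular, passing the selector back through `detector_alias` is a guess at how the "resolved alias" is obtained, not a confirmed detail.

```python
LEGACY_LOCATE_SERVICE = "/openral/perception/locate_in_view"


def detector_alias(rskill_name: str) -> str:
    """Strip the 'OpenRAL/' org prefix and the 'rskill-' kind prefix."""
    return rskill_name.removeprefix("OpenRAL/").removeprefix("rskill-")


def detector_service_segment(alias: str) -> str:
    """ROS names cannot contain hyphens, so map them to underscores."""
    return alias.replace("-", "_")


def locate_in_view_service(detector: str, *, default: str = "") -> str:
    """Resolve the service name for a possibly empty detector selector."""
    # Empty selector falls back to the default detector (assumed resolution order).
    alias = detector_alias(detector or default)
    if not alias:
        # No alias at all: single-detector back-compat service.
        return LEGACY_LOCATE_SERVICE
    return f"/openral/perception/{detector_service_segment(alias)}/locate_in_view"


print(detector_alias("OpenRAL/rskill-omdet-turbo-locator"))
print(locate_in_view_service("omdet-turbo-locator"))
print(locate_in_view_service(""))
```

Keeping the alias logic in one function means the reasoner dispatch and the launch file derive the same service name from the same input.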
python/reasoner/src/openral_reasoner/spatial_query.py
ADR-0039 Phase 2 — read-only spatial-memory query bridge: maps a RecallObjectTool / ResolvePlaceTool to an ADR-0038 query, runs it against an injected backend, and renders an LLM-readable result for the prompt cascade. Layer-4 module; does not import openral_world_state (backend is duck-typed).
class SpatialMemoryQuerier(Protocol)— Read-only query surface (recall_object(query, *, now_ns) -> RecallObjectResult;resolve_place(query, *, from_node_id=None) -> ResolvePlaceResult;to_scene_graph() -> SceneGraph— immutable snapshot for telemetry/dashboard); structurally satisfied byopenral_world_state.SpatialMemory.SpatialQueryTool: TypeAlias—RecallObjectTool | ResolvePlaceTool(the read-only ReasonerToolCall variants this bridge dispatches).recall_object_tool_to_query(call) -> RecallObjectQuery/resolve_place_tool_to_query(call) -> ResolvePlaceQuery— tool → ADR-0038 query mappers.format_recall_object_result(query_text, result, *, blocked_node_ids=frozenset()) -> str/format_resolve_place_result(reference, result) -> str— render results as LLM-readable text (misses reported as text, never a fabricated pose).blocked_node_ids(ADR-0044 Phase 4) renders a match whose approach failed grid refinement as "approach BLOCKED on the occupancy grid" instead of a pose.run_spatial_query(call, querier, *, now_ns, from_node_id=None, refine_approach=None) -> str— execute a read-only tool call and render the result; catchesROSObjectNotInMemory→ "not in memory" message.refine_approach(ADR-0044 Phase 4,ApproachRefiner— duck-typed like the querier so this L4 module never imports L2) is applied to everyrecall_objectmatch's approach viewpoint before rendering; aNonefrom the refiner marks the match BLOCKED. Thin wrapper overrun_spatial_query_detailedreturning only.text.class SpatialQueryOutcome(NamedTuple)—(text: str, found: bool).foundisTruewhenrecall_objectreturned ≥1 match (in memory, even if every approach is grid-BLOCKED) orresolve_placeresolved the reference;Falseon a miss. 
Drives the reasoner's recall→locate_in_viewescalation (ADR-0043/0056).run_spatial_query_detailed(call, querier, *, now_ns, from_node_id=None, refine_approach=None) -> SpatialQueryOutcome— same asrun_spatial_querybut also reports whether the query matched, so the node can escalate a miss to a live perception check without re-parsing the rendered text.ApproachRefiner(TypeAlias =Callable[[ApproachViewpoint, tuple[float, float, float]], ApproachViewpoint | None]) — the occupancy-grid refinement callback contract; the reasoner node wiresrefine_approach_poseover its latched/mapsubscription. (ADR-0044)
python/reasoner/src/openral_reasoner/active_search.py
ADR-0039 §3 Phase 4 — bounded active object search over the scene graph (pure-Python, openral_core only).
- class SearchBudget(BaseModel) — frozen; max_candidates (1–50), max_attempts (1–50). The bound.
- class SearchCandidate(BaseModel) — place_node_id, goal: Pose6D, open_container_id: str | None, reason, rank ∈ [0,1].
- plan_active_search(graph, *, target_text, budget) -> list[SearchCandidate] — ranked frontier of places to check (occluding containers first, then containers, then places), truncated to budget.max_candidates; [] when nowhere to search (→ human-handoff). Semantic prioritization among candidates is the LLM's (priors).
- class SearchProgress — attempt counter against a SearchBudget: record_attempt() -> bool (True while budget remains), attempts, exhausted, reset(). The runaway bound.
- format_search_frontier(candidates, target_text) -> str — LLM-readable frontier text (empty → "hand off to a human").
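The runaway bound formed by SearchBudget and SearchProgress can be illustrated with a small counter. This is a sketch under assumptions: a frozen dataclass stands in for the Pydantic model, and `record_attempt` is taken to count the attempt first and then report whether any budget is left, which is one plausible reading of "True while budget remains".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchBudget:
    """Stand-in for the frozen Pydantic model; both bounds must lie in 1..50."""
    max_candidates: int
    max_attempts: int

    def __post_init__(self) -> None:
        for name in ("max_candidates", "max_attempts"):
            if not 1 <= getattr(self, name) <= 50:
                raise ValueError(f"{name} must be in [1, 50]")


class SearchProgress:
    """Counts search attempts against a budget."""

    def __init__(self, budget: SearchBudget) -> None:
        self._budget = budget
        self._attempts = 0

    @property
    def attempts(self) -> int:
        return self._attempts

    @property
    def exhausted(self) -> bool:
        return self._attempts >= self._budget.max_attempts

    def record_attempt(self) -> bool:
        # Assumed semantics: count this attempt, then say if budget remains.
        self._attempts += 1
        return not self.exhausted

    def reset(self) -> None:
        self._attempts = 0


progress = SearchProgress(SearchBudget(max_candidates=3, max_attempts=2))
while progress.record_attempt():
    print("budget remains after attempt", progress.attempts)
print("exhausted after attempt", progress.attempts)
```

Once `exhausted` is true the caller stops issuing queries and hands off to a human, and `reset()` starts a new search streak.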
python/reasoner/src/openral_reasoner/context.py
ADR-0018 F4 — ContextRenderer builds the structured text snapshot the LLM consumes per tick (no pixels in v1).
- module constant
DEFAULT_BUFFER_SIZE: int = 8— Rolling buffer capacity per category. (L44) - module constant
DEFAULT_PROMPT_PRIORITY: int = 10— Default operator-prompt priority; matchesopenral_prompt_router.DEFAULT_SOURCESauto-cascade priority. Human sources stamp 100 ontometadata_jsonso they drain first (ADR-0018 §3.F10). class FailureEventRecord(frozen dataclass, L48) — Failure-buffer entry; fieldssource, kind, severity, evidence_json, rskill_id, trace_id, stamp_ns.class PerceptionEventRecord(frozen dataclass, L62) — Perception-buffer entry; fieldskind, text, metadata_json, stamp_ns.class PromptRecord(frozen dataclass, L72) — Operator-prompt-buffer entry; fieldstext, metadata_json, stamp_ns, priority=DEFAULT_PROMPT_PRIORITY. Thepriorityfield is filled in byappend_promptfrommetadata_json["priority"]when the record was constructed with the default sentinel.class ContextRenderer(L110) — Stateful renderer. Methods:append_failure,append_perception,append_prompt(priority-ordered insert; buffer-evicts the lowest-priority oldest entry on overflow — every append also bumps the monotonicseqcounter),render(*, world_state) -> str,drain_prompts() -> tuple[PromptRecord, ...](pull-once, priority-desc + arrival-asc order; does NOT bumpseq); propertiesfailures,perception_events,prompts,seq(mutation counter consumed byReasonerCoreto short-circuit a heartbeat tick when no event has arrived since the last successful tick — ADR-0018 amendment 2026-05-25 §2). The## WORLD_STATEblock (_render_world_state) renders joint_state / ee_poses / battery / diagnostics and, since #14 (2026-06-12), ascene_objects[<frame>]: label@(x,y,z), …line fromWorldState.detected_objects(deduped by label, first-seen pose) — so the LLM sees the lifted object labels (e.g.bread) and can map a goal noun (baguette) onto them with its own semantics rather than only learning a name is "not in memory"._summarise_evidence_json(payload) -> str— Decode the FailureEvidence discriminated union and produce a one-line summary. 
(L373)_extract_priority(metadata_json) -> int— Parse a top-levelpriorityfield out of a PromptStamped's metadata; returnsDEFAULT_PROMPT_PRIORITYon missing / malformed / non-int payload.
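The priority handling described for the prompt buffer has two parts: a tolerant parse of the metadata payload and a priority-descending, arrival-ascending drain order. The sketch below illustrates both with plain tuples. Rejecting booleans as priorities and the `(text, priority)` tuple shape are assumptions made for the example.

```python
import json

DEFAULT_PROMPT_PRIORITY = 10


def extract_priority(metadata_json: str) -> int:
    """Read a top-level integer 'priority'; fall back to the default otherwise."""
    try:
        payload = json.loads(metadata_json)
    except (TypeError, ValueError):  # json.JSONDecodeError is a ValueError
        return DEFAULT_PROMPT_PRIORITY
    if not isinstance(payload, dict):
        return DEFAULT_PROMPT_PRIORITY
    priority = payload.get("priority")
    # bool is a subclass of int; treating it as malformed is an assumption.
    if isinstance(priority, bool) or not isinstance(priority, int):
        return DEFAULT_PROMPT_PRIORITY
    return priority


def drain_order(prompts: list[tuple[str, int]]) -> list[str]:
    """Order (text, priority) pairs by priority desc, then arrival asc."""
    # sorted() is stable, so equal priorities keep their arrival order.
    ordered = sorted(prompts, key=lambda item: -item[1])
    return [text for text, _ in ordered]


print(extract_priority('{"priority": 100}'))
print(extract_priority("not json"))
print(drain_order([("auto-1", 10), ("human", 100), ("auto-2", 10)]))
```

With human sources stamped at 100 and auto-cascade prompts at the default 10, a stable sort is enough to make operator input drain first without reordering the rest.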
python/reasoner/src/openral_reasoner/core.py
ADR-0018 F4 — ReasonerCore, the transport-agnostic orchestrator. The ROS-side reasoner_node wraps this with rclpy.
class ReasonerTickResult(frozen dataclass, L26) — Tick outcome; fieldstool_call: ReasonerToolCall | None,error: ROSPlanningError | None,elapsed_s: float,suppressed_reason: str(one of"","min_interval","heartbeat_idle","retry_cap","palette_empty"),traceparent: str | None(W3C traceparent captured inside the activereasoner.tickspan —Nonewhen no realTracerProvideris installed).class ReasonerCore(L72) — Orchestrator. Methods:tick(*, world_state, renderer, palette, force=False, tier="heartbeat") -> ReasonerTickResult. ADR-0018 §4 min-interval (100 ms) + per-kind retry cap (default 3) enforced here. Heartbeat-idle short-circuit (ADR-0018 amendment 2026-05-25 §2): whenforce=Falseandrenderer.seqmatches the seq at the last successful tick, the LLM call is suppressed withsuppressed_reason="heartbeat_idle". Palette-empty short-circuit prevents wasted LLM calls whenforce=False— aforce=Truetick (event preemption fromSEVERITY_FAILFailureTrigger,SEVERITY_WARNon/openral/failure/safety, or new operator prompt) bypasses the min-interval gate, the heartbeat-idle gate, AND the palette-empty gate so the LLM can pickEmitPromptToolto escalate even on a bare reasoner. The retry-cap gate still applies underforce=True. Thetierkwarg ("A"/"B"/"C"/"D"/"heartbeat") is recorded verbatim on the span asreasoner.tierfor dashboard filtering — observability only; per-tier preemption thresholds live inReasonerNode._FAILURE_TIER_FOR_SOURCE. Wraps the per-tick work inreasoner_span(openral_observability) so the LLM call lives under areasoner.tickOTel span withreasoner.{model, tick.idx, tool, rskill_id, suppressed_reason, error_kind, force, tier}attributes (ADR-0018 §6).
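The interaction of the four suppression gates with force=True is easier to see as a single decision function. The sketch below is not the ReasonerCore implementation: the order in which the gates are evaluated and the flattening of the per-kind retry cap into one counter are assumptions. Only the gate names, the 100 ms interval, the default cap of 3, and the rule about which gates force bypasses come from the text above.

```python
from __future__ import annotations

MIN_INTERVAL_S = 0.1  # ADR-0018 min-interval (100 ms)
RETRY_CAP = 3         # default retry cap (per kind in the real core)


def suppressed_reason(
    *,
    force: bool,
    since_last_tick_s: float,
    seq: int,
    last_tick_seq: int | None,
    palette_empty: bool,
    retries: int,
) -> str:
    """Return "" when the LLM call should proceed, else the gate that blocked it."""
    # The retry cap is the one gate a forced tick cannot bypass.
    if retries >= RETRY_CAP:
        return "retry_cap"
    # A forced tick (event preemption) skips the remaining three gates.
    if force:
        return ""
    if since_last_tick_s < MIN_INTERVAL_S:
        return "min_interval"
    # Nothing was appended to the renderer since the last successful tick.
    if last_tick_seq is not None and seq == last_tick_seq:
        return "heartbeat_idle"
    if palette_empty:
        return "palette_empty"
    return ""


print(suppressed_reason(force=False, since_last_tick_s=5.0, seq=4,
                        last_tick_seq=4, palette_empty=False, retries=0))
print(repr(suppressed_reason(force=True, since_last_tick_s=0.01, seq=4,
                             last_tick_seq=4, palette_empty=True, retries=0)))
```

The forced path still returning "retry_cap" reflects the documented rule that preemption can escalate on a bare reasoner but cannot loop indefinitely on the same failure.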
python/wam/src/openral_wam/protocol.py
World Action Model Protocol (CLAUDE.md §6.3).
- class WorldModel(Protocol) — Generative simulator used by the planning layer for the three integration patterns (gating / failure anticipation / replanning). Attribute: max_horizon. Method: rollout(world_state, action_chunk, horizon) -> Rollout — predict horizon steps of future state; raises ROSConfigError (horizon exceeds max) / ROSInferenceTimeout (budget exceeded). (L31)
python/wam/src/openral_wam/rollout.py
Pydantic v2 schema for a WAM's predicted trajectory.
- class Rollout(BaseModel) — Predicted trajectory from one WorldModel.rollout call. Fields: predicted_states: list[WorldState] (min_length=1), predicted_rewards: list[float] | None, horizon: int (>0), latency_ms: float (≥0.0), confidence: float ∈ [0.0, 1.0]. extra="forbid". (L24)
python/wam/src/openral_wam/null_wam.py
Identity stub satisfying the WorldModel Protocol (for plumbing tests; not a production fallback).
- class NullWorldModel — Returns horizon copies of the input WorldState, no rewards, 0.0 ms latency, confidence 1.0. Attribute: max_horizon. (L27)
- __init__(max_horizon=16) -> None — Raises ValueError if max_horizon <= 0. (L54)
- rollout(world_state, action_chunk, horizon) -> Rollout — Replays the input state. Raises ValueError for horizon ∉ (0, max_horizon]. (L60)
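The identity stub's contract (replay the input state, validate the horizon) can be sketched as follows. A dataclass stands in for the Pydantic Rollout, and using `copy.deepcopy` to produce the copies is an assumption; the inventory only says the stub returns horizon copies of the input.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Rollout:
    """Stand-in for the Pydantic Rollout schema."""
    predicted_states: list[Any]
    horizon: int
    latency_ms: float
    confidence: float
    predicted_rewards: list[float] | None = None


class NullWorldModel:
    """Identity world model: predicts that nothing changes."""

    def __init__(self, max_horizon: int = 16) -> None:
        if max_horizon <= 0:
            raise ValueError("max_horizon must be positive")
        self.max_horizon = max_horizon

    def rollout(self, world_state: Any, action_chunk: Any, horizon: int) -> Rollout:
        # Valid horizons are 1..max_horizon inclusive.
        if not 0 < horizon <= self.max_horizon:
            raise ValueError(f"horizon must be in (0, {self.max_horizon}]")
        # The action chunk is ignored: every predicted step equals the input.
        states = [copy.deepcopy(world_state) for _ in range(horizon)]
        return Rollout(predicted_states=states, horizon=horizon,
                       latency_ms=0.0, confidence=1.0)


result = NullWorldModel().rollout({"t": 0}, action_chunk=None, horizon=3)
print(len(result.predicted_states), result.predicted_rewards)
```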
packages/openral_safety/openral_safety/supervisor_node.py
Lifecycle node skeleton; reserves the supervisor node name and topic surface for the future C++ kernel (CLAUDE.md §6.1 Layer 6, §7.7). No enforcement logic.
- class SafetySupervisorNode(LifecycleNode) — Skeleton lifecycle node. Every transition callback returns SUCCESS. (L611)
- __init__(node_name="openral_safety_supervisor") -> None — Initialise; logs a "skeleton no-op" line so the supervisor's presence in the graph is visible. (L99)
- on_configure(state) -> TransitionCallbackReturn.SUCCESS (L149)
- on_activate(state) -> TransitionCallbackReturn.SUCCESS (L224)
- on_deactivate(state) -> TransitionCallbackReturn.SUCCESS (L232)
- on_cleanup(state) -> TransitionCallbackReturn.SUCCESS (L239)
- on_shutdown(state) -> TransitionCallbackReturn.SUCCESS (L262)
- main(args=None) -> int — Entry point for ros2 run openral_safety supervisor_node. (L614)
packages/openral_safety/openral_safety/envelope_loader.py
Pydantic → C++ kernel ROS-param bridge (ADR-0020 PR-K; ADR-0030 collision).
- compute_intersection(robot, skill=None) -> EnvelopeIntersection — Robot ceiling ∩ optional skill envelope; rejects (never clamps) a skill that loosens the ceiling.
- kernel_params_from_envelope(envelope) -> dict[str, object] — Canonical scalar/AABB envelope → kernel ROS-param dict.
- collision_params_from_description(robot, *, margin_m=0.0) -> dict[str, object] — ADR-0030. Flatten collision_geometry + allowed_collision_pairs + the kinematic chain (joint origin_xyz/rpy/axis) into the kernel's per-capsule collision params (collision_capsule_link + parallel radius/half-length/origin arrays, link-level ACM), topologically ordered. {"self_collision_enabled": False} when no geometry. Manifest-source adapter.
- ee_link_index_from_collision_params(params) -> int — ADR-0040 Phase 3. Pick the predictive-Cartesian EE control link (the kinematically deepest collision link) for the kernel's Jacobian look-ahead; -1 when no collision model (predictive disabled, reactive floor only). Mis-identification is bounded by the reactive check.
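The "reject, never clamp" rule for envelope intersection is worth a small illustration, since clamping would silently change what a skill was validated against. The sketch below is purely hypothetical in its data model: it treats an envelope as a dict of named upper bounds where smaller means stricter, and raises `ValueError` on a violation. The real EnvelopeIntersection type, its fields, and its error class are not shown in this inventory.

```python
from __future__ import annotations


def intersect_limits(
    robot: dict[str, float],
    skill: dict[str, float] | None = None,
) -> dict[str, float]:
    """Intersect a robot ceiling with an optional, stricter skill envelope."""
    result = dict(robot)
    if skill is None:
        return result  # no skill envelope: the robot ceiling applies as-is
    for name, value in skill.items():
        if name not in robot:
            # Assumption: a limit the robot does not declare is an error.
            raise ValueError(f"skill limit {name!r} has no robot ceiling")
        if value > robot[name]:
            # Reject instead of clamping to the ceiling.
            raise ValueError(
                f"skill limit {name!r}={value} loosens robot ceiling {robot[name]}"
            )
        result[name] = value
    return result


ceiling = {"max_joint_speed": 1.0, "max_ee_speed": 0.5}  # hypothetical limit names
print(intersect_limits(ceiling, {"max_ee_speed": 0.25}))
```

Rejecting makes a misconfigured skill fail at load time, where it is visible, rather than running with limits nobody reviewed.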
packages/openral_safety/openral_safety/mjcf_lowering.py
Offline MJCF → kernel collision-params lowering (ADR-0030); imports mujoco lazily.
lower_collision_params(model, joint_names, *, margin_m=0.0) -> dict[str, object]— Lower a compiledmujoco.MjModelto the kernel's collision params from the full kinematic tree (fixed mounts + floating base): per-link origins from the body tree, every collidable primitive per body as a capsule (cylinder→capsule, box→bounding-sphere; mesh/plane skipped),dof_indexmatched by joint name, ACM = parent↔child + MJCF excludes + a neutral-pose overlap sweep (the MoveIt "disable always-in-collision pairs" rule under the kernel's own capsule approximation).
packages/openral_safety/openral_safety/urdf_lowering.py
Offline URDF(+SRDF) → manifest collision-model lowering tool (ADR-0030); lazy-imports yourdfpy / trimesh (the [lowering] group). Populates robot.yaml's collision_geometry + allowed_collision_pairs (the hand-reviewable manifest path), distinct from mjcf_lowering (the runtime MJCF path).
parse_srdf_disabled_pairs(srdf_path) -> set[frozenset[str]]— Parse a MoveIt SRDF's<disable_collisions>rows into unordered link pairs (the ACM).fit_capsule_to_vertices(vertices) -> tuple[CapsuleShape, tuple[float×6]]— PCA bounding capsule (segment along +Z) containing every vertex — a conservative over-approximation so the safety check never under-covers; returns the shape + link-frameorigin_xyz_rpy(kernel's rpy convention, inverse ofmjcf_lowering._rpy_to_mat).lower_link_geometry(urdf_path) -> list[LinkCollisionGeometry]— One conservative capsule/sphere per URDF link with a<collision>(box→8 corners, cylinder→cap rims, sphere→exactSphereShape, mesh→trimeshvertices PCA-fit), vertices first transformed into the link frame by the collision<origin>.acm_for_geometry(urdf_path, geoms, *, srdf_path=None, n_samples=2000, seed=20260610, margin_m=0.0) -> set[frozenset[str]]— The ACM for a specific per-link capsule geometry (the geometry the kernel will actually load).ACM = adjacent ∪ always-colliding(capsule) ∪ [SRDF-disabled if srdf_path else never-colliding(capsule)]. The always-colliding term adds the capsule-junction pairs a mesh-based SRDF omits (e.g. a short link making skip-one neighbours' capsules overlap) — without them the capsule kernel false-E-stops every step. Deterministic under the pinned seed.sample_acm_from_urdf(urdf_path, *, n_samples=2000, seed=20260610, margin_m=0.0) -> set[frozenset[str]]— No-SRDF fallback: lowers the URDF's own collision geometry and runsacm_for_geometrywithout an SRDF. Verified conservative against URDF-lowered (mesh-bounding) capsules — its disabled set is a subset of the precise-mesh SRDF's, never false-permissive.lower_robot(robot, *, srdf_path=None, acm_only=False, geometry_only=False) -> LoweredCollisionModel— Top-level entry. ACM source precedence: explicitsrdf_path→robot.srdf_path→ URDF sampling fallback; ACM scoped to links carrying geometry. 
Generated geometry is scoped to the manifest's kinematic chain (no orphan URDF links);joint_fkis lowered too (unlessacm_only).acm_only/geometry_onlyrestrict output so hand-tuned safety geometry isn't churned. RaisesValueErrorifrobot.urdf_pathis unset/unresolvable (arobot_descriptions:<module>xacro form is accepted).lower_joint_fk(robot, urdf_ref) -> dict[str, tuple[xyz, rpy, axis]]— Per-manifest-joint forward kinematics (origin + axis) read from the URDF, matched to manifest joints bychild_link. The kernel needs these to place the link capsules (ADR-0030). Unmatched joints (synthetic gripper / base DoF) are omitted.lower_robot_from_mjcf(robot, *, n_samples=2000, seed=20260610, margin_m=0.0, manifest_dir=None) -> LoweredCollisionModel— MJCF backend for robots with no URDF whose collision is meshes (mjcf_lowering's primitive path skips them), e.g. bimanualopenarm. Keeps the manifest's hand-authored capsules; lowers joint FK (the MJCF parent→child transform at rest) + the conservative ACM (mujoco-FK sweep). Manifest↔MJCF link-name divergence is reconciled viasim_joint_name. Lazy-importsmujoco+openral_core.assets.resolve_asset(resolvesrobot.assets.mjcf, honouringmanifest_dirforfile:refs; ADR-0058).acm_source="mjcf".select_lowering(robot, *, manifest_dir=None) -> LoweringSource— Provenance-correct routing (ADR-0058 §5):"srdf"when SRDF+URDF both present (mesh-proven ACM),"sampling"when a URDF with usable collision meshes but no SRDF,"mjcf"when no usable URDF geometry but an MJCF exists (e.g.openarm, whose vendored URDF'spackage://collision meshes don't resolve → 0 geometry). Replaces the naiveurdf if assets.urdf else mjcfguess that wrongly sentopenarmto the empty URDF path. RaisesROSConfigErrorwhen no lowerable asset.lower_robot_auto(robot, *, acm_only=False, geometry_only=False, manifest_dir=None) -> LoweredCollisionModel— Single dispatch overselect_lowering→lower_robot(srdf/sampling) orlower_robot_from_mjcf(mjcf). 
The one entry the CLI (openral collision lower|check) and the byte-identical regression both call, so routing can never diverge between what's committed and what's verified.LoweringSource—Literal["srdf", "sampling", "mjcf"]; the sourceselect_loweringresolves to (matchesLoweredCollisionModel.acm_source).class LoweredCollisionModel— Frozen dataclass result:collision_geometry,allowed_collision_pairs(sorted tuples),acm_source("srdf"|"sampling"|"mjcf"),srdf_path,joint_fk(per-joint FK for onboarding).
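Two rules in urdf_lowering.py are compact enough to restate as code: the select_lowering routing and the ACM set composition used by acm_for_geometry. The sketch below reduces the robot manifest to four hypothetical boolean flags and uses `ValueError` in place of ROSConfigError; both simplifications are assumptions made so the example stays self-contained.

```python
from __future__ import annotations

from typing import Literal

LoweringSource = Literal["srdf", "sampling", "mjcf"]
LinkPair = frozenset  # an unordered pair of link names


def select_lowering(*, has_urdf: bool, has_srdf: bool,
                    urdf_geometry_usable: bool, has_mjcf: bool) -> LoweringSource:
    """Pick the lowering backend from which assets are actually usable."""
    if has_srdf and has_urdf:
        return "srdf"      # mesh-proven ACM from the SRDF
    if has_urdf and urdf_geometry_usable:
        return "sampling"  # URDF collision meshes, no SRDF
    if has_mjcf:
        return "mjcf"      # URDF missing or lowering to zero geometry
    raise ValueError("no lowerable asset")  # ROSConfigError in the real module


def compose_acm(adjacent: set, always_colliding: set, never_colliding: set,
                srdf_disabled: set | None = None) -> set:
    """ACM = adjacent | always-colliding | (SRDF-disabled if given else never-colliding)."""
    tail = srdf_disabled if srdf_disabled is not None else never_colliding
    return adjacent | always_colliding | tail


# A robot whose URDF exists but yields no usable collision geometry.
print(select_lowering(has_urdf=True, has_srdf=False,
                      urdf_geometry_usable=False, has_mjcf=True))
print(compose_acm({LinkPair({"base", "link1"})},
                  {LinkPair({"link1", "link3"})},
                  {LinkPair({"base", "link5"})}))
```

Checking geometry usability rather than mere URDF presence is the point of the routing: a URDF that lowers to zero capsules must not win over a working MJCF.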
packages/openral_reasoner_ros/openral_reasoner_ros/reasoner_node.py
ADR-0018 F4 — reasoner_node lifecycle wrapper. Thin rclpy shell around openral_reasoner.ReasonerCore.
- module constants
_FAILURE_SOURCES,_PERCEPTION_KINDS— closed sets from ADR-0018 §3 (hal/sensor/rskill/safety/wam/critic(therskillsuffix replacedskillon 2026-05-25 — ADR-0018 amendment §5) andmotion/objects/ocr/scene_change). (L193) - module constants
_KIND_TIMEOUT,_KIND_CONTROLLER,_SEVERITY_WARN,_SEVERITY_FAIL— IDL-mirror constants foropenral_msgs/FailureTrigger. Kept inline rather than importing theopenral_observability.failure_bushelper so the reasoner emits aFailureTriggerwithout dragging the rate-limiter into the dispatch path (the reasoner publishes O(1) events per skill goal, not a stream). (L201) - module constants
_EXECUTE_SKILL_SERVER_PROBE_S,_LIFECYCLE_SERVER_PROBE_S— 100 mswait_for_server/wait_for_serviceprobes so an absent F1 server / lifecycle peer can't block the executor thread. (L209) - module constant
_FAILURE_TIER_FOR_SOURCE: dict[str, str]— ADR-0018 2026-05-25 amendment trigger taxonomy. Greppable map of each/openral/failure/<source>to its tier:safety → "A",hal/sensor/rskill/wam → "B",critic → "C". Used by_on_failureto stampreasoner.tieron the OTel span — observability only; the per-source preemption threshold (SEVERITY_WARNfor safety,SEVERITY_FAILfor everything else) is decided inline in the same callback. - (the former module-local
_SIM_EXECUTABLE_CONTROL_MODESfrozenset was removed 2026-06-04; thehal_mode == "sim"gate now imports the canonicalopenral_core.SIM_EXECUTABLE_CONTROL_MODES, trimmed to the six packer-implemented modes — see the Layer-0 core entry. ADR-0036 amendment 2026-06-04.) def _required_control_modes(manifest: RSkillManifest) -> set[ControlMode](L262) — ADR-0036 pure helper for the deploy-path palette gate. Readsaction_contractby specificity:None → set()(no action constraint);representationset →control_modes_for_representation(...);slotsset → each non-Noneslot'scontrol_mode; baredim(legacy) →{JOINT_POSITION}.def _action_executable(manifest: RSkillManifest, description: RobotDescription, hal_mode: str) -> bool(L293) — ADR-0036 pure helper.Truewhen every_required_control_modes(manifest)is in the executable set:openral_core.SIM_EXECUTABLE_CONTROL_MODESforhal_mode == "sim", elsedescription.capabilities.supported_control_modes(coerced toControlModeboth sides so an enum-member or raw-"joint_position"-string deserialisation compares equal). Empty required set →True.class ReasonerNode(LifecycleNode)(L333) — Lifecycle node.__init__(*, node_name="openral_reasoner", tick_hz=0.2, client=None, palette=None, robot_capabilities=None, commercial_deployment=False, spatial_memory=None). ADR-0039 Phase 2b:spatial_memoryis an optional read-onlySpatialMemoryQuerierbackend (an ADR-0038SpatialMemory); when supplied the palette'sspatial_memory_availableis set (therecall_object/resolve_placetools are offered) and the rebuild path threads it through. Deployment wiring: thespatial_memory_pathROS parameter (default"") loads a persisted scene graph as that backend aton_configurewhen no backend was injected (see_maybe_load_spatial_memory); thespatial_memory_ingestROS parameter (defaultfalse) auto-creates an empty backend and folds eachWorldState.detected_objectssnapshot into it on tick (ADR-0038 live dynamic memory from the ADR-0035 producer). 
ADR-0036: thehal_modeROS parameter (default"sim") selects the action-mode palette gate (_action_executable) the skill-registry refresh applies.tick_hzis the heartbeat rate (default 0.2 Hz = one tick every 5 s; was 5.0 pre-2026-05-25 amendment to ADR-0018 — the reasoner is now event-driven with a slow heartbeat). The two refresh-kwargs (added in the F4 contract-closure follow-up) drive the/openral/skill_registry_changedrefresh path: withoutrobot_capabilitiesthe callback logs a warning and leaves the palette alone (an empty-capabilities refresh would risk dispatching incompatible skills).on_configure— BuildToolUseClientfrom env if not injected, attach subscribers to/openral/world_state_slow+ 6 failure topics + 4 perception topics +/openral/prompt+/openral/skill_registry_changed, create the/openral/promptpublisher +/openral/failure/rskillpublisher +/openral/execute_rskillaction client. Reads thevram_lifecycle_peersROS parameter (default[]) into_vram_lifecycle_peers— ADR-0050 GPU peers auto-deactivated before eachexecute_rskilland reactivated after (the deploy launch sets it toopenral_ros_image_detectorwhen--enable-object-detector).on_activate— Arm the periodic tick timer attick_hz.on_deactivate— Cancel the tick timer (subscriptions remain attached).on_cleanup— Tear down pending skill-goal deadline timers, destroy the action client, drop cached lifecycle clients._on_failure(source, msg)— Append aFailureEventRecordto the renderer; preempt the next tick per the ADR-0018 amendment 2026-05-25 trigger taxonomy — Tier A (source == "safety") preempts onseverity ≥ SEVERITY_WARN, Tier B/C (hal,sensor,rskill,wam,critic) preempts onseverity ≥ SEVERITY_FAIL._on_tick(*, force=False, tier="heartbeat")— InvokeReasonerCore.tick(..., tier=tier); route the resultingReasonerToolCallvia_dispatch(call, traceparent=result.traceparent). Suppressed ticks log at DEBUG (min_interval,heartbeat_idle) or WARN (retry_cap) per their operational signal-to-noise. 
The tier arg is passed through from the preempting callback (A from _on_failure(source="safety"), B/C from other failure sources, D from _on_prompt) and lands on the reasoner.tick OTel span as reasoner.tier.
- _on_skill_registry_changed(msg) — ADR-0018 §4 palette refresh. Walks rSkill.list_installed(), loads each entry's manifest_path into a real RSkillManifest, runs build_tool_palette(...) against the active robot_capabilities + commercial_deployment flag, installs the result via set_palette. openral_rskill is lazy-imported to keep the node cheap to import.
- _dispatch(call, *, traceparent=None) — Routing-only switch over the ReasonerToolCall variants; delegates to _dispatch_emit_prompt / _dispatch_execute_skill / _dispatch_lifecycle_transition / _dispatch_spatial_query. ReloadGstPipelineTool is the sole log-and-acknowledge stub (F6 sensor-package service IDL not yet on disk — GH-126).
- _dispatch_emit_prompt(call, *, traceparent) — Publish a PromptStamped on /openral/prompt; stamps the threaded-through traceparent into metadata_json per ADR-0018 §6.
- _dispatch_spatial_query(call, *, traceparent) — ADR-0039 Phase 2b/§3. Read-only: runs a RecallObjectTool / ResolvePlaceTool against the injected SpatialMemory via run_spatial_query_detailed and republishes the rendered result as a PromptStamped with frame_id "spatial_memory" (so _on_prompt consumes it, not filtered as a self-emit) — the prompt cascade feeds the answer into the next tick. Bounded by a SearchProgress / SearchBudget: consecutive queries are counted, and once max_attempts is hit the result is published with the reasoner's own frame_id (filtered by _on_prompt → no further tick), terminating the cascade in human-handoff. Reset on any non-query dispatch and on a non-cascade operator prompt. No actuation, no FailureTrigger. Warns + no-ops if no backend is wired.
ADR-0043/0056 (#10): a recall_object miss (SpatialQueryOutcome.found == False) escalates to a live locate_in_view for the same query term — policy-driven (not LLM-chosen) — when detector_available and the term hasn't already been escalated this search streak (tracked in _locate_escalated, reset with the search bound); the open-vocab detector grounds objects the map never ingested / labelled differently before the budget reaches human-handoff. ADR-0044 Phase 4: when a latched /map has been received (params occupancy_map_topic default /map — empty disables; approach_inflation_m default 0.25), every recall_object approach viewpoint is refined through refine_approach_pose before rendering, so the LLM only sees grid-valid approach poses (BLOCKED note when none exists; grid absent → geometric pass-through).
- _maybe_load_spatial_memory() — ADR-0039 deployment wiring. On on_configure, when no backend was injected and spatial_memory_path is set, lazy-imports openral_world_state.SpatialMemory, SpatialMemory.load(path), sets it as the query backend, and flips spatial_memory_available. Load failure (OSError / ValueError) degrades to WARNING + no backend (tools simply not offered) — never a fabricated map. Wired in sim_e2e.launch.py via the spatial_memory_path:=<path> launch arg. Emits the loaded map once via _emit_scene_objects_span.
- _emit_scene_objects_span() — ADR-0038 dashboard telemetry. When a spatial-memory backend is wired, calls openral_world_state.emit_scene_objects_span(self._spatial_memory.to_scene_graph(), source_node=…) to publish the world.scene_objects span (scene-objects card + SLAM-map overlay). Called once on load and on every heartbeat _on_tick (above the _core is None guard, so a preloaded map shows even before the tool-use client builds). Advisory only; all failures swallowed at DEBUG so telemetry never disturbs the tick.
- _dispatch_execute_rskill(call, *, traceparent) — Probe the /openral/execute_rskill action server (100 ms wait_for_server); on absence emit a KIND_CONTROLLER FailureTrigger and bail.
ADR-0050 amendment 2026-06-12: when vram_lifecycle_peers is non-empty it routes through _free_vram_peers_then_send (deactivate the GPU peers first, then send); otherwise calls _send_execute_rskill_goal directly.
- _send_execute_rskill_goal(call, traceparent) — Build ExecuteRskill.Goal, send asynchronously with feedback_callback=_on_execute_rskill_feedback, attach _on_execute_rskill_goal_response to the send future. (Extracted from _dispatch_execute_rskill for the VRAM-eviction sequencing.)
- _free_vram_peers_then_send(call, peers, traceparent) — ADR-0050. Deactivate each GPU lifecycle peer via _change_state_async, and send the goal only once all in-flight change_state responses return — so the peer's VRAM (e.g. the ~1.3 GB object detector) is released before the runner loads the policy on an 8 GB card. Peers whose service is absent are skipped (dispatch still proceeds); the deactivated subset is recorded in _deactivated_vram_peers for reactivation.
- _reactivate_vram_peers() — Reactivate the peers in _deactivated_vram_peers (clears the set first → idempotent). Called from _on_execute_rskill_result (terminal) and the goal-reject/error branches of _on_execute_rskill_goal_response; not on deadline (the policy may still be resident).
- _on_reactivate_result(peer, future) — Best-effort log of a reactivation change_state outcome.
- _change_state_async(node, transition) -> future | None — Shared helper: lazily create + cache a lifecycle_msgs/srv/ChangeState client per peer node, map "configure" / "activate" / "deactivate" / "cleanup" to Transition.TRANSITION_*, and call asynchronously. Returns None if the service isn't on the graph.
Used by both _dispatch_lifecycle_transition and the VRAM-eviction path.
- _dispatch_lifecycle_transition(call) — Drive <call.node>/change_state via _change_state_async; on success attach _on_lifecycle_response, on an absent service log + skip.
- _on_execute_skill_feedback(rskill_id, feedback_msg) — Forward action feedback to the operator log at warning level (rare event; OTel/structlog routes this to the dashboard).
- _on_execute_skill_goal_response(call, sent_at, future, traceparent) — On rejection emit a KIND_CONTROLLER FailureTrigger; on acceptance arm a one-shot deadline timer (_on_execute_skill_deadline, only when call.deadline_s > 0) and attach _on_execute_skill_result to get_result_async().
- _on_execute_skill_result(call, goal_id, future, traceparent) — Cancel the deadline timer; on STATUS_SUCCEEDED + result.success log success; on abort/cancel/non-success emit a KIND_CONTROLLER FailureTrigger with a ControllerEvidence payload (state ∈ {aborted, canceled, failed}, detail=result.failure_reason).
- _on_execute_skill_deadline(*, call, sent_at, goal_handle, traceparent) — Cancel the goal via cancel_goal_async(); emit a KIND_TIMEOUT FailureTrigger with TimeoutEvidence(operation="skill.<rskill_id>", deadline_s, elapsed_s).
- _on_lifecycle_response(call, future) — Log the ChangeState result; lifecycle failures are operator-driven and surface in the target node's own logs (no FailureTrigger re-emission).
- _publish_skill_failure(*, kind, rskill_id, evidence, traceparent, trace_id=None) — Build + publish a FailureTrigger on /openral/failure/rskill with severity=SEVERITY_FAIL; trace_id (when propagated by the action result) takes precedence over the reasoner's active traceparent.
- Properties renderer, dispatched_calls; method set_palette(palette) (imperative seam called from the /openral/skill_registry_changed refresh callback).
- _QOS_REGISTRY_CHANGED — RELIABLE + TRANSIENT_LOCAL + KEEP_LAST=1 so a late-subscribing reasoner sees the most recent invalidation.
- main(args=None) -> int — Entry point for ros2 run openral_reasoner_ros reasoner_node.
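The trigger taxonomy described for _on_failure can be sketched as a pure predicate. This is only an illustration under stated assumptions: the numeric SEVERITY_* values and the should_preempt name are hypothetical (the real constants mirror openral_msgs/msg/FailureTrigger), and the treatment of unknown sources is a guess.

```python
# Hypothetical severity levels; the real values come from the FailureTrigger IDL.
SEVERITY_INFO, SEVERITY_WARN, SEVERITY_FAIL, SEVERITY_ABORT = 0, 1, 2, 3

# Tier B/C sources named in the _on_failure entry.
TIER_BC_SOURCES = frozenset({"hal", "sensor", "rskill", "wam", "critic"})


def should_preempt(source: str, severity: int) -> bool:
    """Return True when a failure should preempt the next reasoner tick."""
    if source == "safety":  # Tier A: preempt from WARN upward
        return severity >= SEVERITY_WARN
    if source in TIER_BC_SOURCES:  # Tier B/C: preempt from FAIL upward
        return severity >= SEVERITY_FAIL
    return False  # assumption: unknown sources never preempt
```

A safety warning therefore wakes the reasoner immediately, while a HAL warning waits for the next heartbeat.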
packages/openral_prompt_router/openral_prompt_router/prompt_router_node.py
ADR-0018 F10 — single lifecycle node that fans in operator prompts from any external source into /openral/prompt. CLI is the only v1 adapter; WebSocket / voice / Slack out-of-scope per ADR §"out-of-scope".
- module constant DEFAULT_SOURCES: dict[str, int] = {"cli": 100, "dashboard": 100, "auto": 10} — Default source → priority registry; human sources get 100, machine cascades get 10. (L59)
- class PromptRouterNode(LifecycleNode) (L66) — Lifecycle node.
- __init__(*, node_name="openral_prompt_router", sources=None) — Initialise with a source → priority registry. Defaults to DEFAULT_SOURCES.
- on_configure — Build the /openral/prompt fan-out publisher and one /openral/prompt_in/<source> subscriber per allowed source.
- _on_inbound(source, priority, msg) — Forward the inbound PromptStamped onto /openral/prompt after merging {"source": ..., "priority": ...} into metadata_json (preserving any per-source fields).
- Property forwarded_count — Number of prompts forwarded since on_configure (for tests).
- main(args=None) -> int — Entry point for ros2 run openral_prompt_router prompt_router_node.
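The metadata merge performed by _on_inbound might look like the sketch below. The helper name merge_prompt_metadata is hypothetical, and treating an empty metadata_json as "no metadata" is an assumption; only the merged keys and the preservation of per-source fields come from the entry above.

```python
import json


def merge_prompt_metadata(metadata_json: str, source: str, priority: int) -> str:
    """Merge router-owned fields into an inbound prompt's metadata JSON string."""
    # Assumption: an empty string means the publisher sent no metadata.
    fields = json.loads(metadata_json) if metadata_json else {}
    if not isinstance(fields, dict):
        raise ValueError("metadata_json must decode to a JSON object")
    # Router-owned keys are written last; other per-source fields are preserved.
    fields.update({"source": source, "priority": priority})
    return json.dumps(fields, sort_keys=True)
```

For example, a CLI prompt carrying {"source_cli": true} keeps that field and gains "source" and "priority".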
python/cli/src/openral_cli/prompt.py
ADR-0018 F10 — openral prompt "do X" CLI adapter. Publishes a one-shot PromptStamped onto /openral/prompt_in/cli for the prompt-router to fan out. rclpy lazy-imported so openral --help stays sub-second.
- prompt_command(text, topic="/openral/prompt_in/cli", wait_s=1.0) — Initialise rclpy, publish one PromptStamped with metadata_json={"source_cli": true}, wait briefly for the subscriber to be discovered, then shut down. Exits 2 if rclpy / openral_msgs are not importable (with a hint at just ros2-build).
Observability (Layer 8 — fully shipped, ADR-0017)
python/observability/src/openral_observability/_sdk.py
Idempotent OTel SDK setup + flush helper.
- configure_observability(*, service_name="openral", endpoint=None, sample_ratio=None) -> bool — Install OTLP/gRPC tracer + meter + logger providers; reads OTEL_EXPORTER_OTLP_ENDPOINT when endpoint is None; returns True if exporters were installed, False for the no-op path. On a successful install also kicks off start_system_metrics_collector so the dashboard's System health card receives CPU / RAM / GPU gauges. Registers shutdown_observability via atexit on first install. Metric reader interval is configurable via OPENRAL_OTEL_METRIC_INTERVAL_MS (default 5 s); the BatchSpanProcessor flush interval via OPENRAL_OTEL_SPAN_SCHEDULE_DELAY_MS (default 30 ms ≈ 33 Hz — set ~1.3× the 25 Hz thumbnail rate so the dashboard captures every frame without flush-aliasing; raise it for coarser production batching). sample_ratio selects the trace sampler — None / 1.0 → ALWAYS_ON, values in (0, 1) → ParentBased(TraceIdRatioBased(ratio)); honors OPENRAL_OTEL_SAMPLE_RATIO env var when arg is None. (L107)
- _resolve_sampler(sample_ratio) -> Sampler — Resolve the trace sampler from arg + env, defaulting to ALWAYS_ON. Garbage env values fall back to always-on so a typo never drops every span. (L223)
- shutdown_observability() -> None — Flush + shut down all three providers; idempotent and safe to call when no exporter was installed. Stops the system-metrics collector before draining the meter so the final sample lands in the export batch. (L280)
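The arg-then-env resolution described for _resolve_sampler can be illustrated without the OTel SDK. In this sketch the function name resolve_sample_ratio is hypothetical and the return value is a plain ratio, with 1.0 standing in for ALWAYS_ON. Treating out-of-range and NaN values as always-on is an assumption; the entry above only specifies that garbage env values do so.

```python
import os
from typing import Optional


def resolve_sample_ratio(sample_ratio: Optional[float] = None) -> float:
    """Return the effective ratio; 1.0 stands for always-on sampling."""
    if sample_ratio is None:
        raw = os.environ.get("OPENRAL_OTEL_SAMPLE_RATIO")
        if raw is None:
            return 1.0
        try:
            sample_ratio = float(raw)
        except ValueError:
            return 1.0  # garbage env value: fall back to always-on
    # Assumption: NaN and values outside (0, 1) are also treated as always-on.
    if not (0.0 < sample_ratio < 1.0):
        return 1.0
    return sample_ratio
```

A caller would map 1.0 to ALWAYS_ON and any other value to a parent-based ratio sampler.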
python/observability/src/openral_observability/tracing.py
Span-context-manager helpers; safe to call before configure_observability.
- rskill_span(name, *, rskill_id=None, role=None, **attrs) — Span for a Skill lifecycle phase; emits rskill.id / rskill.role from semconv. (L40)
- inference_span(name="skill.chunk_inference", *, chunk_index=None, kind="foreground", **attrs) — Span for one VLA chunk inference; emits inference.kind / inference.chunk_index. (L70)
- safety_span(name="safety.check", *, check_name=None, severity="info", **attrs) — Span for a safety check; the C++ kernel parents its own safety.check to the Python tick via the propagator. (L98)
- reasoner_span(name="reasoner.tick", *, tick_idx=None, model=None, force=None, **attrs) — Span for one ReasonerCore.tick (ADR-0018 F4). Sets reasoner.{tick.idx, model, force} and accepts any extra reasoner.* attribute via **attrs. Used by openral_reasoner.core to record reasoner.{tool, rskill_id, suppressed_reason, error_kind} over the LLM call. (L132)
- traced(name=None) — Decorator that wraps a sync function in a span named after it. (L190)
python/observability/src/openral_observability/cli.py
Root-span helper for the openral CLI.
- cli_command_span(subcommand, *, mode=None, run_id=None, **attrs) — Open the cli.command root span for one CLI invocation; records cli.subcommand, openral.run.id, optional openral.run.mode / openral.run.git_sha. (L52)
python/observability/src/openral_observability/diagnostics.py
ADR-0018 F8 — diagnostic_msgs/DiagnosticArray heartbeat helper, shared by every OpenRAL lifecycle node.
- Level — Mirror of diagnostic_msgs/DiagnosticStatus level constants (OK=0, WARN=1, ERROR=2, STALE=3); re-exported so status_fn callbacks can avoid importing diagnostic_msgs on pure-Python hosts. (L32)
- DiagnosticsHeartbeat(node, *, hardware_id, component_name, status_fn, rate_hz=1.0) — 1 Hz /diagnostics publisher attached to a rclpy.lifecycle.LifecycleNode. Drives the standard create_publisher (in on_configure) / start (in on_activate) / stop (in on_deactivate) / destroy (in on_cleanup) sequence; publish_once() exposes a deterministic publication for tests; an exception inside status_fn is converted to a synthetic ERROR-level diagnostic so the timer never crashes the node. (L49)
python/observability/src/openral_observability/lifecycle.py
Make LifecycleNode transition-callback failures observable — rclpy's __execute_callback swallows callback exceptions into TransitionCallbackReturn.ERROR without logging (literal # TODO(ivanpauno): log sth here), so a composing host reports only exit code 4.
- log_lifecycle_errors(callback) -> callback — Decorator for on_configure / on_activate / … transition callbacks. Transparent on success; on an uncaught exception it logs the callback name + full traceback via node.get_logger().error(...) (→ /rosout → launch console) and returns TransitionCallbackReturn.FAILURE instead of letting the exception escape into rclpy's silent ERROR conversion. Applied to the on_configure / on_activate of RskillRunnerNode, _WorldStateLifecycleNode, HALLifecycleNodeBase (covers every per-robot HAL), and ReasonerNode. Imports rclpy lazily so the module stays import-safe on pure-Python hosts. (L39)
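The decorator's behaviour can be sketched without ROS 2 installed. Everything below is a stand-in under stated assumptions: TransitionCallbackReturn is a local enum mimicking rclpy's, DemoNode is a hypothetical node exposing only get_logger(), and the standard logging module replaces the rclpy logger.

```python
import enum
import functools
import logging
import traceback


class TransitionCallbackReturn(enum.Enum):
    """Local stand-in for rclpy's enum so the sketch runs without ROS 2."""

    SUCCESS = 0
    FAILURE = 1
    ERROR = 2


def log_lifecycle_errors(callback):
    """Log uncaught exceptions from a transition callback and return FAILURE."""

    @functools.wraps(callback)
    def wrapper(self, state):
        try:
            return callback(self, state)  # transparent on success
        except Exception:
            self.get_logger().error(
                f"{callback.__name__} raised:\n{traceback.format_exc()}"
            )
            return TransitionCallbackReturn.FAILURE

    return wrapper


class DemoNode:
    """Hypothetical node exposing only get_logger()."""

    def get_logger(self):
        return logging.getLogger("demo_node")

    @log_lifecycle_errors
    def on_configure(self, state):
        raise RuntimeError("missing parameter")
```

Calling DemoNode().on_configure(None) logs the traceback and returns FAILURE, so the host sees a reported failure instead of a silent error code.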
python/observability/src/openral_observability/semconv.py
Single source of truth for OpenRAL OTel attribute / span / metric names.
Final[str] constants for: the legacy rskill.* / skill.* / inference.* / safety.* attribute prefixes (shipped today); the greenfield openral.run.* / openral.tick.* / openral.skill.* / openral.hal.* / openral.sensors.* / openral.world_state.* / openral.dataset.* namespaces; span names (SPAN_*, incl. SPAN_WORLD_SCENE_OBJECTS = "world.scene_objects"); the ADR-0038 openral.world_state.scene_objects.* dashboard attrs (WORLD_SCENE_OBJECTS_LIST / _COUNT / _FRAME / _SOURCE_NODE); span-event names (EVENT_*, incl. EVENT_EPISODE_CLOSED added in ADR-0019); metric instrument names (METRIC_*); closed-set metric label keys (LABEL_*); and enum values for openral.run.mode / openral.safety.kernel. ADR-0019 also adds DATASET_EPISODE_SUCCESS to the openral.dataset.* namespace; the placeholder DATASET_REPO_ID / DATASET_EPISODE_IDX / DATASET_FRAME_IDX constants (L143–145) are now written by openral_dataset.RolloutRecorder.
python/observability/src/openral_observability/metrics.py
Cached OTel meter instruments — safe to call before configure_observability.
- get_meter() -> Meter — Resolve the OpenRAL meter against the current MeterProvider. (L66)
- get_tick_duration() -> Histogram — openral.tick.duration, unit ms. (L101)
- get_inference_duration() -> Histogram — openral.inference.duration, unit ms. (L116)
- get_hal_read_state_duration() -> Histogram — openral.hal.read_state.duration, unit ms. (L128)
- get_hal_send_action_duration() -> Histogram — openral.hal.send_action.duration, unit ms. (L140)
- get_sensors_age_ms() -> Histogram — openral.sensors.age_ms, unit ms. (L152)
- get_world_state_staleness_ms() -> Histogram — openral.world_state.staleness_ms, unit ms. (L164)
- get_tick_budget_violations() -> Counter — openral.tick.budget_violations. (L179)
- get_tick_deadline_misses() -> Counter — openral.tick.deadline_misses. (L190)
- get_inference_timeouts() -> Counter — openral.inference.timeouts. (L201)
- get_safety_violations() -> Counter — openral.safety.violations, labels check_name / severity. (L212)
- get_safety_clamps() -> Counter — openral.safety.clamps, label check_name. (L226)
- get_hal_estop_count() -> Counter — openral.hal.estop.count. (L237)
- get_sensors_stale_reads() -> Counter — openral.sensors.stale_reads. (L248)
- get_observability_export_failures() -> Counter — openral.observability.export_failures, label signal_kind. (L281)
- get_world_state_components_stale() -> UpDownCounter — openral.world_state.components_stale. (L298)
- record_histogram_ms(instrument, value_ms, attributes=None) -> None — Record a millisecond value, skipping negatives and NaN. (L387)
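The guard in record_histogram_ms is small enough to sketch in full. FakeHistogram is a hypothetical stand-in that exposes only the record(amount, attributes=...) method an OTel Histogram provides, so the example runs without the SDK; the body is an illustration of the described behaviour, not the shipped code.

```python
import math


class FakeHistogram:
    """Hypothetical stand-in exposing the record() method of an OTel Histogram."""

    def __init__(self):
        self.samples = []

    def record(self, amount, attributes=None):
        self.samples.append((amount, attributes))


def record_histogram_ms(instrument, value_ms, attributes=None):
    """Record a millisecond value, skipping negatives and NaN."""
    if math.isnan(value_ms) or value_ms < 0:
        return  # a clock glitch must not pollute the histogram
    instrument.record(value_ms, attributes=attributes)
```

Filtering at the call site keeps a single bad timestamp from skewing the percentile cards on the dashboard.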
python/observability/src/openral_observability/producer.py
Producer-side helpers for recording rich span attributes on OpenRAL hot-path spans. Safe to call on no-op spans; lists are truncated to _MAX_JOINTS / _MAX_EE_FRAMES and floats rounded to 3 decimals.
- record_joint_state(span, *, names, positions, velocities=None, efforts=None, position_limits=None, velocity_limits=None, effort_limits=None, stamp_ns=None) -> None — Attach per-joint attributes to a hal.read_state span. (L83)
- record_action(span, *, next_row, dim=None, horizon=None, applied=None, gripper_position=None, gripper_force_n=None) -> None — Attach commanded-action attributes to a hal.send_action span. (L131)
- record_ee_poses(span, ee_poses) -> None — Flatten a name → Pose6D mapping onto a world_state.snapshot span. (L162)
- record_sensor_frame_attrs(span, *, modality=None, encoding=None, width=None, height=None, channels=None, age_ms=None, thumbnail_bytes=None, thumbnail_already_encoded_b64=False) -> None — Attach sensor-frame attributes to a sensors.read_latest span. (L188)
- encode_rgb_thumbnail(rgb) -> bytes | None — Encode an HWC uint8 RGB ndarray to a small JPEG for OTLP; returns None if Pillow is unavailable. (L229)
- encode_frame_thumbnail(frame) -> bytes | None — Encode an openral_core.SensorFrame (RGB8 / BGR8 / MONO8 / JPEG / PNG) as a small JPEG thumbnail; returns None for non-renderable encodings. (L254)
- modality_for_encoding(encoding) -> str — Map a FrameEncoding (or its string value) to the dashboard's modality label (rgb / mono / depth / raw / unknown). Reused by HardwareRunner._tick_impl and world_state_ros/lifecycle_node._on_image so both surfaces produce identical modality labels for the same encoding. (L51)
- _MODALITY_BY_ENCODING: dict[str, str] (L39) — Canonical encoding → modality lookup table.
python/observability/src/openral_observability/system_metrics.py
Background sampler for the openral.system.* gauges; feeds the dashboard's System Health card via psutil (CPU + RAM) and optional pynvml (GPU memory + util).
- start_system_metrics_collector(*, interval_s=1.0) -> bool — Start a daemon thread that samples host metrics every interval_s seconds. Returns False and a quiet no-op when neither psutil nor pynvml is importable. Idempotent; re-starts retune the interval. (L45)
- stop_system_metrics_collector(*, timeout_s=2.0) -> None — Signal the collector thread to stop and join. Safe to call when not running. (L75)
python/observability/src/openral_observability/propagation.py
W3C TraceContext inject / extract for cross-process trace correlation.
- current_traceparent() -> str | None — W3C traceparent value for the active span, or None outside a span. (L37)
- inject_traceparent(carrier=None) -> dict[str, str] — Write the active span's traceparent (and optional tracestate) into a carrier dict; used by producers of ActionChunk.msg / ExecuteRskill.action / FailureTrigger.msg. (L52)
- extract_traceparent(traceparent, tracestate=None) -> Context — Parse a wire-side traceparent into an OTel Context for context.attach / trace.use_span; consumed by the C++ safety kernel and any Python ROS consumer. (L82)
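For readers unfamiliar with the wire format these helpers carry, the sketch below formats and parses a version-00 W3C traceparent value using only the standard library. The function names format_traceparent and parse_traceparent are hypothetical and are not part of the module above; the real helpers delegate to the OTel propagator.

```python
import re

# version "00", 32-hex trace-id, 16-hex parent-id, 2-hex trace-flags
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Render integer ids as a version-00 traceparent header value."""
    flags = 1 if sampled else 0
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(value: str):
    """Return (trace_id, span_id, sampled), or None for a malformed value."""
    match = _TRACEPARENT_RE.match(value.strip().lower())
    if match is None:
        return None
    trace_id, span_id, flags = (int(group, 16) for group in match.groups())
    if trace_id == 0 or span_id == 0:  # all-zero ids are invalid per the spec
        return None
    return trace_id, span_id, bool(flags & 0x01)
```

The same string can be stamped into a ROS message field by a Python producer and parsed by any consumer, which is what makes cross-process correlation possible.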
python/observability/src/openral_observability/failure_bus.py
ADR-0018 F3 — publisher helper + IDL-mirror constants for the namespaced /openral/failure/{...} bus.
- class FailureSource(str, Enum) (L118) — HAL | SENSOR | SKILL | SAFETY | WAM | CRITIC; the string value is the topic suffix.
- topic_for(source: FailureSource) -> str (L133) — Pure helper: FailureSource → /openral/failure/<suffix>.
- KIND_* / SEVERITY_* int module constants (L94–L109) — Mirror openral_msgs/msg/FailureTrigger; bump both when the IDL changes.
- DEFAULT_RATE_LIMIT_HZ: dict[int, float | None] (L150) — Per-severity defaults (INFO/WARN → 10/s, FAIL/ABORT → unlimited).
- DEFAULT_SUMMARY_PERIOD_S = 1.0 (L154).
- class _TokenBucket (L164) — Private, lock-protected. __init__(rate_hz, *, capacity=1.0, clock=time.monotonic); try_consume() -> bool.
- class FailureBusPublisher (L213) — __init__(node, source, *, rate_limit_hz=None, summary_period_s=1.0, clock=None). Methods: create_publisher() (opens RELIABLE+VOLATILE+KL=50 publisher on topic_for(source)), start() (boots 1 Hz suppressed-summary timer), stop(), destroy(), publish(*, kind, severity, evidence, rskill_id='', trace_id=None) -> bool (False when rate-limited). Properties: topic, source.
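A lock-protected token bucket with the constructor shape listed for _TokenBucket could be written as follows. This is a minimal sketch of the general technique under that signature; the refill policy (continuous refill capped at capacity, one token per publish) is an assumption, not a description of the private class.

```python
import threading
import time


class TokenBucket:
    """Allow roughly rate_hz events per second, with bursts up to capacity."""

    def __init__(self, rate_hz, *, capacity=1.0, clock=time.monotonic):
        self._rate_hz = rate_hz
        self._capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so the first event passes
        self._last = clock()
        self._lock = threading.Lock()

    def try_consume(self) -> bool:
        with self._lock:
            now = self._clock()
            # Refill in proportion to elapsed time, capped at capacity.
            refill = (now - self._last) * self._rate_hz
            self._tokens = min(self._capacity, self._tokens + refill)
            self._last = now
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False
```

Injecting the clock, as the listed signature does, lets tests advance time deterministically instead of sleeping.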
python/observability/src/openral_observability/logging.py
- trace_context_processor(_logger, _method_name, event_dict) — structlog processor that stamps trace_id / span_id on every log event. (L31)
- install_structlog_bridge(logger_provider) — Wire the structlog processor chain to forward records to the OTel LoggerProvider. (L47)
python/observability/src/openral_observability/dashboard/store.py
In-memory aggregator for openral dashboard — feeds the SSE stream and the /api/state JSON endpoint. Thread-safe, bounded (200 events, 600 metric samples per series). (ADR-0017, issue #44). Span families registered in _HEADLINE_FAMILIES (L772+): rskill.execute, rskill.tick, rskill.activate, rskill.configure, skill.chunk_inference, safety.check, hal.send_action, hal.read_state, sensors.read_latest, world_state.snapshot, slam.occupancy_grid (ADR-0025 SLAM map card), reasoner.tick (ADR-0018 F4 — last LLM tool decision, rendered in the Reasoner card added alongside ADR-0025's navigate-look-pick demo), sim.run, sim.step, cli.command. Each populates one slot in self._topics: dict[str, dict[str, Any]] (L266+).
- class TelemetryEvent — Frozen dataclass holding one event log row (ts_unix, kind, title, attrs, severity). .to_json() returns a plain dict. (L137)
- class TelemetryStore — Read-side aggregator over OTLP signals. (L283)
- ingest_spans(payload: list[ResourceSpans]) -> int — Decode + record spans; populates headline cards, increments span-event counters, publishes a delta to every subscriber queue. Returns the number of spans recorded. Routes by span name into per-topic buckets, incl. world.scene_objects → topics["scene_objects"] (ADR-0038 — durable spatial-memory objects for the scene-objects card + SLAM-map overlay; the world_state.scene_objects.list JSON attr is decoded via _parse_object_list). (L357)
- ingest_metrics(payload: list[ResourceMetrics]) -> int — Decode + record metric data points; appends per-series samples and tracks cumulative sums. (L391)
- ingest_logs(payload: list[ResourceLogs]) -> int — Decode + record OTLP ResourceLogs (the structlog→OTel bridge) as event-log rows (issue #318): body → title, instrumentation scope (logger) name → kind, severity_number → debug / info / warn / error / fatal via _log_level. Records share the bounded event ring with spans/span-events; the UI defaults the Debug chip off so high-rate DEBUG stays opt-in. Returns the number of log records recorded.
- snapshot() -> dict[str, Any] — One-shot view: service identity, headline cards, event ring, counters, metric series with p50/p95. (L451)
- subscribe() -> asyncio.Queue — Register an SSE subscriber. The queue is bounded; on overflow the oldest payload is dropped so the producer never blocks. (L456)
- unsubscribe(queue) -> None — Drop a subscriber's queue. (L472)
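The drop-oldest overflow policy described for subscriber queues can be illustrated with a small helper. The name publish_drop_oldest is hypothetical, and the sketch assumes it is only called from the thread that owns the queue; how the real store hands payloads across threads is not covered here.

```python
import asyncio


def publish_drop_oldest(queue: asyncio.Queue, payload) -> bool:
    """Enqueue without blocking; return True if an older payload was dropped."""
    dropped = False
    if queue.full():
        queue.get_nowait()  # discard the oldest pending payload
        dropped = True
    queue.put_nowait(payload)
    return dropped
```

A slow SSE client therefore loses stale deltas rather than stalling the ingest path.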
python/observability/src/openral_observability/dashboard/app.py
- create_app(store: TelemetryStore | None = None) -> FastAPI — Build the dashboard ASGI app. Routes: /, /static/*, /healthz, /api/state, /api/stream (SSE), the OTLP/HTTP receivers POST /v1/traces, POST /v1/metrics, POST /v1/logs (logs now feed the event log via TelemetryStore.ingest_logs — issue #318), and the operator write endpoint POST /api/prompt (ADR-0018 F10, shells out to openral prompt --topic /openral/prompt_in/dashboard). Honours gzip-encoded request bodies. (L151)
python/observability/src/openral_observability/dashboard/server.py
- run_dashboard(*, host="127.0.0.1", port=4318, inprocess_cmd=None, store=None, log_level="warning") -> None — Start uvicorn on host:port and block until SIGINT/SIGTERM. Prints a single OpenRAL dashboard: http://host:port/ banner to stderr before binding (issue #132) so the user always sees the URL. When inprocess_cmd is set, spawns the argv as a child process with OTEL_EXPORTER_OTLP_ENDPOINT + OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf pointed at the dashboard. Default port is 4318 (OTLP/HTTP standard) instead of the historic 8000 to avoid clashing with mkdocs serve / python -m http.server. (L56)
- spawn_dashboard(*, host="127.0.0.1", port=4318, ready_timeout_s=10.0) -> Iterator[str | None] [@contextmanager] — Inverse of --inprocess: spawn openral dashboard as a child of the current process, poll /healthz until ready, set OTEL_EXPORTER_OTLP_{ENDPOINT,PROTOCOL}, yield the URL, and SIGINT the child on exit. Yields None (workload continues unattached) if openral is not on PATH, the child died early, or /healthz never came back within the timeout. (L36, in openral_observability/dashboard/attach.py)
- attached_dashboard(*, enabled, port=4318) -> Iterator[bool] [@contextmanager] — High-level wrapper used by openral sim run --dashboard, openral deploy run --dashboard, and openral benchmark run --dashboard. When enabled=False, yields False immediately (true no-op, no FastAPI/uvicorn imports). When enabled=True, delegates to spawn_dashboard, re-runs configure_observability on the new endpoint, and drains via shutdown_observability in finally so the last batch lands before the child is SIGINT'd. Yields True iff the child reported healthy.
python/observability/src/openral_observability/dashboard/store.py — F7 trace index additions
ADR-0018 F7 — bounded per-trace_id span index for query-time bag↔OTel join.
- class _IndexedSpan (L175) — Frozen-ish record retained by trace_id: name, trace_id, span_id, parent_span_id, start_ns, end_ns, attrs, status_code, status_message, events. .to_json() returns a plain dict carrying duration_ms.
- TelemetryStore.list_traces() -> list[dict] — One row per indexed trace_id (trace_id, span_count, last_seen_unix), most-recent first. Backs GET /api/traces.
- TelemetryStore.lookup_trace(trace_id: str) -> list[dict] | None — Every indexed span for trace_id, sorted ascending by start_unix_ns. None when the trace is not (or no longer) in the bounded index. Backs GET /api/spans/{trace_id}.
- _TRACE_INDEX_MAX_TRACES = 64 / _TRACE_INDEX_MAX_SPANS = 2048 — Memory caps. Older trace_ids evict FIFO on insertion.
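FIFO eviction by trace_id can be sketched with an OrderedDict. The TraceIndex class below is hypothetical and deliberately simplified: it enforces only the trace-count cap, omits the span cap (the entry above does not say whether that cap is per-trace or global), and assumes each span dict carries a start_ns key.

```python
from collections import OrderedDict


class TraceIndex:
    """Bounded trace_id -> spans index with first-in, first-out eviction."""

    def __init__(self, max_traces=64):
        self._max_traces = max_traces
        self._spans = OrderedDict()  # insertion order doubles as eviction order

    def add(self, trace_id, span):
        if trace_id not in self._spans:
            if len(self._spans) >= self._max_traces:
                self._spans.popitem(last=False)  # evict the oldest trace_id
            self._spans[trace_id] = []
        self._spans[trace_id].append(span)

    def lookup(self, trace_id):
        spans = self._spans.get(trace_id)
        if spans is None:
            return None  # never indexed, or already evicted
        return sorted(spans, key=lambda s: s["start_ns"])
```

Returning None for an evicted trace is what lets the HTTP layer answer 404 rather than an empty list.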
python/observability/src/openral_observability/dashboard/app.py — F7 routes
- GET /api/traces — JSON {"traces": [...]} from TelemetryStore.list_traces.
- GET /api/spans/{trace_id} — JSON {"trace_id", "spans": [...]} from TelemetryStore.lookup_trace; 404 when the trace is not indexed.
- GET /api/config — JSON {"jaeger_ui_url": "..."} sourced from the OPENRAL_JAEGER_UI_URL env (trailing slash stripped, default ""). The dashboard UI fetches this on load to decide whether to enable the footer "open in jaeger" link — leaving the env unset keeps the link disabled with a helpful tooltip instead of producing a broken-link click against a guessed localhost:16686.
python/observability/src/openral_observability/tracing_lttng.py
ADR-0018 F9 — opt-in LTTng tracepoints around the realtime hot path. No-op when OPENRAL_ROS2_TRACING is unset; falls back to JSONL when lttngust is missing.
- ENV_TRACING_GATE = "OPENRAL_ROS2_TRACING" — Truthy values (1 / true / yes / on) enable the backend; anything else leaves every tracepoint a no-op.
- ENV_TRACING_FALLBACK_DIR = "OPENRAL_ROS2_TRACING_FALLBACK_DIR" — Override for the JSONL fallback directory (default /tmp/openral-lttng-fallback).
- TP_RUNNER_TICK, TP_HAL_READ_STATE, TP_HAL_SEND_ACTION, TP_SENSORS_READ_LATEST, TP_WORLD_STATE_SNAPSHOT, TP_SKILL_STEP, TP_ACTION_PUBLISH, TP_SAFETY_VALIDATE — Tracepoint base names; lttng_tracepoint appends _begin / _end suffixes.
- is_enabled() -> bool (L112) — Single source of truth for the gate.
- lttng_tracepoint(name, **attrs) -> Iterator[None] (L175) — Context manager that fires <name>_begin / <name>_end around the block. Attaches the active OTel trace_id as otel_trace_id so CTF traces can join back to OTel.
- class LttngSession(name, output_dir) (L93) — Identity of an active session.
- class LttngSessionError(RuntimeError) (L88) — Raised by the subprocess wrappers.
- start_session(*, name, output_dir) -> LttngSession (L321) — lttng create / enable-event openral:* / add-context / start.
- stop_session(*, name) -> None (L346) — lttng stop + destroy (flush + teardown).
- view_session(*, output_dir) -> None (L358) — babeltrace2 OUTPUT_DIR; falls back to listing files when babeltrace2 is absent.
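The gate-plus-context-manager pattern can be shown without LTTng. In this sketch the events list stands in for the real backend, the tracepoint name "runner_tick" is a made-up example value, the environ parameter exists only to make the example testable, and case-insensitive matching of the truthy values is an assumption.

```python
import contextlib
import os

ENV_TRACING_GATE = "OPENRAL_ROS2_TRACING"
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def is_enabled(environ=None) -> bool:
    """Return True when the gate variable holds a truthy value."""
    env = os.environ if environ is None else environ
    # Assumption: matching ignores case and surrounding whitespace.
    return env.get(ENV_TRACING_GATE, "").strip().lower() in _TRUTHY


@contextlib.contextmanager
def tracepoint(name, events, *, environ=None, **attrs):
    """Append <name>_begin / <name>_end records around the block when enabled."""
    if not is_enabled(environ):
        yield  # disabled: the block runs with no tracing overhead
        return
    events.append((f"{name}_begin", attrs))
    try:
        yield
    finally:
        events.append((f"{name}_end", attrs))  # fires even if the block raises
```

Wrapping a hot-path call in `with tracepoint("runner_tick", events): ...` then costs one environment lookup when tracing is off.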
python/dataset/src/openral_dataset/recorder.py
ADR-0019 — in-memory per-rollout accumulator with multi-sink fan-out.
- @dataclass class EpisodeHeader(episode_idx, task_string, fps, robot_name, stamp_ns) — Per-episode metadata pushed to sinks at episode_start. (L58)
- @dataclass class DatasetFrame(episode_idx, frame_idx, observation_state, images, action, reward, terminated, truncated, stamp_ns, trace_id="", span_id="") — Per-tick frame pushed to sinks at record_frame. trace_id (32 hex) / span_id (16 hex) carry the producing rskill.tick span's ids (ISSUE-109 forward link); "" when no valid span was in scope. (L81)
- @dataclass class EpisodeSummary(episode_idx, success, n_frames, stamp_ns) — Per-episode close-out pushed to sinks at episode_end. (L122)
- class DatasetSink(Protocol) — Fan-out target with open_episode / write_frame / close_episode / finalize. (L140)
- class RolloutRecorder(*, robot, task_string, fps, sinks, repo_id=None) — In-memory accumulator that fans every step out to one or more DatasetSink implementations and writes the OTel openral.dataset.repo_id / episode_idx / frame_idx attributes on the active rskill.tick span. (L167)
- episode_start(*, task_string=None) -> int — Open a new episode; returns its idx. (L296)
- record_frame(*, observation_state, images, action, reward, terminated, truncated, stamp_ns, trace_id=None, span_id=None) -> int — Append one frame. Captures the active rskill.tick span's (trace_id, span_id) onto the frame (ISSUE-109); explicit trace_id / span_id override the live capture (the offline converter replays the bag's original ids). (L341)
- episode_end(*, success: bool) -> EpisodeSummary — Close the current episode. (L454)
- finalize() -> None — Flush all sinks idempotently. (L487)
- prop fps, robot_name, repo_id, n_sinks, expected_state_shape — Read-only views consumed by callers building the per-frame payload. (L222)
- expected_image_keys() -> tuple[str, ...] — Camera keys (without observation.images. prefix) the sinks expect; derived from RobotDescription.sensors[*].vla_feature_key. (L256)
python/dataset/src/openral_dataset/schema_map.py
Pure RobotDescription → LeRobot v3 features dict mapping; no I/O, no lerobot import.
- @dataclass class FeatureSpec(key, dtype, shape) — Decoupled feature descriptor; sinks translate to lerobot's {'dtype', 'shape', 'names'} format. (L45)
- features_from_robot(robot: RobotDescription, *, fps: float) -> dict[str, FeatureSpec] — Build the LeRobot v3 features dict for the recorder. Reads ObservationSpec.state_shape, ActionSpec.dim, and SensorSpec.vla_feature_key (image modalities only) from the robot manifest. (L62)
python/dataset/src/openral_dataset/bag.py
ADR-0019 PR3 — mcap-backed DatasetSink for online hardware recording.
- Rosbag2Sink(*, bag_path, compression="zstd") — Writes every RolloutRecorder event into an mcap file readable by ros2 bag info / Foxglove / mcap-cli. Daemon writer thread + bounded queue.Queue → write_frame enqueues only; hot path never blocks on disk I/O. JSON-schema encoding (interoperable with ROS 2's ros2msg encoding for the same topics). Topics: /openral/tick (per-tick metadata), /openral/episode (PHASE_START / PHASE_END markers). (L143)
- open_episode(header) -> None — Open the bag on first call; emit Episode(PHASE_START). (L237)
- write_frame(frame) -> None — Enqueue a Tick message (incl. the frame's trace_id / span_id, ISSUE-109); never blocks. The off-thread mcap write reads the ids off the frame because the OTel context is gone by then. (L251)
- close_episode(summary) -> None — Emit Episode(PHASE_END) with success flag. (L282)
- finalize() -> None — Drain queue, stop writer thread, close mcap. Idempotent. (L292)
- prop bag_path, n_ticks_written, n_episode_markers_written, n_dropped — Diagnostics. (L216)
- TOPIC_TICK, TOPIC_EPISODE, PHASE_START, PHASE_END — Module-private constants the PR4 converter imports by symbol. (L65, L66, L111, L112)
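The enqueue-only hot path described for Rosbag2Sink follows a common pattern: a bounded queue drained by a daemon thread. The NonBlockingWriter class below is a hypothetical, stdlib-only sketch of that pattern. Dropping the newest frame when the queue is full is an assumption (the entry lists an n_dropped counter but not the policy), and write_fn stands in for the mcap write.

```python
import queue
import threading


class NonBlockingWriter:
    """Hand frames to a background thread so the caller never waits on disk."""

    def __init__(self, write_fn, *, maxsize=1024):
        self._write_fn = write_fn
        self._queue = queue.Queue(maxsize=maxsize)
        self.n_dropped = 0
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def write_frame(self, frame):
        try:
            self._queue.put_nowait(frame)  # hot path: enqueue only
        except queue.Full:
            self.n_dropped += 1  # assumption: drop the newest frame when full

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel posted by finalize()
                return
            self._write_fn(item)

    def finalize(self):
        """Drain the queue and stop the writer; safe to call twice."""
        if self._thread.is_alive():
            self._queue.put(None)
            self._thread.join()
```

Because the sentinel is queued behind every pending frame, finalize() returns only after all accepted frames have been written.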
python/dataset/src/openral_dataset/converter.py
ADR-0019 PR4 — offline mcap rosbag2 → LeRobotDataset v3 converter.
@dataclass class DatasetSummary(output_root, n_episodes, n_frames, n_success, repo_id) — Returned by from_bag, describing what landed on disk. (L68)
Rosbag2ToLeRobotConverter.from_bag(*, bag_path, robot, output_root, repo_id=None, license="CC-BY-4.0", fps=None) -> DatasetSummary — Walk a Rosbag2Sink-produced mcap, group Ticks under PHASE_START / PHASE_END markers, and replay each episode through a real LeRobotDatasetSink to produce a reloadable v3 dataset. Each replayed tick re-injects the bag's original (trace_id, span_id) so the on-disk frame points at the source rollout, not the convert run (ISSUE-109). Raises ROSConfigError on missing bag / missing episode markers / mismatched robot. (L117)
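The grouping step of from_bag can be sketched over an in-memory message list. This is an illustration under assumptions: messages are modelled as (topic, payload) pairs already decoded from the bag, the payload keys "phase" and "success" and the marker values are placeholders, EpisodeGroup and group_episodes are hypothetical names, and ValueError stands in for ROSConfigError.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Topic names come from the Rosbag2Sink entry; the marker values are
# placeholders, since the real constants live in bag.py.
TOPIC_TICK = "/openral/tick"
TOPIC_EPISODE = "/openral/episode"
PHASE_START = "start"
PHASE_END = "end"


@dataclass
class EpisodeGroup:
    ticks: list[dict] = field(default_factory=list)
    success: bool = False


def group_episodes(messages: list[tuple[str, dict]]) -> list[EpisodeGroup]:
    """Group (topic, payload) pairs, in log order, into complete episodes."""
    episodes: list[EpisodeGroup] = []
    current: EpisodeGroup | None = None
    for topic, payload in messages:
        if topic == TOPIC_EPISODE and payload["phase"] == PHASE_START:
            current = EpisodeGroup()
        elif topic == TOPIC_EPISODE and payload["phase"] == PHASE_END:
            if current is None:
                raise ValueError("PHASE_END without a matching PHASE_START")
            current.success = bool(payload.get("success", False))
            episodes.append(current)
            current = None
        elif topic == TOPIC_TICK and current is not None:
            # The payload keeps its original trace_id / span_id, so a replay
            # can re-inject them instead of minting new ids.
            current.ticks.append(payload)
    if not episodes:
        raise ValueError("no complete episode markers found")
    return episodes
```

Ticks that fall outside a START / END pair are ignored in this sketch; whether the real converter drops or rejects them is not stated in the inventory.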
python/dataset/src/openral_dataset/frame_trace.py
ISSUE-109 — pivot a written LeRobotDataset frame back to its OTel ids.
read_frame_trace(*, root, episode_idx, frame_idx) -> tuple[str, str] — Return the (trace_id, span_id) stamped on a v3 frame. Reads the root/data/**/*.parquet correlation columns directly via pyarrow (no video decode), so it works without a torchcodec/ffmpeg backend. Raises ROSConfigError when the root has no parquet, the dataset predates the columns, or no (episode_idx, frame_idx) row matches. Backs openral replay --frame. (L28)
python/dataset/src/openral_dataset/sinks.py
LeRobotDataset v3.0 (codebase_version="3.0") writer; LeRobotDataset.create is deferred so per-camera shapes come from the first frame.
class LeRobotDatasetSink(DatasetSink) — Implementation of DatasetSink writing LeRobot v3 datasets via real lerobot.datasets.LeRobotDataset.create / add_frame / save_episode / finalize. Lazy-imports lerobot at construction. (L93)
__init__(*, root, robot, fps, repo_id=None, license="CC-BY-4.0", vcodec="libsvtav1") — Raises ROSConfigError if lerobot ≥ 0.5.1 is not importable. (L129)
open_episode(header) -> None — Stash the task string for per-frame tagging. (L257)
write_frame(frame) -> None — Validates per-frame shapes against the declared features, then forwards to LeRobotDataset.add_frame. Adds the frame's trace_id / span_id as string parquet columns (ISSUE-109). (L276)
close_episode(summary) -> None — Calls LeRobotDataset.save_episode(parallel_encoding=True) and accumulates the per-dataset success counter. (L353)
finalize() -> None — Calls LeRobotDataset.finalize(), then appends dataset_success_rate / license / repo_id and the dataset-level trace_ids / n_traces (distinct OTel traces, ISSUE-109) to meta/info.json["metadata"], and writes the per-episode episode_index → trace_id map to the meta/openral_traces.json sidecar. (L396)
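The metadata step of finalize can be sketched with json and pathlib. The key names and file paths are the ones listed in the entry above; write_trace_metadata, its parameters, the success-rate formula (successes divided by episodes), and the choice to store sidecar keys as strings are assumptions made for illustration.

```python
from __future__ import annotations

import json
from pathlib import Path


def write_trace_metadata(
    root: Path,
    episode_traces: dict[int, str],   # episode_index -> trace_id
    n_success: int,
    *,
    license: str,
    repo_id: str | None,
) -> dict:
    """Append dataset-level metadata to meta/info.json and write the sidecar."""
    meta_dir = root / "meta"
    meta_dir.mkdir(parents=True, exist_ok=True)
    info_path = meta_dir / "info.json"
    # Preserve whatever the dataset writer already put in info.json.
    info = json.loads(info_path.read_text()) if info_path.exists() else {}

    trace_ids = sorted(set(episode_traces.values()))   # distinct traces only
    n_episodes = len(episode_traces)
    metadata = info.setdefault("metadata", {})
    metadata.update(
        {
            "dataset_success_rate": n_success / n_episodes if n_episodes else 0.0,
            "license": license,
            "repo_id": repo_id,
            "trace_ids": trace_ids,
            "n_traces": len(trace_ids),
        }
    )
    info_path.write_text(json.dumps(info, indent=2))

    # JSON object keys must be strings, so episode indices are stringified.
    sidecar = {str(idx): tid for idx, tid in sorted(episode_traces.items())}
    (meta_dir / "openral_traces.json").write_text(json.dumps(sidecar, indent=2))
    return metadata
```

Keeping the per-episode map in a sidecar leaves info.json small while still letting a reader go from an episode index to its source trace.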