ADR-0051: Detector invocation mode — continuous background producers vs on-demand prompted locators
- Status: Accepted
- Date: 2026-06-12
- Related: ADR-0037 (the
kind: detectorrSkill, detector tiers, and the 2026-06-12zeroshot_hfamendment); ADR-0043 (the read-onlylocate_in_viewreasoner tool); ADR-0035 / ADR-0038 (world-state object lift + scene-graph memory); ADR-0018 §4 (tool palette); ADR-0050 (single-resident-skill VRAM eviction); CLAUDE.md §3 (layer boundaries — this crosses the detector contract ↔ reasoner palette boundary, hence this ADR).
Context
After ADR-0037 (2026-06-12 amendment) there are three perception detector rSkills, and they reach the reasoner through three different mechanisms that are currently assigned implicitly:
| rSkill | Vocabulary | How it runs | How the reasoner sees it |
|---|---|---|---|
rtdetr-coco-r18 / -v2-r50vd |
closed (80 COCO) | always-on tee producer | excluded from the ExecuteSkill palette; output → WorldState.detected_objects |
omdet-turbo-indoor |
frozen-open (266 fixed) | always-on tee producer | same — background producer |
locateanything-3b-nf4 |
open (any text) | both a static-default continuous leg and an on-demand service | surfaces the locate_in_view tool (gated by the runtime service existing) |
Two orthogonal axes are conflated into the model identity:
- Vocabulary — closed/fixed (RT-DETR, OmDet) vs open (LocateAnything).
- Invocation — continuous background producer (streams into world state; the reasoner reads it passively and never prompts it) vs on-demand prompted locator (the reasoner asks "find X now").
ADR-0037's DetectorEngine already names the
how-it-runs axis (rtdetr_onnx / vlm_sidecar / zeroshot_hf). The when-the-reasoner-invokes
axis has no typed contract — it is inferred from whether a /openral/perception/locate_in_view
service happens to be wired. Consequences:
- LocateAnything straddles both modes (continuous default-query leg and
locate_in_view), which is the root of the confusion. - OmDet-Turbo is an open-vocabulary model deliberately frozen into the background role, but nothing in its manifest declares that — the reasoner cannot tell it is not meant to be prompted.
- The reasoner has no typed view of which detector covers which vocabulary, so it cannot make the
one decision that matters: is
mugalready tracked continuously (read world state) or do I need to prompt the open-vocab locator forthe red stapler with the company logo?
Decision
- A typed
DetectorModeonDetectorContract.mode(openral_core.schemas) — the invocation axis, orthogonal toDetectorEngine: continuous(default) — an always-on background producer. Runs on the camera tee every frame, streamsObjectsMetadataintoWorldState.detected_objects; the reasoner reads it passively (world state /recall_object) and never prompts it. Not ExecuteSkill-dispatchable; no actuation authority. May still be toggled viaLifecycleTransitionToolto free VRAM (ADR-0050).on_demand— a prompted locator the reasoner invokes only when it needs a specific object now, via the read-onlylocate_in_viewtool (ADR-0043). Not run continuously.
engine × mode together give the reasoner a complete, typed mental model (how it runs × when
it is invoked) instead of inferring intent from a service's existence.
- Single responsibility per rSkill — mode is per-rSkill, not per-model. The manifests declare the split:
rtdetr-coco-r18,rtdetr-v2-r50vd,omdet-turbo-indoor→mode: continuous(the background bank: closed + frozen-open vocabularies, optimised for coverage/throughput).locateanything-3b-nf4,omdet-turbo-locator→mode: on_demand(the open-vocab locators, activated on demand and the natural candidates for the ADR-0050 load-on-prompt / evict-after VRAM lifecycle). LocateAnything (3B VLM) is the higher-quality option for complex referring expressions;omdet-turbo-locatoris a lightweight (~115M, real-time, in-process, Apache-2.0) alternative for simple "find X" queries.
The same OmDet-Turbo weights are packaged in both modes (omdet-turbo-indoor continuous /
omdet-turbo-locator on-demand) rather than one dual-mode rSkill — packaging two single-purpose
rSkills is what keeps the modes from straddling (the LocateAnything lesson). The
OmDetTurboDetector backend exposes set_query / detect_with_query so the on-demand package can
back the locate_in_view service; the detector node binds those by hasattr, so the same backend
serves either mode.
- Surface continuous coverage to the LLM (reasoner palette).
build_tool_palettecollects everymode: continuousdetector for the active robot intoToolPalette.continuous_detectors(ContinuousDetectorEntry: id + description + objects + scenes + label count — a compact coverage characterisation, not the full label list, to keep the prompt bounded). Whenlocate_in_viewis advertised (detector_available), its tool description is augmented with that coverage so the decision rule reaches the LLM at the point of choice:
object within a continuous detector's coverage → read world state /
recall_object; object outside it (novel / specific / attribute-qualified) →locate_in_view.
This cleanly separates "open-vocabulary" from "prompting": prompting is for the long tail the always-on bank does not cover.
- The perception node enforces
mode.RosImageObjectDetectorNoderesolves the manifest'smodeaton_configurevia the puredetector_node_wiringpolicy (openral_runner.backends.gstreamer.detector_factory): acontinuousdetector runs the primary camera's detect+publish leg and does not exposelocate_in_view/ subscribedetector_query; anon_demanddetector exposes thelocate_in_viewservice +detector_querytopic and does not publish continuously (frames are still cached so the service can answer). The legacy ONNX path (no manifest) iscontinuous.detector_available/scene_query_availableremain runtime flagsreasoner_nodesets from the live services (defence in depth).
Alternatives considered
- Overload
DetectorEngineto encode mode (e.g. avlm_sidecar_on_demandvalue) — rejected: conflates the two orthogonal axes the whole ADR exists to separate, and combinatorially explodes (any engine can in principle run in either mode). - Infer mode from the presence of the
locate_in_viewservice (status quo) — rejected: the intent is invisible to the typed contract, so the reasoner cannot reason about coverage, and a detector's role depends on deployment wiring rather than its manifest. - Dump the full label list into the palette — rejected: 80 + 266 labels per tick is prompt bloat; a coverage characterisation plus world-state lookup for specifics is sufficient and bounded.
- Make continuous detectors ExecuteSkill tools — rejected (and explicitly guarded against in
build_tool_palette): a detector emits perception, not actions; admitting it would let the LLM dispatch it as if it actuated the robot (ADR-0037 Decision 4).
Consequences
- Schema.
DetectorContractgainsmode: DetectorMode(defaultcontinuous); on-diskschema_versionstays"0.1"(no migrator — CLAUDE.md §6). Existing continuous detectors are unaffected by the default;locateanything-3b-nf4is set toon_demandexplicitly. - Reasoner palette.
ToolPalettegainscontinuous_detectors: tuple[ContinuousDetectorEntry, ...]; thelocate_in_viewtool description is coverage-aware. No new tool, no new dispatch path. - Layer touch (ADR-gated). Layer 3 (detector manifest contract) ↔ Layer 4 (reasoner palette / tool descriptions). No actuation path is touched; all surfaced tools remain read-only.
- Behavioural change. With node-side enforcement (Decision 4),
locateanything-3b-nf4— nowmode: on_demand— no longer publishes continuously; it answerslocate_in_viewonly. Deployments that want a continuous world-state producer should run acontinuousdetector (rtdetr-*oromdet-turbo-indoor) and use LocateAnything /omdet-turbo-locatoras the on-demand locator. (This supersedes the interim arrangement where LocateAnything's static-default continuous leg also published — e.g. PR #316's demo, which should migrate toomdet-turbo-indoorfor the continuous leg.) - Follow-ups. (i) ~~node-side
modeprovisioning~~ — done (Decision 4). (ii) tiemode: on_demandto the ADR-0050 load-on-prompt / evict-after VRAM lifecycle; (iii) feed the continuous coverage into the reasoner system prompt as well as the tool description.