ADR-0037: GStreamer perception bus — reasoner-activated tee consumers + object-detection rSkill
- Status: Proposed
- Date: 2026-06-03
- Related: ADR-0010 (inference runner — the GStreamer sensor pipeline / NVMM reader, PR I);
ADR-0018 (reasoner tool palette + the F6 perception tee /
/openral/perception/<kind>); ADR-0024 (the "result-only, no Action emitted" rSkill pattern this reuses); ADR-0022 (rSkill palette metadata); CLAUDE.md §3 (layer boundaries; dual-system), §1.5 (hot path is C++ and bounded), §1.11 (real components), §9 (license lineage). Builds on the TensorRTRuntime ONNX→TRT backend (PR #223).
Context
We want an object-detection rSkill the S2 reasoner can activate/deactivate, that runs on the GPU with zero GPU↔CPU copies, teeing off the camera GStreamer pipeline after frames are already on the GPU. Investigating the path surfaced three structural facts:
- rSkills do not consume the GStreamer pipeline at all today. The skill observation path is
CPU numpy delivered over ROS image topics: a camera/HAL node publishes sensor_msgs/Image → the
world_state node calls WorldStateAggregator.update_image_frame → the skill reads it in
snapshot(). The skill_runner's policy-adapter observation assembly drops any GPU handle
(if frame.data is None: continue, then np.frombuffer), so only CPU pixels reach the policy. The
NVMM zero-copy reader (ADR-0010) exists but its bh_sink policy leg is effectively unconsumed by
skills.
- A CUDA device pointer / NVMM surface is process-local. Zero-copy GPU inference can only
happen in the process that owns the pipeline's CUDA context. The current skill_runner process is
not that process (the pipeline, when run, lives in a separate sensor process that publishes
BGR-over-ROS, which is a copy at that boundary).
- The detector is the missing producer the spatial-memory work needs. PRs #217–#220 (the
persistent 3D spatial-memory + reasoner-query stack) each close by noting that no node fills
WorldState.detected_objects from live perception. This rSkill is that producer.
The product goal is broader than one skill: all perception/policy consumers (VLAs, the object detector, other ROS algorithms) should tee off one camera pipeline, and the reasoner should add and remove those tees on demand. That is the north star this ADR commits the architecture to, while scoping the first increment to the object detector.
DeepStream is available opt-in and EULA-gated via docker/inference/Dockerfile.x86 ds-on
(WITH_DEEPSTREAM_STAGE=on, DeepStream 9 / CUDA 13), which provides nvinfer + NVMM caps on
x86; it is not bundled in open-core. (The "no nvvideoconvert on x86" comments in the
pipeline builder predate the ds-on stage and are stale.)
Decision
1. A GStreamer "perception bus." One pipeline per camera owns the GPU frames. The
GStreamerSensorReader is co-located in the runtime_node process (alongside world_state +
skill_runner, sharing one CUDA context), so in-process consumers can read NVMM zero-copy.
Consumers attach as tee branches; results leave the pipeline only as typed ROS messages
(ObjectsMetadata, candidate_action), never GPU buffers across process lines.
2. A TeeManager performs dynamic pad add/remove on the live pipeline. attach(branch) → handle
requests a tee src pad, installs a blocking pad-probe, links
queue leaky=downstream ! <branch> ! appsink, syncs state, unblocks; detach(handle) blocks,
unlinks, releases the request pad, tears the branch down. Each branch gets its own
queue leaky=downstream max-size-buffers=2 so a stalled/crashing consumer never backpressures the
policy leg (the ADR-0018 §3 isolation invariant). The policy and ROS-observability legs are never
disturbed.
3. The reasoner controls tees through the existing ExecuteSkill verb, with no new
ReasonerToolCall variant. Activating a detector rSkill is "attach a tee"; its
cancel/deadline/deactivate is "detach." The palette, license guard, registry refresh, and
replanning ladder are reused unchanged. (The stubbed ReloadGstPipelineTool / GH-126 is untouched
and out of scope.)
4. A new rSkill kind: "detector". It declares runtime: tensorrt (ships ONNX; the engine is built
on load by TensorRTRuntime, PR #223), role: s1 (required for palette inclusion, because
build_tool_palette filters to s1; it carries no actuation authority regardless of slot), a
class/output contract, and a latency budget; it forbids the VLA model_family / action contract.
Its step() publishes ObjectsMetadata and returns no Action, modeled on the ADR-0024
"result-only" mode, so the safety kernel / HAL are never driven by it.
5. Two-tier inference, selected by a capability probe (mirroring detect_platform):
   - Zero-copy NVMM tier. When DeepStream is present, the official nvinfer element runs on the
   NVMM buffer. Without DeepStream but with NVMM (Tegra), a clean-room, Apache-2.0
   GstBase.Aggregator element consumes the NvBufSurface device pointer on the shared CUDA
   context. (Technique informed by, never copied from, the proprietary videotech reference; §9.)
   - CPU fallback tier. On hosts without NVMM (x86 open-core without DeepStream, and
   deploy sim), a system-memory BGR branch runs ONNXRuntime. Not zero-copy, but it is the
   testable path on GPU-less CI and gives deploy sim a working detector (§1.11: real components,
   no mocks).

   Zero-copy applies only to in-process consumers; "other ROS algorithm" consumers in separate
   processes attach to a tee terminating in an appsink published over ROS, a deliberate,
   rate-limited copy at that boundary.
6. Output contract: reuse the existing 2D schemas. Detections publish as
ObjectsMetadata{ detections: list[ObjectDetection2D], model_id } on /openral/perception/objects
(already consumed by the reasoner). 2D only in this epic; the 2D→3D DetectedObject pose-lift
(needs depth + camera intrinsics/extrinsics via TF2) → WorldState.detected_objects →
spatial-memory ingest is a separate, later decision, coordinated with the spatial-memory PRs.
7. The skill runner stays single-active for now. No concurrent rSkills are required yet (a
detector and a VLA do not run simultaneously). TeeManager is nonetheless built to hold multiple
branches so a future concurrency increment needs no redesign.
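The two-tier selection in the Decision reduces to a small function over the probe results. The following is a minimal sketch, assuming the probe yields two booleans. DetectorTierSketch, ProbeResult, and select_tier are hypothetical names for illustration and are not taken from the codebase.

```python
from dataclasses import dataclass
from enum import Enum


class DetectorTierSketch(Enum):
    """Hypothetical tier names; the real enum may differ."""
    NVINFER = "nvinfer"                  # DeepStream present: official element on NVMM
    NVMM_AGGREGATOR = "nvmm_aggregator"  # NVMM without DeepStream: clean-room element
    CPU_FALLBACK = "cpu_fallback"        # system-memory BGR branch running ONNXRuntime


@dataclass(frozen=True)
class ProbeResult:
    """Assumed output of a capability probe run once at startup."""
    has_deepstream: bool
    has_nvmm: bool


def select_tier(probe: ProbeResult) -> DetectorTierSketch:
    # The Context section states that the DeepStream stage provides NVMM caps,
    # so DeepStream is checked first.
    if probe.has_deepstream:
        return DetectorTierSketch.NVINFER
    if probe.has_nvmm:
        return DetectorTierSketch.NVMM_AGGREGATOR
    return DetectorTierSketch.CPU_FALLBACK
```

Keeping the decision a pure function of the probe result makes every tier choice testable on a GPU-less host, independent of which tier can actually run there.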
Alternatives considered
- Full pipeline rebuild on activate (the ReloadGstPipelineTool intent). Rejected as the attach
mechanism: it tears down + rebuilds the reader, dropping frames and blacking out the policy leg
for seconds on every activate/deactivate. Unacceptable mid-task.
- Valve-gated dormant leg (build the detection branch at startup behind a closed
valve drop=true, toggle on activate). Simpler and deadlock-free, but keeps the engine resident
in VRAM while dormant and does not literally "attach." Rejected in favor of dynamic pads, which
match the "tee on demand, close when done" intent and keep idle cost at zero.
- A "fat" rSkill consuming the NVMM handle in the skill_runner process across the ROS boundary.
Rejected: CUDA pointers are process-local, so this would require CUDA-IPC handle passing
(fragile) or a copy (defeats the goal). Co-locating the pipeline in runtime_node (Decision 1) is
what makes in-process zero-copy possible.
- Modeling the detector as kind: vla or kind: ros_action. Rejected: vla requires model_family +
weights_uri and is an action-chunk policy (the detector ships ONNX weights but emits no
actions), and ros_action forbids weights and is action-server shaped. A detector emits
perception events and owns ONNX weights; neither existing kind fits.
- A new ReasonerToolCall "attach detector tee" variant. Rejected as redundant: ExecuteSkill
already expresses "run this rSkill," and lifecycle (cancel/deadline/deactivate) already
expresses "stop it." Adding a parallel verb would fork the palette/replanning machinery.
Consequences
- Layer touch (ADR-gated, hence this ADR). Layer 1 (the GStreamer pipeline gains a TeeManager
and moves into the runtime_node process) ↔ Layer 3 (a new rSkill kind) ↔ the reasoner palette
(Layer 4 surfaces the detector via ExecuteSkill). The detector (Layer 3) driving a tee on the
pipeline (Layer 1) is the boundary crossing this ADR authorizes.
- Schema change. openral_core.RSkillManifest gains a kind: "detector" discriminated variant.
On-disk schema_version stays "0.1" (no migrator pre-publish, CLAUDE.md §6); the repo state map
and docs/METHODS.md update in the implementing PR.
- Builds on PR #223. TensorRTRuntime (ONNX→TRT build-on-load) is the detector's runtime; the
NVMM device-pointer infer entry point is added by the detector-element PR.
- License posture (§9). nvinfer lives in EULA-gated DeepStream (opt-in image, not bundled); the
open-core NVMM element is clean-room Apache-2.0; detector weights are RT-DETR / D-FINE
(Apache-2.0); LibreYOLO is used build-time only to export ONNX, never in the hot path.
- Delivered as a 6-PR epic (smallest-viable PRs, §4.2): (1) refactor the pipeline builder to a
named-tee structure; (2) this ADR; (3) TensorRTRuntime (PR #223, done); (4) co-locate the reader
in runtime_node + TeeManager; (5) the two-tier detector element; (6) the kind: detector rSkill
package.
- Follow-ups (own PRs / later decisions): the 2D→3D pose-lift + spatial-memory ingest;
migrating VLAs to consume NVMM tees in-process (retiring the ROS-image CPU path for policies);
deploy sim GStreamer parity via an appsrc-fed pipeline; concurrent rSkills; a configurable
per-skill TRT optimization profile; INT8 calibration.
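The constraints the Decision places on a kind: "detector" manifest can be illustrated with a reduced validator. This is a sketch only: DetectorManifestSketch and the latency_budget_ms field name are hypothetical, and the real openral_core.RSkillManifest variant is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectorManifestSketch:
    """Hypothetical, reduced stand-in for a kind: "detector" manifest."""
    kind: str
    runtime: str
    role: str
    weights_uri: Optional[str] = None
    model_family: Optional[str] = None        # VLA-only field; forbidden for detectors
    latency_budget_ms: Optional[float] = None  # assumed field name

    def validate(self) -> None:
        if self.kind != "detector":
            raise ValueError(f"expected kind 'detector', got {self.kind!r}")
        if self.role != "s1":
            raise ValueError("detector rSkills must declare role: s1")
        if self.model_family is not None:
            raise ValueError("detector rSkills must not set the VLA model_family")
        if not self.weights_uri:
            raise ValueError("detector rSkills must declare weights_uri")
        if self.latency_budget_ms is None or self.latency_budget_ms <= 0:
            raise ValueError("detector rSkills must declare a positive latency budget")
```

A manifest that sets model_family fails validation, which is the property that keeps a detector from being mistaken for an action-chunk policy.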
Amendment — 2026-06-03 (PR C4: NVMM aggregator tier wired end-to-end; ADR filed as PR 2)
What landed (PR5b / C1–C4). The clean-room NVMM zero-copy detector tier is fully wired:
- TrtNvmmExecutor (openral_runner.backends.gstreamer.trt_nvmm): a clean-room CUDA
RGBA→planar-NCHW kernel (rgba_to_nchw_norm) is compiled at load via nvrtc to a SASS cubin for
the runtime-detected GPU arch, launched via the cuda-python (cuda.bindings) driver API directly
on the NVMM dataPtr, writing into the TensorRT input device buffer; the engine then runs in the
CUDA primary context (no GPU→CPU copy). No pycuda / no nvcc / no g++, so the tier is deployable
in the lean ds-on runtime image (see "Deployability finding").
- NvmmObjectsDetector (openral_runner.backends.gstreamer.nvmm_detector): wraps
TrtNvmmExecutor, applies postprocess_rtdetr, and emits ObjectsMetadata.
- DetectorRunner NVMM_AGGREGATOR branch (openral_runner.backends.gstreamer.detector_runner): on
tier=DetectorTier.NVMM_AGGREGATOR, DetectorRunner dynamically attaches a
<nvmm_convert_element> ! video/x-raw(memory:NVMM),format=RGBA,width=W,height=H ! appsink branch
to the named openral_cam_tee, and runs NvmmObjectsDetector on every NVMM buffer delivered to
that appsink.
- nvmm_convert_element() (pipeline.py): resolves the NVMM-aware colour-convert element at
runtime: nvvideoconvert when DeepStream (ds-on / DS9) is installed; nvvidconv on Tegra/L4T; None
when neither is available (CPU-only hosts).
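For reference, the layout transform that an RGBA→planar-NCHW kernel performs can be written as a slow CPU function. This is a sketch under stated assumptions, not the rgba_to_nchw_norm kernel itself: it assumes tightly packed rows (pitch equals width × 4), drops alpha, and normalizes by a plain divide by 255, none of which the ADR specifies.

```python
from typing import List


def rgba_to_nchw_norm_reference(rgba: bytes, width: int, height: int) -> List[float]:
    """CPU reference: interleaved RGBA8 -> planar float RGB (C, H, W), scaled to [0, 1].

    Assumptions (not confirmed by the ADR): tightly packed rows, alpha dropped,
    normalization is value / 255.
    """
    if len(rgba) != width * height * 4:
        raise ValueError("buffer size does not match width * height * 4")
    plane = width * height
    out = [0.0] * (3 * plane)
    for pixel in range(plane):
        base = pixel * 4
        for channel in range(3):  # R, G, B; byte 3 (alpha) is skipped
            out[channel * plane + pixel] = rgba[base + channel] / 255.0
    return out
```

A function of this shape is useful as a golden reference when checking a GPU kernel's output. Real NVMM surfaces may carry a row pitch larger than width × 4, which a production kernel has to honour.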
Validation environment. The aggregator tier is validated in the ds-on (DS9) container
(docker/inference/Dockerfile.x86, WITH_DEEPSTREAM_STAGE=on, DeepStream 9 / CUDA 13), where
nvvideoconvert + libnvbufsurface.so are available on x86. Both gated integration tests in
tests/integration/test_nvmm_aggregator_ds.py pass for real inside ds-on
(test_nvmm_buffer_flow_to_wrap_buffer: live NVMM buffer → wrap_buffer → valid GPU dataPtr;
test_nvmm_aggregator_emits_objectsmetadata: full nvvideoconvert → NVMM → nvrtc kernel → TRT →
ObjectsMetadata), and skip automatically on hosts without the DeepStream stack. The host GPU
tests (test_trt_nvmm_executor, test_nvmm_detector) cover the kernel + TRT-on-dataPtr path.
Deployability finding (why nvrtc, not pycuda). Container validation showed the lean ds-on
runtime image ships CUDA runtime libs + libnvrtc.so but no g++, no nvcc, no CUDA dev
headers — so pycuda cannot install (its C-extension build needs g++/cuda.h) and
pycuda.SourceModule (nvcc-based runtime compilation) cannot run there. The executor therefore
compiles the kernel via nvrtc (a runtime library already in the image) and uses cuda-python
(a pure wheel, already a tensorrt-group dependency) for memory/stream/module/launch, dropping
pycuda + the get_shared_cuda_context() dependency for this path. The DeepStream NVMM dataPtr
is accessible from the CUDA primary context (validated by a direct cudaMemcpy), so no explicit
shared context is needed. reader.py's NVMM path still uses pycuda via cuda_context and is
unaffected (separate, currently-unconsumed path).
Other validation fixes (PR5b). (a) NVMM GstBuffers map read-only, so the surface struct
is read via ctypes.from_buffer_copy (copies only the small NvBufSurface metadata, not the GPU
frame — DetectorRunner._on_sample_nvmm); the original from_buffer raised "underlying buffer is
not writable". (b) docker/inference/Dockerfile.x86 COPY examples/ was stale (the dir was
reorganized away in d450e55), breaking the build — removed in a prior fix(docker) commit.
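The from_buffer versus from_buffer_copy behaviour in fix (a) can be reproduced with the standard library alone. In this sketch the struct is a hypothetical 16-byte header, not the real NvBufSurface layout, and a bytes object stands in for the read-only mapped GstBuffer.

```python
import ctypes


class SurfaceHeaderSketch(ctypes.Structure):
    """Hypothetical 16-byte header; NOT the real NvBufSurface layout."""
    _fields_ = [
        ("gpu_id", ctypes.c_uint32),
        ("batch_size", ctypes.c_uint32),
        ("num_filled", ctypes.c_uint32),
        ("mem_type", ctypes.c_uint32),
    ]


def read_header(mapped: bytes) -> SurfaceHeaderSketch:
    # from_buffer_copy accepts read-only buffers and copies only sizeof(struct) bytes.
    return SurfaceHeaderSketch.from_buffer_copy(mapped)


def from_buffer_fails_on_readonly(mapped: bytes) -> bool:
    # from_buffer aliases the source memory, so ctypes requires a writable buffer.
    try:
        SurfaceHeaderSketch.from_buffer(mapped)
    except TypeError:
        return True
    return False
```

The copy covers only the small metadata struct; any device pointer stored inside it still refers to the original GPU frame, so the zero-copy property of the frame data is unaffected.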
nvinfer tier status — deferred. The Decision §5 nvinfer tier (DeepStream-native inference
element) is deferred, not implemented: no nvinfer code is merged. The go/no-go spike was not
run because (i) the clean-room nvrtc TrtNvmmExecutor aggregator is the working, deployable tier,
and (ii) the nvinfer GStreamer plugin does not register in the built ds-on image
(gst-inspect-1.0 nvinfer fails while nvvideoconvert resolves), so a spike would first require
diagnosing that plugin-load failure. nvinfer remains a documented future option (an alternative
that would attach detection metadata natively / use NVIDIA-maintained inference); revisiting it is
a separate follow-up. The open-core NVMM aggregator tier operates entirely without nvinfer.
Amendment — 2026-06-09 (PR: open-vocabulary VLM detector tier — LocateAnything-3B)
What this adds. A fourth detector tier, DetectorTier.VLM_SIDECAR, lets a kind: detector
rSkill be backed by an open-vocabulary visual-grounding VLM instead of a fixed-label ONNX model.
The first such skill is rskills/locateanything-3b-nf4 (nvidia/LocateAnything-3B: MoonViT vision
tower + Qwen2.5-3B; emits <ref>label</ref><box><x1><y1><x2><y2></box> tokens normalized to
[0,1000]). It is selected for manifests with runtime: pytorch.
Why a sidecar (not in-process). The model's trust_remote_code modeling files require
transformers==4.57.1; the openral runtime is transformers>=5 (CLAUDE.md §3), which removed/renamed
the APIs the custom code calls (config.rope_theta, GenerationMixin inheritance, the
_check_and_adjust_attn_implementation signature). It therefore runs out-of-process in an isolated
transformers==4.57.1 venv — the same version-isolation pattern as the RLDX-1 sidecar
(tools/rldx_sidecar.py). The process boundary is the sole permitted test double (CLAUDE.md §1.11).
What landed.
- tools/_locateanything_server.py — thin ZMQ/msgpack inference server (runs inside the sidecar
venv): loads the model once with bitsandbytes NF4 (load_in_4bit, nf4, bf16 compute,
double-quant), and returns the model's raw grounding text. On an 8 GB RTX 4070: 2.9 GB resident,
~3.5 GB peak.
- tools/locateanything_sidecar.py — boot helper: provisions the venv (pinned to the
nvidia/LocateAnything Space requirements + ZMQ deps) and execvpes the server, dropping
PYTHONPATH/PYTHONHOME so the runtime's transformers wheels can't shadow it.
- LocateAnythingDetector (openral_runner.backends.gstreamer.locateanything_detector) — the
ZMQ client. Exposes the same detect(frame_bgr, width, height, sensor_id) -> ObjectsMetadata
interface as ObjectsDetector, so it reuses the CPU_ONNX system-memory BGR appsink branch
unchanged. Connection is lazy (first detect()), with an auto-spawn lifecycle mirroring the RLDX
adapter (ping → Popen → poll → close). All <ref>/<box> parsing, degenerate-box filtering,
and ObjectsMetadata construction are pure functions in the main env (parse_grounding_answer,
build_objects_metadata) — unit-testable without a GPU.
- build_manifest_detector (openral_runner.backends.gstreamer.detector_factory) — a gi-free
dispatch seam: runtime: pytorch → LocateAnythingDetector + VLM_SIDECAR; onnx/tensorrt →
the existing ONNX detectors via make_objects_detector. DetectorRunner delegates to it and
onnx_path is now optional (None for the VLM tier).
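The parsing step can be kept as a pure function, which is what makes it testable without a GPU. The sketch below is not the project's parse_grounding_answer. It assumes each coordinate arrives as an integer in its own angle brackets (for example <box><100><200><500><800></box>), and the name parse_grounding_sketch is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed wire format: <ref>label</ref><box><x1><y1><x2><y2></box>, with each
# coordinate an integer in [0, 1000] wrapped in its own angle brackets.
_GROUNDING_RE = re.compile(
    r"<ref>(?P<label>.*?)</ref>\s*<box><(\d+)><(\d+)><(\d+)><(\d+)></box>"
)

Box = Tuple[str, float, float, float, float]


def parse_grounding_sketch(text: str, width: int, height: int) -> List[Box]:
    """Return (label, x1, y1, x2, y2) in pixels, dropping degenerate boxes."""
    boxes: List[Box] = []
    for match in _GROUNDING_RE.finditer(text):
        label = match.group("label").strip()
        # groups() is (label, x1, y1, x2, y2); clamp coordinates to the 0..1000 grid.
        x1, y1, x2, y2 = (min(int(g), 1000) for g in match.groups()[1:])
        if x2 <= x1 or y2 <= y1:  # zero or negative area
            continue
        boxes.append((
            label,
            x1 * width / 1000.0,
            y1 * height / 1000.0,
            x2 * width / 1000.0,
            y2 * height / 1000.0,
        ))
    return boxes
```

Because the function takes only text and image dimensions, the sidecar can stay a thin server that returns raw grounding text while all interpretation lives in the main environment.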
Open-vocabulary query (static default + dynamic override). Unlike the fixed-label ONNX
detectors, the query is free text. The static default is the manifest's detector.labels (joined
for the grounding prompt); LocateAnythingDetector.set_query(text) overrides it at runtime. The
dynamic override is a perception-side channel, NOT the actuation path: detectors are
deliberately excluded from the ExecuteRskillTool palette (palette.py — "perception producers,
not actuators"), so goal_params_json never reaches a detector. Instead RosImageObjectDetectorNode
subscribes a std_msgs/String topic (/openral/perception/detector_query) and calls set_query on
each message (and honours an initial query param). The S2 reasoner retargets the detector by
publishing to that topic; a typed SetDetectorQueryTool reasoner-palette variant is the remaining
follow-up.
deploy-sim integration. RosImageObjectDetectorNode is now manifest-driven: with a
manifest_path it builds its backend via build_manifest_detector (RT-DETR ONNX or the VLM
sidecar). openral deploy sim --object-detector-manifest rskills/locateanything-3b-nf4/rskill.yaml
[--object-detector-query "red mug"] brings the VLM detector up on the camera tee (auto-enables the
leg, no ONNX file needed; throttled to ~0.5 Hz since each VLM frame is ~1–2 s). The legacy RT-DETR
path (no manifest_path) is byte-for-byte unchanged.
Confidence semantics. LocateAnything is a grounding model: it emits boxes without per-box
scores, so every ObjectDetection2D is reported at confidence=1.0 and score_threshold does not
apply. Recorded honestly rather than fabricating a score (CLAUDE.md §1.2).
Schema. No RSkillManifest change: runtime: pytorch was already a valid RSkillRuntime and
the kind: detector validator already accepts it. DetectorContract.input_size is unused by the
VLM tier (LocateAnything resizes dynamically); it is kept only because the contract requires it.
Hardware note. The switzerchees/LocateAnything-3B-NVFP4 variant is Blackwell-only (NVFP4 has
no Ada/Lovelace support); NF4 bitsandbytes is the path for ≤8 GB Ada GPUs. The
Reza2kn/...-ONNX-WebGPU-INT4 variant targets browser WebGPU and is out of scope for this tier.
Validation. tests/unit/test_locateanything_detector.py — pure parsing/conversion + manifest
validation + dispatch tests always run; a GPU+sidecar-gated e2e boots the real sidecar and grounds
"cat" on the real coco_sample.jpg fixture (≥2 detections), confirms "person" yields none, and
exercises the dynamic set_query override. Gated on OPENRAL_LOCATEANYTHING_SIDECAR_VENV + a local
GPU (the legitimate CI skip path, CLAUDE.md §12).
Follow-ups. (i) a typed SetDetectorQueryTool reasoner-palette variant that publishes to
/openral/perception/detector_query (the topic + node-side set_query already land here; this adds
the LLM-facing tool so the reasoner can retarget the detector mid-task); (ii) publish the pre-packed
NF4 weights to OpenRAL/rskill-locateanything-3b-nf4 so the sidecar can load them directly instead
of quantizing upstream on each boot.
Amendment — 2026-06-12 (PR: in-process OmDet-Turbo zero-shot detector — engine: zeroshot_hf)
What this adds. A fifth detector tier, DetectorTier.ZEROSHOT_HF, lets a kind: detector
rSkill be backed by a first-class Transformers open-vocabulary detector
(AutoModelForZeroShotObjectDetection) run in process over a fixed class vocabulary. The
first such skill is rskills/omdet-turbo-indoor (omlab/omdet-turbo-swin-tiny-hf: OmDet-Turbo,
Swin-tiny backbone; Apache-2.0, commercial-safe — CLAUDE.md §1.9), configured with a curated
~266-class indoor vocabulary (kitchenware, tableware, appliances, furniture, tools, containers).
Why this tier, and why not the existing ones. The two RT-DETR rSkills are fixed-label ONNX
detectors capped at the 80 COCO classes — most of which (zebra, surfboard, airplane) never appear
indoors, while indoor staples (mug, drawer, switch, jar, pan) are absent. locateanything-3b-nf4
covers open vocabulary but (a) its weights are NVIDIA non-commercial, and (b) it is query
driven — it grounds a prompt, not "whatever is in the scene". The goal here is to populate the
world object list with far more than 80 classes from an unprompted background producer under a
permissive license. OmDet-Turbo fits: a large fixed vocabulary, evaluated every frame, behaves
like a closed large-vocabulary detector while staying Apache-2.0 and real-time.
Why in-process (not a sidecar). Unlike LocateAnything (pinned to transformers==4.57.1, hence
the out-of-process venv), OmDet-Turbo is a core transformers architecture that loads under the
runtime's own transformers>=5. It therefore runs in the runner process — no sidecar venv, no ZMQ.
torch/transformers are lazy-imported (the omdet dependency group) so ONNX-only runners are
unaffected.
Fixed vocabulary, unprompted (the contrast with LocateAnything). This backend deliberately has
no set_query and the detector node does not subscribe it to
/openral/perception/detector_query. The vocabulary is the manifest detector.labels, fixed for
the life of the skill; the reasoner does not retarget it. That is the property that makes it a
"runs in the background, builds the object list" producer rather than a grounding tool.
What landed.
- DetectorEngine (openral_core.schemas) — a manifest-level backend discriminator on
DetectorContract.engine (rtdetr_onnx / vlm_sidecar / zeroshot_hf). None (default)
preserves the legacy runtime-keyed dispatch, so the existing rtdetr + locateanything manifests
are untouched; the new rSkill sets engine: zeroshot_hf explicitly. This disambiguates two
backends that share runtime: pytorch (the VLM sidecar and the in-process zero-shot detector).
- OmDetTurboDetector (openral_runner.backends.gstreamer.omdet_turbo_detector) — loads the
processor + model on first detect() (CUDA when available, CPU fallback), runs the fixed
vocabulary via processor.post_process_grounded_object_detection, and builds ObjectsMetadata.
All score-thresholding / clipping / degenerate-box filtering is a pure function
(build_objects_metadata_from_results), unit-testable without torch or a GPU. Exposes the same
detect(frame_bgr, width, height, sensor_id) -> ObjectsMetadata | None interface as
ObjectsDetector, so it reuses the CPU_ONNX system-memory BGR appsink branch unchanged.
- build_manifest_detector dispatch gains an engine: zeroshot_hf branch → OmDetTurboDetector
+ DetectorTier.ZEROSHOT_HF (checked before the legacy runtime branches). DetectorRunner and
RosImageObjectDetectorNode route through it generically (uniform detect() interface), so no
GStreamer-branch or node changes are needed beyond the new tier reusing the system-memory leg.
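The dispatch order described above (explicit engine first, legacy runtime-keyed branches second) can be sketched as a pure function. The backend labels returned here are illustrative strings, and the handling of an explicit vlm_sidecar or rtdetr_onnx engine is an assumption; the ADR only confirms the zeroshot_hf branch and the legacy runtime mapping.

```python
from enum import Enum
from typing import Optional


class EngineSketch(Enum):
    """Mirrors the three engine values named in the ADR."""
    RTDETR_ONNX = "rtdetr_onnx"
    VLM_SIDECAR = "vlm_sidecar"
    ZEROSHOT_HF = "zeroshot_hf"


def resolve_backend(engine: Optional[EngineSketch], runtime: str) -> str:
    """Return an illustrative backend label for a detector manifest."""
    # Explicit engine wins, so two backends sharing runtime: pytorch stay distinct.
    if engine is EngineSketch.ZEROSHOT_HF:
        return "omdet_turbo"
    if engine is EngineSketch.VLM_SIDECAR:   # assumed behaviour
        return "locateanything_sidecar"
    if engine is EngineSketch.RTDETR_ONNX:   # assumed behaviour
        return "rtdetr_onnx"
    # engine is None: legacy runtime-keyed dispatch, unchanged for older manifests.
    if runtime == "pytorch":
        return "locateanything_sidecar"
    if runtime in ("onnx", "tensorrt"):
        return "rtdetr_onnx"
    raise ValueError(f"no detector backend for runtime {runtime!r}")
```

Checking the engine before the runtime is what prevents the new branch from shadowing, or being shadowed by, the legacy pytorch mapping.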
Schema. DetectorContract gains an optional engine: DetectorEngine | None = None field
(on-disk schema_version stays "0.1", no migrator — CLAUDE.md §6). The kind: detector validator
is unchanged: weights_uri still required (here hf://omlab/omdet-turbo-swin-tiny-hf).
Validation. tests/unit/test_omdet_turbo_detector.py — pure conversion (threshold / sort / clip
/ degenerate-box / length-mismatch), manifest validation, in-process dispatch (lazy build, no torch),
the omdet dependency-group guard, and a regression check that the new engine branch does not shadow
the legacy ONNX dispatch — all run without a GPU. A GPU-gated e2e loads the real Apache-2.0 weights
and grounds indoor classes on coco_sample.jpg (skipped on GPU-less hosts, CLAUDE.md §12).
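The pure conversion cases listed in the validation (threshold, sort, clip, degenerate-box, length-mismatch) suggest a function of roughly the following shape. This is a sketch, not build_objects_metadata_from_results: the name filter_detections_sketch is hypothetical, it returns plain tuples instead of ObjectsMetadata, and raising on a length mismatch is an assumption.

```python
from typing import List, Sequence, Tuple

BoxXYXY = Tuple[float, float, float, float]
Detection = Tuple[str, float, BoxXYXY]


def filter_detections_sketch(
    labels: Sequence[str],
    scores: Sequence[float],
    boxes: Sequence[BoxXYXY],
    width: int,
    height: int,
    score_threshold: float,
) -> List[Detection]:
    """Threshold, clip to the image, drop degenerate boxes, sort by score descending."""
    if not (len(labels) == len(scores) == len(boxes)):
        raise ValueError("labels, scores and boxes must have equal length")
    w, h = float(width), float(height)
    kept: List[Detection] = []
    for label, score, (x1, y1, x2, y2) in zip(labels, scores, boxes):
        if score < score_threshold:
            continue
        # Clip each corner to the image rectangle.
        x1, x2 = max(0.0, min(x1, w)), max(0.0, min(x2, w))
        y1, y2 = max(0.0, min(y1, h)), max(0.0, min(y2, h))
        if x2 <= x1 or y2 <= y1:  # nothing left after clipping
            continue
        kept.append((label, score, (x1, y1, x2, y2)))
    kept.sort(key=lambda det: det[1], reverse=True)
    return kept
```

Since it depends on neither torch nor a GPU, a function like this can be exercised in the always-run unit tests.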
Follow-ups. (i) embedding-cache the fixed class text so the per-frame cost is the vision forward
only; (ii) mirror the same engine: zeroshot_hf tier for grounding-DINO / OWLv2 if a
higher-accuracy (non-real-time) variant is wanted; (iii) wire the deploy sim throttle defaults for
the larger vocabulary.
Amendment (2026-06-15): omdet-turbo-indoor is the deploy-sim default detector
openral deploy sim previously defaulted to the fixed-label RT-DETR COCO ONNX (rtdetr-coco-r18)
and auto-enabled the detector leg only when those weights happened to exist on disk. COCO-80 has no
bread/baguette (or most household objects), so the default detector was structurally blind to the
objects deploy-sim scenes actually contain — a deploy sim run would silently bring up a detector
that could never ground the target object. This amendment changes the default, not the
contract:
- Default backend = omdet-turbo-indoor (open-vocab engine: zeroshot_hf, 266 indoor/kitchen
classes incl. bread) when its runtime deps (transformers + timm, the omdet group) are
importable, resolved by openral_cli.deploy_sim._omdet_runtime_available(). Graceful fallback to
the in-tree RT-DETR COCO ONNX when they are not, so a base-group checkout still comes up.
- Detector on by default. The CLI flag is now --object-detector/--no-object-detector (was
--enable-object-detector/--no-enable-object-detector, auto-by-weights). The leg auto-downgrades
to off (with a console notice) only when neither backend is available, with no hard-fail at node
build. This is the enable-default half of ADR-0035's detector leg.
- Engine-aware throttle (resolves follow-up iii): sim_e2e.launch.py caps the publish rate by
DetectorContract.engine (vlm_sidecar 0.5 Hz, zeroshot_hf 2 Hz, RT-DETR ONNX 5 Hz) instead of the
prior runtime == "pytorch" → 0.5 Hz heuristic that throttled the in-process OmDet backend as
slowly as the LocateAnything VLM sidecar.
Explicit --object-detector-manifest / --object-detector-onnx overrides are unchanged. No schema
change; on-disk schema_version stays "0.1". Arbitrary-object grounding (e.g. baguette) still
needs the on-demand open-vocab locator (locate_in_view, ADR-0043/0051) — a separate work item.
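The engine-aware throttle in this amendment amounts to a lookup keyed by the engine value. A minimal sketch follows, using the three rates quoted above; the function names are hypothetical, and the fallback for manifests without an engine is an assumption based on the earlier runtime heuristic.

```python
from typing import Optional

# Rate caps quoted in the amendment, keyed by the DetectorContract.engine value.
_RATE_CAP_HZ = {"vlm_sidecar": 0.5, "zeroshot_hf": 2.0, "rtdetr_onnx": 5.0}


def publish_rate_cap_hz(engine: Optional[str], runtime: str) -> float:
    """Return the publish-rate cap in Hz for a detector manifest."""
    if engine is not None:
        return _RATE_CAP_HZ[engine]
    # Legacy manifests carry no engine; assumed to keep the runtime-keyed rates.
    return 0.5 if runtime == "pytorch" else 5.0


def min_publish_interval_s(engine: Optional[str], runtime: str) -> float:
    """Minimum spacing between published detections, in seconds."""
    return 1.0 / publish_rate_cap_hz(engine, runtime)
```

Keying on the engine rather than the runtime lets the in-process OmDet backend publish four times as often as the VLM sidecar even though both declare runtime: pytorch.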