ADR-0016: Multi-platform support — x86 (CUDA + CPU) and L4T (Orin + older Jetson)
- Status: Accepted
- Date: 2026-05-24
- Amended: 2026-05-24 (see Amendments below)
- Related: CLAUDE.md §1.11 (no mocks), §1.13 (METHODS reuse), §1.14 (docs travel with code), §6.1 (layer discipline), §7.10 (repo state map); ADR-0010 (Inference runner); ADR-0011 (NVMM handoff). Closes issue #89.
Numbering note. Issue #89's research write-up labelled this work "ADR-0012" before ADR-0012 (Licensing) landed in PR #87. The next free ADR slot is 0013, which this document takes; nothing else changes.
Context
PR #86 (feat(runner): GStreamerSensorReader (M8, ADR-0010 PR I))
is the first PR in the repo that explicitly recognises multiple
compute platforms at runtime. It ships:
- a
Platformenum (TEGRA / NVIDIA_DESKTOP / CPU_ONLY) atpython/runner/src/openral_runner/backends/gstreamer/pipeline.py:60, - a
detect_platform()probe atpipeline.py:172that reads/etc/nv_tegra_release, falls back togst-inspect-1.0fornvh264dec, then to CPU-only, - two Dockerfiles (
docker/inference/Dockerfile.x86,docker/inference/Dockerfile.l4t) plus anx86-rosvariant for the ROS-tee smoke test, Justfiletargetsdocker-build-x86,docker-build-l4t,docker-smoke-x86,docker-smoke-x86-ros.
This is the right shape for cross-platform support, but it stops at
the sensor-ingest layer. Other parts of the stack already have
platform-aware probes (_probe_jetson /
DTYPES_BY_COMPUTE_CAPABILITY / JETSON_BOARD_TOPS in
python/detect/src/openral_detect/probes/gpu.py,
auto_select_quant in
python/rskill/src/openral_rskill/quantization.py:72)
that were written in isolation, with no shared contract on which
targets the runtime supports or what "supported" means.
This ADR establishes that contract. It is the deliverable for PR 1/3 of issue #89; PR 2/3 (detection upgrades) and PR 3/3 (CI / Docker matrix) cite this ADR as their authority.
In scope
- x86 with NVIDIA dGPU (Turing / Ampere / Ada / Hopper / Blackwell).
- x86 CPU-only (no NVIDIA GPU).
- Jetson Orin (AGX / Orin NX / Orin Nano) on JetPack r36 / L4T 36.4.
- Generic Jetson — Xavier / Xavier NX — on the same
l4timage with degraded-compute capability (no BF16, no INT4 tensor-core path).
Out of scope (deferred to a future ADR if/when they land)
- Thor and Spark (Tegra Blackwell / NVFP4) — covered by the
existing
SensorReaderBackend.HOLOSCANreserved enum value (ADR-0010 Amendment 2026-05-12). - NVIDIA Holoscan SDK as a sensor-ingest backend (same amendment).
- DeepStream SDK — proprietary; disqualified by CLAUDE.md §1.9 / §12.
- NVFP4 / Blackwell-Tegra-specific quantization dtypes.
- macOS / Apple Silicon as a deploy target — the
_probe_apple_siliconprobe atgpu.py:355stays as a development-host affordance only.
What "supported" means in this ADR
A supported target is one where, for every milestone delivered on
main:
- The
python/workspace imports cleanly (uv sync). openral doctor,openral detect, andopenral sim run's pure-CPU path exit 0.- Where a GPU is present,
openral deployagainst a representativeRobotEnvironment(e.g.deployments/so100_hello_gstreamer_*) completes withbudget_violations == 0for ≥30 ticks. - There is a CI signal — hosted runner for x86 +
ubuntu-24.04-arm, self-hosted[self-hosted, l4t]for L4T — that catches regression per category (import-only on hosted aarch64; GPU end-to-end on self-hosted).
"Best-effort" means the import path is unbroken and the heuristics are defensible, but no CI signal is asserted. Maxwell Nano falls under best-effort (see Open questions).
Decision
1. Canonical platform matrix
The three named user targets collapse to two Dockerfile families:
| Target name (user-facing) | Arch | Silicon | OS / SDK | CUDA CC | Image |
|---|---|---|---|---|---|
x86-cuda |
x86_64 | Ampere+ dGPU | Ubuntu 24.04 + CUDA 12.x | 7.5 – 10.0 | x86 (default) |
x86-cpu |
x86_64 | none | Ubuntu 24.04 | n/a | x86 (WITH_CUDA=0) |
l4t-orin |
aarch64 | Tegra Ampere (CC 8.7) | JetPack r36 (L4T 36.4) | 8.7 | l4t |
l4t-xavier |
aarch64 | Tegra Volta (CC 7.2) | JetPack r35–r36 (degraded) | 7.2 | l4t (shared) |
l4t-nano-maxwell |
aarch64 | Tegra Maxwell | Legacy JetPack 4.x | 5.3 | l4t (best-effort) |
Image consolidation rules:
- Orin and Xavier share
l4t. JetPack r36.4 supports AGX Orin / Orin NX / Orin Nano natively, and Xavier-class boards in degraded-compute mode (CC 7.2 → FP16/INT8 only). The image is one Dockerfile; runtime behaviour differs because_probe_jetsonandauto_select_quantread the actual compute capability. - x86 stays one Dockerfile with
WITH_CUDAbuild arg controlling theFROMbase (nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04vsubuntu:24.04) and the GStreamer-nvcodec apt set. Avoids a third top-level Dockerfile. - No Holoscan branch. The ADR-0010 Amendment 2026-05-12 evaluated Holoscan against lean GStreamer + custom NvBufSurface glue and deferred Holoscan. This ADR endorses that deferral.
- Maxwell Nano (CC 5.3) is best-effort.
JETSON_BOARD_TOPSlists it at 0.5 TOPS, but the NVMM zero-copy path throughlibnvbufsurface.sois not guaranteed on stripped L4T 32.x images and recent JetPack drops dropped its support. Thel4timage will install;openral doctorwill emit aninfowarning that the NVMM path is unavailable; noopenral deployGPU end-to-end guarantee is made. See Open questions.
The matrix is endorsed-final. Two Dockerfiles (x86, l4t); two
runtime Platform values for the NVIDIA branches
(NVIDIA_DESKTOP, TEGRA) plus CPU_ONLY; one canonical mapping
from (arch, compute_capability) → image family.
2. Platform enum is final for this scope
The current three-way Platform enum (pipeline.py:60) is
sufficient and final for the in-scope targets:
- Orin AGX / Orin NX / Orin Nano →
TEGRA(via/etc/nv_tegra_release). - Xavier / Xavier NX →
TEGRA(same). - Maxwell Nano →
TEGRA(same, best-effort). - x86 + NVIDIA dGPU →
NVIDIA_DESKTOP(vianvh264decprobe). - x86 CPU-only / WSL2 without GPU passthrough / Apple Silicon dev
host →
CPU_ONLY.
A future contributor must not split TEGRA into ORIN /
XAVIER / NANO: those distinctions are SoC-level and belong in
JetsonInfo.cuda_compute_capability (already populated by
_probe_jetson). The Platform enum names the GStreamer
element-family selector, not the SoC.
3. Detection gaps to close (PR 2/3 scope)
PR 2/3 of issue #89 closes the following gaps in
python/detect/src/openral_detect/probes/gpu.py:
- Replace the
"Orin" in boardheuristic atgpu.py:307and:340with explicit board-name branches: "Orin"→(8, 7)."Xavier"→(7, 2)."Nano"& not"Orin"(Maxwell) →(5, 3).- unknown →
None+ structured warning (don't guess(7, 2)). - Add
RobotCapabilities.nvmm_available: boolpopulated by probing the same paths thatpython/runner/src/openral_runner/backends/gstreamer/nvbufsurface.pyalready walks (/usr/lib/aarch64-linux-gnu/tegra/libnvbufsurface.so). The field surfaces what the runtime needs to decide whether to take the zero-copy path. The on-diskschema_versionstays at"0.1"while we are pre-publish, so no migrator is required for this additive field (CLAUDE.md §1.6). - Recorded fixtures under
tests/unit/fixtures/jetson/for_probe_jetson— one per supported board (Orin AGX / Orin NX / Orin Nano / Xavier NX). Real device output, no mocks (CLAUDE.md §1.11). - Pin-tests for
auto_select_quantatquantization.py:72asserting the expected dtype on a recordedDeviceInfofor each target board: - x86 CPU-only →
int8_dynamic. - Orin AGX (32 / 64 GB shared) →
bf16. - Orin Nano / Xavier NX (8 GB shared) →
int4. - Xavier (CC 7.2, > 8 GB) →
fp16. - x86 dGPU > 8 GB, CC ≥ 8.0 →
bf16; CC < 8.0 →fp16.
DTYPES_BY_COMPUTE_CAPABILITY at gpu.py:79 already advertises the
correct dtype set for CC 7.2 and CC 8.0 / 8.7; no code change
required there.
4. Optional-dependency story (PR 3/3 scope, partial)
The root pyproject.toml currently exposes a gstreamer extra. PR 3/3 makes the extras matrix canonical:
| Extra | Installs | Used by |
|---|---|---|
gstreamer |
PyGObject, GStreamer bindings |
x86, l4t |
gstreamer-nvmm |
pycuda, ctypes glue |
l4t |
Every optional import in the codebase must except ImportError with a
typed ROSConfigError pointing at the right uv sync --extra command
— the existing pattern in python/sim/src/openral_sim/adapters/.
5. CI matrix (PR 3/3 scope)
.github/workflows/test-python.yml today runs ubuntu-22.04 /
ubuntu-24.04 / macos-14 (all x86) under Python 3.12. There is no
aarch64 or Jetson signal. PR 3/3 adds:
- Hosted x86 Ubuntu unchanged — unit + ruff + mypy + schema export.
ubuntu-24.04-arm(GitHub's free aarch64 hosted runner, GA 2025). Pure-Python lint / mypy / schema parity on arm. No GPU; catches aarch64-only import bugs (e.g. thetegralibrary paths innvbufsurface.py).- Self-hosted
[self-hosted, l4t]runner pool. The HIL convention documented intests/README.mdalready uses[self-hosted, lab-<robot>]; the same pattern extends here. A singlel4tlabel covers Orin vs Xavier — runtime behaviour differs but CI scheduling does not. - Docker image build matrix (
docker-build.yml, new). On every push tomain, build and pushx86(--platform linux/amd64) andl4t(--platform linux/arm64) to GHCR. Buildx + QEMU is for image assembly only; the smoke tests (docker-smoke-*) must run on native hardware. CUDA and NVMM do not work under QEMU.
6. CLI exposure
openral doctor (in python/cli/src/openral_cli/main.py) gains a
Platform row that prints the consolidated image family the host
matches — one of x86-cuda / x86-cpu / l4t-orin / l4t-xavier /
l4t-nano-maxwell / unsupported — and a one-line reason. The
underlying data already comes through _check_gpu; the new row is a
one-line summary derived from it. No new public symbol on the
detection side beyond RobotCapabilities.nvmm_available (Decision 3).
Consequences
Good
- A future contributor opening
openral deployon an Orin Nano can read one table and know which image to pull, which dtype the runtime will pick, and what CI will catch a regression in. - The
Platformenum + the detection probes are pinned to a single contract; no more drive-by additions of "is this a Tegra?" probes in random files. - The "no mocks" stance (CLAUDE.md §1.11) is preserved: every
platform-conditional code path will have a recorded real-device
fixture under
tests/unit/fixtures/jetson/or apytest.skipwith a typed reason — never aMagicMock. - Licensing (ADR-0012) is unaffected: all
the modules touched by PR 2/3 and PR 3/3 (
python/detect/,python/rskill/,python/runner/,python/cli/,python/core/) are Apache-2.0, like the rest of the repo.
Bad / costly
- Maintaining a self-hosted
l4trunner pool is operational cost; the team has to commit to keeping a real Orin board (or two) in CI rotation. PR 3/3 documents the runner-onboarding playbook. - The
RobotCapabilitiesadditive field (Decision 3) lands on the pre-publish baseline (schema_version: "0.1") — every committedrobots/*/robot.yamlmust be re-validated, but no migrator is required while we are pre-publish (CLAUDE.md §1.6). ubuntu-24.04-armhosted runners are still GitHub-Actions-beta-class; flakes there may slow merges. PR 3/3 lands the runner withcontinue-on-error: truefor the first two weeks, then promotes it to required.
Things this ADR explicitly does NOT decide
- Whether to support Thor / Spark beyond the reserved enum value.
- Whether to ever ship NVFP4 quantization.
DTYPES_BY_COMPUTE_CAPABILITYalready listsFP4_NVFP4at CC 10.0 but the loader path for it is out of scope. - Whether to add a Mac-arm64 deploy image. Apple Silicon stays a
development affordance via the existing
_probe_apple_silicon. - Whether
gstreamer-nvmmshould be an extra ofopenral_runneror a top-level workspace extra. PR 3/3 will resolve this in thepyproject.tomlchange.
Open questions
-
Is Maxwell Nano (CC 5.3) officially supported, or best-effort?
JETSON_BOARD_TOPSlists it; the plan treats it as best-effort becauselibnvbufsurface.sopaths are not guaranteed on the L4T 32.x family. This ADR provisionally calls it best-effort. PR 2/3 may promote it to supported if a recorded fixture and a passingopenral deployend-to-end can be demonstrated on a real Maxwell Nano. -
What is the contract for an Xavier in JetPack r35 vs r36? JetPack r36.4 explicitly supports Xavier in degraded-compute mode; r35 supported it fully. The image pins r36 — older r35 stacks are not tested. PR 2/3 should add a
openral doctorwarning when the detectedjetpack_versionis older than r36. -
Should
Platform.detect_platform()accept an injectable path for/etc/nv_tegra_release? Today it reads a module-levelFinal[Path]constant atpipeline.py:52. Unit testing it on a non-Tegra dev host requires either monkey- patching the constant or running inside a container that doesn't exist on PR-build runners. PR 2/3 should refactor the constant into an optional parameter so the unit test does not need to touch the filesystem.
Verification
This ADR is research-only. The reviewer checks:
- §1 image table covers all three user-named targets — x86 (CUDA or CPU-only), Orin (AGX / NX / Nano), generic Jetson (Xavier / Xavier NX / Maxwell Nano). 3 / 3 covered, no unmapped device.
- Every gap in §3 references a concrete file with line range, so PR 2/3 / PR 3/3 implementers can start without re-exploring.
- No mocks, no smoke tests in the proposed test plan (CLAUDE.md §1.11 / §5.4) — fixtures are recorded real-device output, sim tier uses real LIBERO / MetaWorld, HIL tier uses real boards.
- Layer discipline. No new layer is introduced; the gaps are all inside layers that already exist (Detection in Layer 0/1, Skill quantization in Layer 3, Runner in the cross-cutting Inference Runner spanning HAL / Sensors / Skill). No CLAUDE.md §6.1 layer boundary is crossed.
End-to-end smoke on each image family is the exit criterion of PR 3/3, not of this ADR:
# x86-cuda (RTX 3090+)
docker run --rm --gpus all \
-v "$(pwd)/examples:/workspace/examples:ro" \
openral:x86-latest \
--config deployments/so100_hello_gstreamer_videotestsrc.yaml \
--max-ticks 30
# expect: exit 0, budget_violations=0, Platform=NVIDIA_DESKTOP in trace
# x86-cpu (no GPU, WITH_CUDA=0 build)
docker run --rm \
-v "$(pwd)/examples:/workspace/examples:ro" \
openral:x86-cpu-latest \
--config deployments/so100_hello_gstreamer_videotestsrc.yaml \
--max-ticks 30
# expect: exit 0, budget_violations=0, Platform=CPU_ONLY, no NVMM frames
# l4t on a real Orin
docker run --rm --runtime nvidia \
-v "$(pwd)/examples:/workspace/examples:ro" \
openral:l4t-latest \
--config deployments/so100_hello_gstreamer_videotestsrc.yaml \
--max-ticks 30
# expect: exit 0, budget_violations=0, Platform=TEGRA,
# libnvbufsurface.so loaded, CUDA context acquired
Alternatives considered (and rejected)
- Three Dockerfiles (x86-cuda, x86-cpu, l4t). Rejected: the
CUDA/CPU split is purely an apt-package and
FROMline difference; a build arg keeps maintenance to one file. - Split
Platform.TEGRAintoORIN/XAVIER/NANO. Rejected: the enum names a GStreamer element-family selector. SoC distinctions belong inJetsonInfo.cuda_compute_capability, which is already populated. - Add a
Platform.HOLOSCANvalue now. Rejected: ADR-0010 Amendment 2026-05-12 already reservesSensorReaderBackend.HOLOSCAN. Adding aPlatform.HOLOSCANvalue would conflate the backend selection (Holoscan vs GStreamer) with the element-family selection (which CUDA stack to drive). - Treat Apple Silicon as a deploy target. Rejected: no
mpsinference adapter; the_probe_apple_siliconpath exists only soopenral detectandopenral doctorgive useful output on a dev laptop. - Pin a single CUDA minor version (12.6) across both images.
Considered. The
l4timage is constrained by what JetPack ships; matching x86 to that constraint would lock x86 to an older CUDA than the workspace'storch 2.10ABI requires. The two images ship different CUDA minors (12.x on x86, JetPack-bundled on l4t) and the runtime detection code does not care.
Amendments
2026-05-18 — Status flipped Proposed → Accepted
The single-Dockerfile + PLATFORM build-arg pattern declared in the
Decision section landed via PR #112 (merge commit 93d28af) — the
final piece of the consolidation work the 2026-05-14 single-Dockerfile
amendment described. The x86-cuda / x86-cpu / l4t flavors all build
off one Dockerfile driven by --build-arg PLATFORM=...; CI matrices
in .github/workflows/ exercise each. The Platform enum, runtime
detection helpers, and JetsonInfo schema referenced in the Decision
are live in openral_detect / openral_core.
No behavioural change against the Decision text — only the status field flips.