rSkills Reference
rSkills are HuggingFace-Hub-shaped packages — manifest + weights + reproducible eval/ — loaded via rSkill.from_pretrained(...) and gated by license, embodiment tags, and capabilities. See CLAUDE.md §3 for packaging details.
Install & manage
openral rskill search aloha # discover skills on the OpenRAL Hub org
openral rskill search --kind detector # …filter by kind/role/embodiment/license
openral rskill install OpenRAL/rskill-smolvla-libero
openral rskill list # list installed rSkills
openral rskill check # which installed rSkills run on this host?
rskill install expects an org/name Hub id. A bare name (e.g. rskill-smolvla-libero)
fails fast with the canonical OpenRAL/… suggestion rather than a raw Hub 404 — use
rskill search if you don't know the id. rskill search queries the OpenRAL HF Hub org
(HfApi.list_models), validates each candidate's rskill.yaml, and prints a paste-able
repo_id table (ADR-0055 D4).
Discovery views (SKILL.md)
Each rskills/<id>/ also carries a generated SKILL.md — a standard agent-skill
view (YAML name + description frontmatter, rSkill fields under metadata:) so
tools that read the agent-skill format can discover OpenRAL rSkills. It is
discovery-only: an agent can use it to select a skill, but execution always
goes through rSkill.from_pretrained + the robot HAL — markdown cannot hold
weights or the typed capability/license gates. rskill.yaml remains the single
source of truth (CLAUDE.md §1.3); regenerate (never hand-edit) with:
python tools/generate_rskill_skillmd.py # all rskills/<id>/
python tools/generate_rskill_skillmd.py --check # CI: fail if any are stale
The same SKILL.md is mirrored to each OpenRAL/rskill-* Hub repo.
VLA policy rSkills
All entries are published under OpenRAL/rskill-* on HuggingFace Hub and exercised end-to-end by a config in scenes/.
| rSkill | Backbone / family | Targets | License |
|---|---|---|---|
smolvla-libero |
SmolVLA fine-tuned | franka_panda |
Apache-2.0 |
smolvla-metaworld |
SmolVLA fine-tuned | sawyer |
Apache-2.0 |
smolvla-maniskill-franka |
SmolVLA × ManiSkill3 PickCube-v1 |
franka_panda |
Apache-2.0 |
xvla-libero |
xVLA (Florence-2) | franka_panda |
Apache-2.0 |
act-libero |
ACT | franka_panda |
Apache-2.0 |
molmoact2-libero-nf4 |
MolmoAct2 NF4 (Molmo2-ER VLM + flow-matching, ~5.5 B) | franka_panda |
Apache-2.0 |
pi05-libero-nf4 |
π0.5 NF4 | franka_panda |
Permissive research (weights non-Apache) |
pi05-openarm-vision-nf4 |
π0.5 NF4 (bimanual, absolute joint targets) | openarm |
Apache-2.0 |
pi05-robocasa365-human300-nf4 |
π0.5 NF4 (RoboCasa365 + 300 human eps fine-tune) | panda_mobile |
Apache-2.0 (weights: research) |
pi05-so101-pickplace-nf4 |
π0.5 NF4 (SO-101 pick-place fine-tune, 3 RGB streams) | so101_follower |
Apache-2.0 |
act-aloha |
ACT (Action Chunking Transformer) | aloha_bimanual |
MIT |
act-aloha-insertion |
ACT insertion checkpoint — custom example | aloha_bimanual |
MIT |
diffusion-pusht |
Diffusion Policy | pusht_2d |
Apache-2.0 |
rldx1-ft-libero-nf4 |
RLWRLD RLDX-1 (Qwen3-VL-8B + MSAT, ~6.9 B) | franka_panda |
RLWRLD non-commercial — sidecar runtime |
rldx1-ft-gr1-nf4 |
RLDX-1 (GR1 bimanual) | gr1 |
RLWRLD non-commercial — sidecar runtime |
rldx1-ft-rc365-nf4 |
RLDX-1 (RoboCasa365 fine-tune) | panda_mobile |
RLWRLD non-commercial — sidecar runtime |
rldx1-ft-simpler-widowx-nf4 |
RLDX-1 (SimplerEnv WidowX fine-tune) | widowx |
RLWRLD non-commercial — sidecar runtime |
rldx1-pt-nf4 |
RLDX-1 (foundation pretrain) | franka_panda |
RLWRLD non-commercial — sidecar runtime |
gr00t-n17-libero |
NVIDIA Isaac GR00T N1.7 (3B, Cosmos-Reason2-2B VLM backbone) | franka_panda |
NVIDIA Open Model License (commercial OK) — out-of-process sidecar (ADR-0046) |
Perception rSkills (kind: detector)
Object-detection rSkills emit ObjectsMetadata (2-D detections lifted to 3-D in the deploy graph) instead of an Action. See ADR-0035 and ADR-0037.
| rSkill | Backbone | Notes |
|---|---|---|
rtdetr-coco-r18 |
RT-DETR R18 (COCO) | lightweight ONNX export |
rtdetr-v2-r50vd |
RT-DETR v2 R50vd | higher-accuracy variant |
locateanything-3b-nf4 |
NVIDIA LocateAnything-3B NF4 | open-vocabulary grounding; runs via the VLM_SIDECAR detector tier (out-of-process sidecar); dynamic reasoner-driven query via the read-only locate_in_view tool (ADR-0043) |
omdet-turbo-indoor |
OmDet-Turbo Swin-tiny (omlab/omdet-turbo-swin-tiny-hf) |
Apache-2.0 open-vocabulary detector run in-process over a fixed ~266-class curated indoor vocabulary; engine: zeroshot_hf → DetectorTier.ZEROSHOT_HF; mode: continuous background producer, far more than the 80 COCO classes (ADR-0037 2026-06-12 amendment) |
omdet-turbo-locator |
OmDet-Turbo Swin-tiny (omlab/omdet-turbo-swin-tiny-hf) |
Apache-2.0 on-demand sibling — same weights/engine, mode: on_demand; the reasoner prompts it via locate_in_view. Lightweight, real-time, in-process alternative to the 3B LocateAnything VLM (ADR-0051) |
The RT-DETR rSkills are Apache-2.0 and runnable on any camera-equipped embodiment. They are consumed by openral_perception_ros (RosImageObjectDetectorNode) in the openral deploy sim / deploy run graph. LocateAnything is NVIDIA non-commercial and ships as an NF4 PyTorch/custom-code artifact. Because its custom code needs transformers==4.57.1 (incompatible with the runtime's transformers>=5), it runs out-of-process in an isolated venv (tools/locateanything_sidecar.py) and is driven by the LocateAnythingDetector backend over ZMQ — the DetectorTier.VLM_SIDECAR path selected for runtime: pytorch detector manifests (ADR-0037 2026-06-09 amendment). The detector-node side of that ZMQ link needs the pyzmq + msgpack client, shipped in the locateanything dependency group (uv sync --group locateanything); without it the deploy sim --object-detector-manifest leg fails per-request with No module named 'zmq'. The backend parses its <ref>/<box> text into ObjectsMetadata and exposes set_query() for the open-vocabulary query (static default = manifest labels; dynamic override via the /openral/perception/detector_query topic for the continuous leg, and the read-only locate_in_view reasoner tool + service for a one-shot on-demand check, ADR-0043).
omdet-turbo-indoor is the commercially-permissive open-vocabulary alternative: OmDet-Turbo is a first-class transformers architecture (AutoModelForZeroShotObjectDetection), so it loads under the runtime's own transformers>=5 and runs in-process (no sidecar, no ZMQ) via the OmDetTurboDetector backend — the DetectorTier.ZEROSHOT_HF path selected when detector.engine is zeroshot_hf (ADR-0037 2026-06-12 amendment). Unlike LocateAnything it is not query-driven: it has no set_query and the detector node does not subscribe it to /openral/perception/detector_query. Its fixed ~266-class indoor vocabulary (manifest labels) is evaluated on every frame, so it acts as an unprompted background producer that populates the world object list with far more than the 80 COCO classes. torch + transformers ship in the omdet dependency group (uv sync --group omdet); without them the in-process backend fails on first detect().
Invocation mode: continuous vs on-demand (ADR-0051)
Detectors carry a detector.mode (DetectorMode) that is orthogonal to detector.engine — where engine says how the model runs, mode says when the reasoner invokes it:
continuous(default) — an always-on background producer (rtdetr-coco-r18,rtdetr-v2-r50vd,omdet-turbo-indoor). Runs on the camera tee every frame, streamsObjectsMetadataintoWorldState.detected_objects; the reasoner reads it passively (world state /recall_object) and never prompts it. It is not an ExecuteSkill tool.build_tool_palettecollects these intoToolPalette.continuous_detectorsso the LLM is told what is already tracked for free.on_demand— a prompted open-vocab locator (locateanything-3b-nf4,omdet-turbo-locator). The reasoner invokes it via the read-onlylocate_in_viewtool (ADR-0043) only when it needs a specific object right now that the continuous bank doesn't cover.omdet-turbo-locatorwraps the same Apache-2.0 OmDet-Turbo weights asomdet-turbo-indoorbut in on-demand mode — a lightweight (~115M, real-time, in-process) alternative to the 3B LocateAnything VLM for simple "find X" queries; LocateAnything stays the higher-quality option for complex referring expressions. TheOmDetTurboDetectorbackend exposesset_query/detect_with_query, which the detector node binds byhasattr, so the same backend serves either mode (packaging two single-purpose rSkills, not one dual-mode rSkill, is what keeps the modes from straddling).
This cleanly separates open-vocabulary from prompting: the locate_in_view tool description is made coverage-aware (it lists the continuous detectors' class counts + keywords), so the LLM's rule is mechanical — object within continuous coverage → read world state; object outside it (novel / attribute-qualified) → locate_in_view.
Scene-VLM rSkills (kind: vlm)
kind: vlm rSkills (ADR-0047) are vision/video-language models used as S2 scene-understanding components: they answer open-ended natural-language questions about the current camera view (task-progress / success verification — "has the robot grasped the mug?", "did we drop the object?", "is the task complete?") and emit text, never actions. They are not localizers — use a kind: detector rSkill (locate_in_view) to find where an object is. A scene VLM is reached through the read-only query_scene reasoner tool, never ExecuteSkill (so role: s2, excluded from the actuation palette by design).
| rSkill | Backbone | Notes |
|---|---|---|
qwen35-4b-nf4 |
Qwen3.5-4B NF4 (natively-multimodal, hybrid linear attention) | Apache-2.0; pre-quantized NF4 checkpoint (~3.3 GB, fits 8 GB); runs out-of-process via tools/qwen_vlm_sidecar.py + the QwenSceneVlm backend over ZMQ; served by openral_perception_ros.scene_vlm_node on /openral/perception/query_scene; drives the reasoner's read-only query_scene tool (ADR-0047) |
Like the LocateAnything detector, the Qwen scene VLM runs in an isolated sidecar venv (its bitsandbytes / qwen-vl-utils / Gated-DeltaNet stack would perturb the lerobot-pinned transformers==5.3.0 runtime, and a 4B model + CUDA context should not share the rclpy process). The node-side ZMQ + msgpack client ships in the qwen-vlm dependency group (uv sync --group qwen-vlm). The rSkill's weights_uri is a pre-quantized NF4 checkpoint (transformers-native save_pretrained layout with an embedded quantization_config) built by tools/build_qwen_vlm_nf4_checkpoint.py; it loads directly as 4-bit with no bf16 load spike. source_repo records the SHA-pinned upstream Apache-2.0 Qwen model (provenance). The reasoner offers query_scene when launched with scene_query_available:=true.
Reward-monitor rSkills (kind: reward)
kind: reward rSkills (ADR-0057) are robotic reward / progress-monitor models that run in parallel with a VLA policy and score the rollout: given the VLA's camera frames + the task instruction, they emit per-frame normalized progress (0–1) and per-frame success probability. Where a scene VLM (query_scene) returns free text, a reward monitor returns quantitative scalars + trends. It is reached through the read-only query_task_progress reasoner tool, never ExecuteSkill (so role: s2, excluded from the actuation palette); the signal is advisory — it feeds the replanning ladder, never the motors.
| rSkill | Backbone | Notes |
|---|---|---|
robometer-4b |
Robometer-4B (Qwen3-VL-4B reward foundation model, arXiv 2603.02115) | Apache-2.0; pre-quantized NF4 at hf://OpenRAL/rskill-robometer-4b-nf4, meta-loaded directly as 4-bit (~3.3 GB resident, fits 8 GB alongside a small VLA); runs out-of-process via tools/robometer_sidecar.py + the RobometerReward backend over ZMQ; served by openral_perception_ros.reward_monitor_node on /openral/perception/query_task_progress; drives the reasoner's read-only query_task_progress tool (ADR-0057) |
Like the Qwen scene VLM, the Robometer monitor runs in an isolated sidecar venv: its RBM class cannot be loaded by vanilla transformers.AutoModel (its HF config.json advertises architectures: ["RFM"] with no auto_map), so the sidecar uv pip installs the pinned upstream robometer package and forces transformers==4.57.1 (the resolver pulls 5.x, which drops input_ids from the processor). The node-side ZMQ + msgpack client ships in the robometer dependency group (uv sync --group robometer). The sidecar is a stateless scorer; the rolling frame buffer (RollingFrameBuffer, fed by the same sensor_msgs/Image topic the VLA uses — GStreamer tee on real hardware, sim HAL publisher in deploy-sim) lives node-side. weights_uri accepts the SHA-pinned Apache-2.0 upstream (NF4-quantized on load), a published pre-quantized OpenRAL repo, or local:///abs/dir. The pre-quantized path (built by tools/build_robometer_nf4_checkpoint.py) loads the packed NF4 weights DIRECTLY on the meta device — no bf16 materialization, no requantize (~25 s process→ready vs ~110 s + a 19 GB CPU spike), bit-identical to the bf16+quantize path with determinism pinned (math SDP + use_deterministic_algorithms + CUBLAS_WORKSPACE_CONFIG). The forward's activation memory scales with frame count × resolution, so the client evenly subsamples the window to max_frames (8) to stay co-resident with the sim (and a small NF4 VLA) on 8 GB. In deploy-sim, openral deploy sim --enable-reward-monitor brings the monitor up parallel to the VLA and sets task_progress_available:=true so the reasoner is offered query_task_progress (validated live on openarm). The upstream robometer code is not an OpenRAL-trusted org — it is pinned by commit and runs only in the isolated venv.
Manifest format
# rskills/smolvla-libero/rskill.yaml (excerpt)
name: "OpenRAL/rskill-smolvla-libero"
version: "0.1.0"
license: "apache-2.0"
role: "s1"
embodiment_tags: ["franka_panda"]
sensors_required:
- modality: "rgb"
vla_feature_key: "observation.images.camera1"
# … second RGB stream + proprioception
License notes
See CLAUDE.md §3 for the full VLA license matrix. Key restrictions:
- GR00T-based checkpoints — license is version-specific (ADR-0046): N1/N1.5/N1.6 are NVIDIA OneWay Noncommercial (nvidia_non_commercial; requires OPENRAL_ALLOW_NONCOMMERCIAL=1); N1.7+ are NVIDIA Open Model License (nvidia_open_model, commercial OK).
- LocateAnything-3B — NVIDIA License, non-commercial research/evaluation; private HF rSkill, requires OPENRAL_ALLOW_NONCOMMERCIAL=1 and remote-code acceptance.
- π0 / π0.5 weights — permissive research (not full Apache-2.0 for commercial deployment).
- RLDX-1 — RLWRLD non-commercial; runs as an out-of-process sidecar.
Provenance signing via sigstore is planned but not yet implemented — the loader emits an rskill.unverified_provenance warning on every load. Set OPENRAL_REQUIRE_SIGNED_SKILLS=1 to fail closed.