VLA × Robot × Simulation Compatibility Matrix
This document is the canonical reference for which Vision-Language-Action models run on which robots under which simulators in the OpenRAL ecosystem. It is derived from upstream model cards, published papers, and checkpoint inspection. Entries marked TBD have not been locally verified; contributions welcome via PRs that include checkpoint inspection evidence.
See also: CLAUDE.md §7.4 for the normative license matrix and CLAUDE.md §6.4 for the rSkill packaging format.
1. Robots (Currently Integrated)
| Robot | Embodiment tags | DoF | Control mode | HAL module | Sim env |
|---|---|---|---|---|---|
| SO-100 (LeRobot) | `so100_follower` | 6 arm + 1 gripper | `joint_position` | `openral_hal.so100_follower` | SO-100 digital twin (MuJoCo, in-process) |
| Franka Panda (LIBERO sim only) | `libero`, `franka_panda` | 7 + gripper | `cartesian_delta` (6-D EEF + axis-angle) | `LiberoEnv` (lerobot) | LIBERO (MuJoCo via robosuite) |
Hardware-in-loop tested:
- SO-100: tests/hil/ gate label [self-hosted, lab-so100]. USB tether required.
- Franka Panda: simulation only at this time; real-hardware HAL is planned (packages/openral_hal_franka/).
2. Embodiment Tag Registry
Embodiment tags are short strings that appear in rskill.yaml under embodiment_tags and in RobotCapabilities.embodiment_tags. The skill loader refuses to activate a skill whose tags do not intersect the target robot's capability set.
| Tag | Robot / Platform | DoF | Source dataset / paper | Notes |
|---|---|---|---|---|
| `so100_follower` | LeRobot SO-100 arm | 6 | lerobot/so100 | Follower arm in leader-follower teleoperation setup |
| `so101_follower` | LeRobot SO-101 arm | 6 | lerobot/so101 | Updated hardware revision of SO-100 |
| `libero` | Franka Panda on LIBERO benchmark | 7 + gripper | LIBERO (Bo Liu et al., NeurIPS 2023) | Simulation-only tag for LIBERO benchmark training |
| `franka_panda` | Franka Panda (real + sim) | 7 + gripper | Standard industry robot; widespread in BridgeData / Open X | Broader tag; use `libero` when targeting LIBERO-specific checkpoints |
| `widowx` | WidowX 250s | 6 | BridgeData V2 | Low-cost research arm; common in Open X-Embodiment |
| `gr1` | Fourier GR-1 humanoid | 23 | NVIDIA Arena dataset | Full humanoid; requires S0 cerebellar layer |
| `aloha` | Aloha bimanual teleoperation setup | 2 × 7 | ACT paper (Stanford / Toyota) | Bimanual; two ViperX arms with overhead + wrist cameras |
| `koch` | Koch arm | 6 | lerobot/koch | Low-cost leader-follower arm |
| `piper` | Agilex Piper arm | 6 | ISdept dataset | Mid-range research arm from Agilex |
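The tag check described at the start of this section can be pictured as a set intersection. The sketch below is illustrative only: `tags_compatible` is a hypothetical helper, not the actual OpenRAL skill loader, and it assumes tags are compared as exact strings.

```python
from typing import Iterable


def tags_compatible(skill_tags: Iterable[str], robot_tags: Iterable[str]) -> bool:
    """Return True when the skill and the robot share at least one embodiment tag."""
    return bool(set(skill_tags) & set(robot_tags))


# Tags taken from the registry above.
skill = ["libero", "franka_panda"]  # e.g. embodiment_tags in rskill.yaml
print(tags_compatible(skill, ["franka_panda"]))    # shared tag: activation allowed
print(tags_compatible(skill, ["so100_follower"]))  # no shared tag: activation refused
```

A single shared tag is enough under this reading, which is why a checkpoint tagged `libero, franka_panda` would also pass on a robot that only declares `franka_panda`.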
3. VLA Compatibility Matrix
Columns:
- VLA (HF ID): canonical Hugging Face model ID
- Sim env: benchmark / simulator
- Robot tag: required embodiment tag(s)
- State dim: observation state vector
- Cameras: image inputs (resolution + any pre-processing)
- Norm stats in checkpoint: whether normalisation statistics are bundled
- rSkill: local skill stub path (if one exists)
- License: SPDX expression for the weights (code license may differ)
- Notes
3.1 LIBERO (Franka Panda, MuJoCo via robosuite)
The OpenRAL embodiment for LIBERO is `franka_panda` (see `robots/franka_panda/`). The sim-imposed observation/action contract (8-D EEF state, 7-D delta-EEF action, 180° image flip) lives in the LIBERO scene adapter (ADR-0007).
| VLA (HF ID) | Sim env | Robot tag | State dim | Cameras | Norm stats in ckpt | rSkill | License | Notes |
|---|---|---|---|---|---|---|---|---|
| `lerobot/smolvla_libero` | LIBERO | `libero` | 8-D eef_pos(3)+axisangle(3)+gripper_qpos(2) ✓ | `image`→`camera1` + `image2`→`camera2` (256×256, flip 180°) ✓ | Yes: `step_5_normalizer_processor.safetensors` (state=[8], action=[7]) ✓ | `rskills/smolvla-libero/` | Apache-2.0 | Paper: Spatial 90% / Object 96% / Goal 92% / Long 71% (avg 87.3%). `scenes/benchmark/libero_spatial.yaml` (with `--rskill rskills/smolvla-libero`) |
| `HuggingFaceVLA/smolvla_libero` | LIBERO | `libero` | 8-D (same as above) | same as above | Yes (assumed same as above) | — | Apache-2.0 | Community mirror. Not locally verified. |
| `lerobot/pi05_libero_finetuned_v044` | LIBERO | `libero`, `franka_panda` | 8-D same as smolvla ✓ | `image`+`image2` (256×256, flip 180°) + `empty_camera_0` (224×224 zeros) ✓ | Yes: `step_2_normalizer_processor.safetensors` (state=[8], action=[7]) ✓ | `rskills/pi05-libero-nf4/` | Permissive research (weights) / Apache-2.0 (code) | π0.5 (PaliGemma 3B backbone); requires ≥8 GB VRAM. `scenes/benchmark/libero_spatial.yaml` (with `--rskill rskills/pi05-libero-nf4`). Non-commercial weights; see §5 |
| `lerobot/pi0_libero_finetuned_v044` | LIBERO | `libero`, `franka_panda` | 8-D (same format as pi05; unverified) | same 3-camera format as pi05 (unverified) | Yes (assumed same format) | — | Permissive research (weights) / Apache-2.0 (code) | π0 (same license caveat). Not locally verified. |
| `lerobot/xvla-libero` | LIBERO | `libero`, `franka_panda` | 8-D same eef_pos+axisangle+gripper_qpos; padded to `max_state_dim=20` internally ✓ | `image`+`image2` (224×224, flip 180°) + `empty_camera_0` (224×224 zeros) ✓ | IDENTITY norm (no stats file) ✓; action output [20] (first 7 elements = LIBERO 7-D) ✓ | `rskills/xvla-libero/` | Apache-2.0 | xVLA (Florence-2 backbone, flow-matching). `scenes/benchmark/libero_spatial.yaml` (with `--rskill rskills/xvla-libero`) |
| `ar0s/groot_libero` | LIBERO | `libero`, `franka_panda` | TBD | TBD | TBD | — | Apache-2.0 (fine-tune) | GR00T on LIBERO; base model is NVIDIA AI Foundation non-commercial; guard required |
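As a rough illustration of the LIBERO observation contract noted above (8-D state, 180° image flip), the sketch below assembles the state vector and rotates a toy image. It assumes "flip 180°" means reversing both image axes; `build_state` and `flip_180` are hypothetical names, and nested lists stand in for the arrays the real scene adapter would use.

```python
from typing import List, Sequence

Image = List[List[int]]  # toy stand-in for an H x W image; real frames are arrays


def build_state(eef_pos: Sequence[float], axis_angle: Sequence[float],
                gripper_qpos: Sequence[float]) -> List[float]:
    """Concatenate eef_pos(3) + axisangle(3) + gripper_qpos(2) into the 8-D state."""
    if (len(eef_pos), len(axis_angle), len(gripper_qpos)) != (3, 3, 2):
        raise ValueError("expected component sizes 3, 3 and 2")
    return [*eef_pos, *axis_angle, *gripper_qpos]


def flip_180(image: Image) -> Image:
    """Rotate an image by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in image[::-1]]


state = build_state([0.1, 0.2, 0.3], [0.0, 0.0, 0.0], [0.04, -0.04])
print(len(state))                    # length of the assembled state vector
print(flip_180([[1, 2], [3, 4]]))    # top-left pixel ends up bottom-right
```

Checking the component sizes up front keeps a mis-shaped observation from silently producing a state of the wrong length.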
3.2 MetaWorld (Sawyer, MuJoCo)
The OpenRAL embodiment for MetaWorld is `sawyer` (see `robots/sawyer/`). The MetaWorld benchmark simulates a Rethink Sawyer; some upstream checkpoints carry a `franka_panda` tag, but the actual robot is Sawyer (ADR-0007).
| VLA (HF ID) | Sim env | Robot tag | State dim | Cameras | Norm stats in ckpt | rSkill | License | Notes |
|---|---|---|---|---|---|---|---|---|
| `lerobot/smolvla_metaworld` | MetaWorld MT50 | `franka_panda`, `manipulator` | 4-D agent_pos (XYZ + gripper) ✓ | `observation.image`→`camera1` (256×256, flip+resize from 480×480) ✓ | Yes: `step_5_normalizer_processor.safetensors` (state=[4], action=[4]) ✓ | `rskills/smolvla-metaworld/` | Apache-2.0 | Action: 4-D delta (XYZ + gripper). Sawyer robot in MetaWorld (not Franka despite tag). `scenes/benchmark/metaworld_push.yaml` (with `--rskill rskills/smolvla-metaworld`) |
3.3 RoboCasa (Franka Panda, MuJoCo)
| VLA (HF ID) | Sim env | Robot tag | State dim | Cameras | Norm stats in ckpt | rSkill | License | Notes |
|---|---|---|---|---|---|---|---|---|
| `lerobot/smolvla_robocasa` | RoboCasa | `franka_panda`, `manipulator` | TBD | TBD | TBD | — | Apache-2.0 | Kitchen manipulation; no rSkill stub yet |
3.4 SO-100 / SO-101 (real robot or sim)
| VLA (HF ID) | Sim env | Robot tag | State dim | Cameras | Norm stats in ckpt | rSkill | License | Notes |
|---|---|---|---|---|---|---|---|---|
| `chamborgir/smolvla_pickplace_20k` | SO-101 real | `so101_follower` | TBD | TBD | TBD | — | Apache-2.0 | 20k steps pick-and-place fine-tune |
| `TakuyaHiraoka/act_so101_pick_diverse_objects` | SO-101 real | `so101_follower` | TBD | TBD | TBD | — | Apache-2.0 | ACT policy; diverse object pick task |
| `edge-inference/smolvla-so101-pick-orange` | Isaac Sim | `so101_follower` | TBD | TBD | TBD | — | Apache-2.0 | Isaac Sim backend; requires Isaac Sim license for reproduction |
| `HollyTan/pi05_so101_pick_place-v2.2basev2.4_abs_nofreeze_8b` | `so101_box` (MuJoCo) | `so101_follower` | 6-D joint positions ✓ | top+wrist+front (224×224); scene `oak_top`→`top`, `wrist`→`wrist`, `front` zero-padded via image mask ✓ | Yes: `policy_{pre,post}processor` sidecars (state=[6], action=[6]) ✓ | `rskills/pi05-so101-pickplace-nf4/` (nf4 mirror at `OpenRAL/rskill-pi05-so101-pickplace-nf4`) | Apache-2.0 | π0.5 (4.14 B); nf4 fits 8 GB. Pick-place finetune; validated to load + step on `so101_box` (not insertion-trained; expect drift on the tube task). `scenes/sim/so101_tube_insertion.yaml` |
3.5 Other platforms
| VLA (HF ID) | Sim env | Robot tag | State dim | Cameras | Norm stats in ckpt | rSkill | License | Notes |
|---|---|---|---|---|---|---|---|---|
| `nvidia/smolvla-arena-gr1-microwave` | NVIDIA Arena | `gr1` | TBD | TBD | TBD | — | Apache-2.0 | Fourier GR-1 humanoid, microwave-opening task |
| `ISdept/smolvla-piper` | Piper real | `piper` | TBD | TBD | TBD | — | Apache-2.0 | Agilex Piper arm; community fine-tune |
4. Sim Environment Reference
| Sim env | Backend | Install | Robot(s) | Task suites | Camera setup |
|---|---|---|---|---|---|
| LIBERO | MuJoCo (robosuite) | `CC=/usr/bin/gcc uv sync --group libero` + fix `~/.libero/config.yaml` to point at conda/pip libero data dirs | Franka Panda | `libero_spatial`, `libero_object`, `libero_goal`, `libero_10` (= LIBERO-Long) | agentview + wrist 256×256; raw keys `image`/`image2` renamed to `camera1`/`camera2` by stored preprocessor; flip 180° |
| MetaWorld | MuJoCo | `uv run pip install metaworld==3.0.0 --no-deps` | Sawyer (MT50) | MT50 (50 tasks, v3) | 1 camera `corner2` 480×480 → resize to 256×256; `observation.image` key renamed to `camera1` |
| RoboCasa | MuJoCo | TBD | Franka Panda | Kitchen manipulation | TBD |
| SO-100 Digital Twin | MuJoCo (in-process, `python/sim/`) | `uv sync --group sim` | SO-100 | Smoke-test only (no task suite) | None (joint-space smoke test) |
| SO-101 Box (`so101_box`) | MuJoCo (raw, `python/sim/src/openral_sim/backends/so101_box/`) | `uv sync --group sim` | SO-101 | tube-insertion (geometric success: tube vertical + lower tip ≥ 10 mm below the slotted-block hole top); both block and tube spawn at random (x, y, yaw) on the floor each `reset()` | OAK-D Pro overhead (RGB + depth, default 640×480) + wrist RGB parented to the gripper body |
| NVIDIA Arena | Isaac Sim | Requires NVIDIA Isaac Sim license | GR1 | microwave | TBD |
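The geometric success criterion quoted for the `so101_box` tube-insertion task (tube vertical, lower tip at least 10 mm below the hole top) could be checked along the following lines. This is a sketch under stated assumptions, not the backend's code: `tube_inserted`, its arguments, and the 10° tilt tolerance used to decide "vertical" are all hypothetical, and heights are assumed to be world-frame z values in metres.

```python
import math
from typing import Sequence

MIN_DEPTH_M = 0.010  # 10 mm, from the success criterion in the table


def tube_inserted(tube_axis: Sequence[float], lower_tip_z: float,
                  hole_top_z: float, max_tilt_deg: float = 10.0) -> bool:
    """Geometric success: tube roughly vertical and lower tip deep enough."""
    norm = math.sqrt(sum(c * c for c in tube_axis))
    if norm == 0.0:
        raise ValueError("tube_axis must be non-zero")
    # Angle between the tube axis and world z; abs() ignores which end is up.
    cos_tilt = min(1.0, abs(tube_axis[2]) / norm)
    tilt_deg = math.degrees(math.acos(cos_tilt))
    depth = hole_top_z - lower_tip_z  # positive when the tip is below the hole top
    return tilt_deg <= max_tilt_deg and depth >= MIN_DEPTH_M


print(tube_inserted((0.0, 0.0, 1.0), lower_tip_z=0.030, hole_top_z=0.045))  # upright, 15 mm deep
print(tube_inserted((1.0, 0.0, 1.0), lower_tip_z=0.030, hole_top_z=0.045))  # tilted 45 degrees
```

Because the test is purely geometric, it says nothing about how the tube got there, which matches the table's description of success as a pose condition rather than a reward.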
4.1 LIBERO eval CLI
The lerobot lerobot-eval CLI drives LIBERO natively. Verified against huggingface/lerobot main as of 2026-05-05:
```bash
# Single suite
lerobot-eval \
  --policy.path=lerobot/smolvla_libero \
  --env.type=libero \
  --env.task=libero_spatial \
  --eval.n_episodes=10 \
  --eval.batch_size=10 \
  --eval.use_async_envs=true \
  --policy.device=cuda

# All four LIBERO suites
lerobot-eval \
  --policy.path=lerobot/smolvla_libero \
  --env.type=libero \
  --env.task=libero_spatial,libero_object,libero_goal,libero_10 \
  --eval.n_episodes=10 \
  --eval.batch_size=10 \
  --eval.use_async_envs=true \
  --policy.device=cuda
```
Suite max steps: libero_spatial 280, libero_object 280, libero_goal 300, libero_10 520.
Note: libero_10 is the lerobot/upstream name for LIBERO-Long. LiberoProcessorStep is injected automatically by lerobot.envs.LiberoEnv — no separate LIBERO gym install is required beyond the lerobot extras.
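For scripted sweeps, the invocation above can be assembled programmatically. The sketch below only builds the argument list (it does not launch anything) and uses only the flags and step limits quoted in this section; `build_eval_command` and `worst_case_steps` are hypothetical helper names.

```python
from typing import Dict, List

# Per-suite step limits quoted above.
SUITE_MAX_STEPS: Dict[str, int] = {
    "libero_spatial": 280,
    "libero_object": 280,
    "libero_goal": 300,
    "libero_10": 520,  # LIBERO-Long
}


def build_eval_command(policy_path: str, suites: List[str],
                       n_episodes: int = 10, batch_size: int = 10,
                       device: str = "cuda") -> List[str]:
    """Assemble the lerobot-eval argv using only the flags shown above."""
    unknown = [s for s in suites if s not in SUITE_MAX_STEPS]
    if unknown:
        raise ValueError(f"unknown LIBERO suite(s): {unknown}")
    return [
        "lerobot-eval",
        f"--policy.path={policy_path}",
        "--env.type=libero",
        f"--env.task={','.join(suites)}",
        f"--eval.n_episodes={n_episodes}",
        f"--eval.batch_size={batch_size}",
        "--eval.use_async_envs=true",
        f"--policy.device={device}",
    ]


def worst_case_steps(suite: str, n_episodes: int) -> int:
    """Upper bound on env steps when n_episodes episodes all hit the suite limit."""
    return SUITE_MAX_STEPS[suite] * n_episodes


cmd = build_eval_command("lerobot/smolvla_libero", ["libero_spatial", "libero_10"])
print(" ".join(cmd))
print(worst_case_steps("libero_10", 10))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where `lerobot-eval` is installed.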
5. Known Limitations
- Checkpoint normalisation requires `snapshot_download`: `lerobot/smolvla_libero` bundles normalisation statistics in `policy_preprocessor_step_5_normalizer_processor.safetensors`. A bare `from_pretrained` call that only fetches `model.safetensors` + `config.json` will fail at inference time. Use `snapshot_download(repo_id="lerobot/smolvla_libero")` or `hf_hub_download` for the preprocessor file explicitly.
- GR00T weights: license is version-specific (ADR-0046). GR00T N1 / N1.5 / N1.6 ship under the NVIDIA OneWay Noncommercial License. Any checkpoint that builds on those bases (e.g., `ar0s/groot_libero`) inherits the non-commercial restriction even if the fine-tune layer is Apache-2.0; the rSkill manifest sets `license: nvidia_non_commercial` and the loader requires `OPENRAL_ALLOW_NONCOMMERCIAL=1` for a commercial deployment. GR00T N1.7+ ship under the NVIDIA Open Model License, which permits commercial use; those manifests set `license: nvidia_open_model` (e.g., `rskills/gr00t-n17-libero`) and load without the guard. GR00T runs out-of-process via a ZMQ sidecar (the runtime adapter lands in ADR-0046 PR2).
- π0 / π0.5 weights are "permissive research", not full Apache-2.0: The code under `lerobot/` is Apache-2.0; the weights for `pi0` and `pi05` checkpoints carry a Physical Intelligence permissive-research license that is not equivalent to Apache-2.0 for commercial deployment. The corresponding rSkill manifests set `commercial_use_allowed: false`. See `CLAUDE.md §7.4` for the full VLA license matrix.
- Reward monitor (`rskills/robometer-4b`, ADR-0057) co-residency on 8 GB: The Robometer-4B reward monitor (`kind: reward`) runs in parallel with a VLA to score per-frame progress/success. At NF4 it is ~3.33 GB resident / 3.56 GB peak (8-frame window) on the 8 GB reference GPU, leaving ~4.4 GB. That is enough for a small NF4 VLA (e.g. SmolVLA ≈ 1.5–2 GB) but not for a 3–4 GB π0.5/GR00T checkpoint at the same time. When the VLA already saturates the card, place the reward sidecar on CPU, a second GPU, or a cloud host (the ZMQ transport makes location transparent), or shrink the reward `frame_window_s` / `num_bins` (activation peak scales with both). It is an S2-cadence monitor (~0.2–1 Hz over a frame window), not a per-control-step signal, and is advisory-only (never gates motors). In `deploy-sim`, the signal is only available on camera-rendering robots (the monitor needs `sensor_msgs/Image` frames). Apache-2.0; commercially usable.
- RoboCasa and most SO-101 community entries are TBD: they have not been locally verified. MetaWorld and the LIBERO entries for smolvla, pi05, and xvla are verified (see the ✓ markers in §3); the pi0 LIBERO entry is still marked unverified there.
- Isaac Sim entries require a separate license: `edge-inference/smolvla-so101-pick-orange` was trained in NVIDIA Isaac Sim. Reproducing its eval requires an Isaac Sim license and is not covered by the standard `uv sync --group sim` environment.
- Embodiment tag `libero` implies simulation only: The `libero` tag is defined for the LIBERO benchmark Franka Panda setup. Do not apply it to real Franka Panda deployments without verifying that action normalisation and camera geometry match your physical setup.
- smolvla_libero state is 8-D, not 6-D: The checkpoint's normalizer safetensors has `observation.state` stats for shape [8] (eef_pos(3)+axisangle(3)+gripper_qpos(2)), not [6]. The earlier config.json entry of shape [6] was a documentation error in the checkpoint. Always verify against the safetensors file, not config.json.
- xvla action output is 20-D (padded): xVLA pads actions to `max_state_dim=20`. LIBERO's env.step expects 7-D. Slice with `action_np = action_tensor.squeeze(0).cpu().numpy()[:7]` to extract the real 7-D action.
- xvla is LIBERO-engine-only: the xVLA adapter's env preprocessor (`LiberoProcessorStep`) consumes the nested LiberoEnv observation that the scene must expose as `observation['raw']`. Non-LIBERO scenes (e.g. the Isaac Sim Franka scenes) do not populate it, so `xvla` raises `ROSCapabilityMismatch` on the first step. Run xvla only on LIBERO scenes (`libero_spatial`, `franka_libero_pnp`, …).
- GR00T / RLDX sidecars have no single-camera fallback: these checkpoints read a fixed number of distinct camera streams positionally, as set by the manifest's `state_contract.layout`: LIBERO=2 (agentview+wrist), RC365=3, GR1/Simpler=1. Unlike the in-process lerobot adapters (smolvla / pi05 / act), which resolve their camera list from `scene.cameras` and adapt, the `gr00t` / `rldx` factories reject a scene that declares fewer cameras than the layout needs with an upfront `ROSCapabilityMismatch` (before the multi-minute sidecar boot). A scene that omits `cameras:` gets the adapter default (LIBERO renders camera1+camera2 itself) and is never rejected. Example: `gr00t-n17-libero` runs on `isaac_franka_bowl_plate` (`cameras: [camera1, camera2]`) but not on `isaac_franka_lift` (`cameras: [camera1]`).
- π0.5 requires ≥8 GB VRAM: The PaliGemma-3B backbone requires more memory than the 7-class GPU can provide in typical shared use. Use `--device cpu` for slow inference or a dedicated A100/H100 for production eval.
- MetaWorld uses Sawyer, not Franka: Despite the `franka_panda` embodiment tag in the lerobot metaworld dataset metadata, MetaWorld MT50 uses the Sawyer arm. The tag refers to the broader manipulation skill class, not the physical robot. Do not use smolvla_metaworld weights on a real Franka without re-training.
- LIBERO `~/.libero/config.yaml` must point at the data files: After installing `hf-libero` via pip, the config file at `~/.libero/config.yaml` pins absolute paths computed at first import and is never refreshed when you switch venv / workspace path. The next `just sim-libero` / `just sim-xvla-libero` / `just sim-pi05-libero` run then crashes inside `lerobot.envs.libero.get_task_init_states` with a `FileNotFoundError` on `<stale-path>/init_files/<task>.pruned_init`. The `_ensure-libero-config` private recipe (chained off every libero `just sim-*` target) invokes `tools/fix_libero_config.py` to detect and rewrite the file when it is stale; the script is idempotent. Run it manually any time with `uv run --group libero python tools/fix_libero_config.py --verbose`, or set `LIBERO_CONFIG_PATH` to a project-local dir to bypass `~/.libero` entirely.
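Three of the limitations above (the padded xVLA action, the GR00T / RLDX camera-count requirement, and the non-commercial license guard) reduce to small pre-flight checks. The sketch below is a hedged illustration: every function name and the keys of `REQUIRED_CAMERAS` are hypothetical, the real adapters raise `ROSCapabilityMismatch` rather than returning booleans, and plain sequences stand in for tensors.

```python
import os
from typing import Dict, List, Mapping, Optional, Sequence

LIBERO_ACTION_DIM = 7

# Camera streams each layout reads, as listed for the GR00T / RLDX sidecars.
# The keys are hypothetical labels, not real state_contract.layout values.
REQUIRED_CAMERAS: Dict[str, int] = {"libero": 2, "rc365": 3, "gr1": 1, "simpler": 1}


def slice_libero_action(padded_action: Sequence[float]) -> List[float]:
    """Keep the first 7 elements of xVLA's padded 20-D action."""
    if len(padded_action) < LIBERO_ACTION_DIM:
        raise ValueError("padded action is shorter than 7 elements")
    return list(padded_action[:LIBERO_ACTION_DIM])


def scene_has_enough_cameras(layout: str,
                             scene_cameras: Optional[Sequence[str]]) -> bool:
    """True when the scene declares enough cameras, or omits the list entirely."""
    if scene_cameras is None:  # no cameras: key, so the adapter default applies
        return True
    return len(scene_cameras) >= REQUIRED_CAMERAS[layout]


def license_allows_load(license_id: str,
                        env: Optional[Mapping[str, str]] = None) -> bool:
    """Non-commercial GR00T manifests load only when the guard variable is set."""
    env = os.environ if env is None else env
    if license_id == "nvidia_non_commercial":
        return env.get("OPENRAL_ALLOW_NONCOMMERCIAL") == "1"
    return True


print(slice_libero_action([0.1 * i for i in range(20)]))
print(scene_has_enough_cameras("libero", ["camera1"]))
print(license_allows_load("nvidia_open_model", env={}))
```

Running checks like these before booting a sidecar mirrors the intent of the upfront rejection described above: fail in milliseconds instead of after a multi-minute start-up.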