Website | Paper | Benchmark Repo | Dataset | Models | Leaderboard
🚀 Join Our Community: WeChat Group | Discord
- Updates
- Installation
- Repository Structure
- Download
- Model Training
- Evaluation
- Troubleshooting
- Acknowledgement
- Citation
- [03/2026] 🚀 We release MME-VLA Suite, a family of memory-augmented vision-language-action (VLA) models based on the
$\pi_{0.5}$ backbone. See our paper and leaderboard for more details and analysis.
GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
Set the OPENPI_DATA_HOME path in your ~/.bashrc, e.g. export OPENPI_DATA_HOME=<your_openpi_homedir>. For more details, please refer to OpenPi.
Clone the RoboMME submodule:
git submodule update --init
Then install the RoboMME environment following the documentation here. We use separate environments for VLA training/inference and the RoboMME simulator. During evaluation, we use a WebSocket connection between them, following OpenPi.
After downloading the data into the data directory and setting up runs in the structure shown below, update the RoboMME submodule with git submodule update --init.
Then build the Docker image following this.
.
├── data
│ ├── robomme_h5_data # download raw RoboMME h5 files here
│ └── robomme_preprocessed_data
│     ├── data # pickle files
│     ├── features # precomputed SigLIP token embeddings
│     ├── meta # dataset statistics for RoboMME
│     ├── memer # VLM subgoal training data for MemER
│     └── qwenvl # VLM subgoal training data for QwenVL
├── examples
│ └── robomme # RoboMME simulator evaluation code
├── packages
│ └── openpi-client # VLA client & server interface
├── runs
│ ├── assets # save norm_stats json files
│ ├── ckpts # fine-tuned checkpoints
│ └── evaluation # evaluation results
├── scripts # train/eval/data_generation scripts
├── src
│ ├── mme_vla_suite # MME_VLA code, follows openpi structure
│ └── openpi # original OpenPi code with minor changes
└── third_party
This repository is built on top of OpenPi. We strongly recommend familiarizing yourself with OpenPi before working with this repo.
Place all data under the data directory:
mkdir data
Download the raw RoboMME training files here:
git clone git@hf.co:Yinpei/robomme_data_h5 data/robomme_data_h5
(Optional) Download preprocessed RoboMME data here:
git clone git@hf.co:datasets/Yinpei/robomme_preprocessed_data data/robomme_preprocessed_data
and run uv run scripts/unzip_data.py data/robomme_preprocessed_data to unzip the files.
Alternatively, you can run uv run scripts/build_robomme_dataset.py to generate the preprocessed pickle files (takes about 2–3 hours) and/or the VLM subgoal predictor training data (takes about 30–60 minutes).
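To sanity-check the generated pickles, a short inspection helper can be handy. This is a sketch only: the `inspect_episode` helper is our own, and the episode keys shown in the test are hypothetical, since the exact pickle schema depends on the dataset build script.

```python
import pickle

def inspect_episode(pkl_path):
    """Summarize the top-level layout of one preprocessed episode pickle:
    array-like values are reported by shape, everything else by type name."""
    with open(pkl_path, "rb") as f:
        episode = pickle.load(f)
    summary = {}
    for key, value in episode.items():
        # numpy arrays (and anything array-like) expose .shape; fall back to the type name.
        shape = getattr(value, "shape", None)
        summary[key] = tuple(shape) if shape is not None else type(value).__name__
    return summary
```

For example, `inspect_episode("data/robomme_preprocessed_data/data/<episode>.pkl")` prints one line per top-level key, which makes it easy to spot a truncated or mis-built episode.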
We also provide data in the LeRobot format here. In our experiments, however, the LeRobot dataloader significantly increased CPU memory usage during training, which can be a bottleneck in shared training environments (e.g., on HPC clusters). For this reason, we use our custom data format and dataloader in this repository.
Download the $\pi_{0.5}$ base checkpoint:
uv run scripts/download_pi05_base.py
Download the pi05_vision_encoder, which is a subset of the $\pi_{0.5}$ base checkpoint:
cd $OPENPI_DATA_HOME
git clone git@hf.co:Yinpei/pi05_vision_encoder
Fine-tuned models and evaluation results are stored under the runs directory. Create it if needed:
mkdir runs
mkdir runs/ckpts # save all trained models here
mkdir runs/evaluation # evaluation results
mkdir runs/assets # save all normalization statistics files here
You can skip the following steps if you plan to fine-tune your own VLA/VLM models directly; see Model Training.
Download MME-VLA variants here:
git clone git@hf.co:Yinpei/mme_vla_suite runs/ckpts/mme_vla_suite
We release all checkpoints for symbolic and perceptual memory, and a subset of recurrent memory variants for research. Recurrent memory is still underperforming; we will release more recurrent variants as results improve.
Download VLM subgoal predictors here:
git clone git@hf.co:Yinpei/vlm_subgoal_predictor runs/ckpts/vlm_subgoal_predictor
Download the fine-tuned $\pi_{0.5}$ baseline:
git clone git@hf.co:Yinpei/pi05_baseline runs/ckpts/pi05_baseline
After downloading fine-tuned checkpoints, you can run
uv run ./scripts/unzip_ckpt.py runs/ckpts
to unzip all of them.
Prepare training data by either downloading preprocessed files or running:
uv run scripts/build_robomme_dataset.py --dataset_type robomme_pkl --raw_data_path=<downloaded_h5_data_dir> --preprocessed_data_path=<your_target_dir>
Then compute normalization statistics (this takes about 3 minutes):
uv run scripts/compute_norm_stats.py --config-name mme_vla_suite --repo-id robomme --dataset-path="data/robomme_preprocessed_data"
uv run scripts/compute_norm_stats.py --config-name pi05_baseline --repo-id robomme --dataset-path="data/robomme_preprocessed_data"
This produces the following structure under runs:
.
├── assets
│ ├── mme_vla_suite
│ │ └── robomme
│ │ └── norm_stats.json
│ └── pi05_baseline
│ └── robomme
│ └── norm_stats.json
You can also compare against our reference norm_stats.json provided here to check whether your processing is correct. Small differences are acceptable.
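One way to automate that comparison is to walk both JSON files and check numeric leaves against each other with a tolerance. A minimal sketch (the `norm_stats_close` helper and the tolerance values are our own, not part of the repo):

```python
import json
import math

def norm_stats_close(ref_path, new_path, rel_tol=1e-2):
    """Compare two norm_stats.json files leaf by leaf, allowing small
    numeric differences from nondeterministic preprocessing."""
    def leaves(node, prefix=""):
        # Flatten nested dicts/lists into ("a.b.0." style key, scalar) pairs.
        if isinstance(node, dict):
            for k, v in node.items():
                yield from leaves(v, f"{prefix}{k}.")
        elif isinstance(node, list):
            for i, v in enumerate(node):
                yield from leaves(v, f"{prefix}{i}.")
        else:
            yield prefix, node

    with open(ref_path) as f:
        ref = dict(leaves(json.load(f)))
    with open(new_path) as f:
        new = dict(leaves(json.load(f)))
    if ref.keys() != new.keys():
        return False
    return all(
        math.isclose(ref[k], new[k], rel_tol=rel_tol, abs_tol=1e-6)
        if isinstance(ref[k], (int, float)) else ref[k] == new[k]
        for k in ref
    )
```

Run it on our reference file versus `runs/assets/.../norm_stats.json`; a `False` on matching keys usually means a data-processing mistake rather than numeric noise.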
This variant does not use history and fine-tunes the $\pi_{0.5}$ base model:
bash scripts/finetune_pi05_baseline.sh
You can change --exp-name to suit your own experiment naming.
bash scripts/finetune_mme_vla_suite.sh
Set MME_VLA_TYPE to train a specific model variant. You can also change --exp-name to suit your own experiment naming.
robomme_preprocessed_data already contains VLM subgoal prediction data, but you can also generate it with:
uv run scripts/build_robomme_dataset.py --dataset_type vlm_subgoal_qwenvl --raw_data_path=<downloaded_h5_data_dir> --preprocessed_data_path=<your_target_dir>
uv run scripts/build_robomme_dataset.py --dataset_type vlm_subgoal_memer --raw_data_path=<downloaded_h5_data_dir> --preprocessed_data_path=<your_target_dir>
After the data is ready, run:
micromamba activate robomme
bash scripts/finetune_vlm_subgoal_predictor.sh
Set DATASET_PATH according to which VLM you are training: (1) simple subgoals, (2) grounded subgoals, or (3) MemER-style subgoals.
After downloading the fine-tuned checkpoints, run:
bash scripts/eval.sh
Set the MODEL_TYPE variable to one of the following:
- Prior methods: pi05_baseline, MemER
- Symbolic MME-VLA: symbolic_simpleSG_oracle, symbolic_simpleSG_gemini, symbolic_simpleSG_qwenvl, symbolic_groundedSG_oracle, symbolic_groundedSG_gemini, symbolic_groundedSG_qwenvl
- Perceptual MME-VLA: perceptual-framesamp-context, perceptual-framesamp-modul, perceptual-framesamp-expert, perceptual-tokendrop-context, perceptual-tokendrop-modul, perceptual-tokendrop-expert
- Recurrent MME-VLA: recurrent-rmt-context, recurrent-rmt-modul, recurrent-rmt-expert, recurrent-ttt-context, recurrent-ttt-modul, recurrent-ttt-expert
Running eval.sh automatically starts two tmux windows: one for the policy server and one for RoboMME evaluation. If the evaluation is interrupted, you can rerun the script; it will automatically resume from the generated progress.json.
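The resume logic amounts to filtering out episodes already recorded in progress.json. A rough sketch, assuming a hypothetical `{"completed": [...]}` schema (the actual file written by the eval script may differ):

```python
import json
from pathlib import Path

def remaining_episodes(all_episodes, progress_path):
    """Return the episodes not yet marked done in progress.json;
    if the file does not exist, everything still needs to run."""
    path = Path(progress_path)
    done = set()
    if path.exists():
        done = set(json.loads(path.read_text()).get("completed", []))
    return [ep for ep in all_episodes if ep not in done]
```

This is why rerunning the script after an interruption is safe: completed episodes are skipped rather than re-evaluated.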
Details are provided here.
Q1: Vulkan installation fails.
A1: Please refer to the ManiSkill solution. If it still does not work, we recommend reinstalling the NVIDIA driver and Vulkan packages. We use NVIDIA driver 570.211.01 and Vulkan 1.3.275. You can also switch to CPU rendering:
os.environ['SAPIEN_RENDER_DEVICE'] = 'cpu'
os.environ['MUJOCO_GL'] = 'osmesa'
Q2: Why does the evaluation stop?
A2: We observed that, on long-horizon tasks such as VideoPlaceButton, the WebSocket connection can break due to large video frames. If the evaluation process is interrupted, you can rerun scripts/eval.sh, and the program will resume based on the generated progress.json.
Q3: CUDA runs out of memory when training VLA models.
A3: You can set the environment variable XLA_PYTHON_CLIENT_MEM_FRACTION=0.95 to allow JAX to use more GPU memory.
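If you prefer setting this inside the training script rather than in the shell, the same flag can be exported from Python, as long as it happens before JAX initializes its GPU backend:

```python
import os

# Must run before the first `import jax` in the entry point, because JAX
# reads this flag when it allocates its GPU memory pool.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.95"
```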
This work was supported in part by NSF SES-2128623, NSF CAREER #2337870, NSF NRI #2220876, NSF NAIRR250085, and NSF IIS-1949634. We would also like to thank the excellent OpenPi codebase from Physical-Intelligence.
@article{dai2026robomme,
title={RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies},
author={Dai, Yinpei and Fu, Hongze and Lee, Jayjun and Liu, Yuejiang and Zhang, Haoran and Yang, Jianing and Finn, Chelsea and Fazeli, Nima and Chai, Joyce},
journal={arXiv preprint arXiv:2603.04639},
year={2026}
}
