RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI

RLinf is a flexible and scalable open-source RL infrastructure designed for Embodied and Agentic AI. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.

What's NEW!

[2026/03] 🔥 RLinf supports reinforcement learning fine-tuning for LIBERO-Pro & LIBERO-Plus. Doc: LIBERO-Pro & LIBERO-Plus.
[2026/03] 🔥 RLinf supports DAgger for embodied policies. Doc: DAgger for Embodied Policies.
[2026/03] 🔥 RLinf now supports evaluating and fine-tuning LingBot-VLA within the RoboTwin environment! Doc: LingBot-VLA.
[2026/03] 🔥 RLinf supports FUSCO to accelerate the MoE All-to-All communication used in Megatron. Doc: FUSCO, paper: FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion.
[2026/03] 🔥 RLinf supports multi-agent reinforcement learning. Website: WideSeek-R1, quickstart: QuickStart, paper: WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning.
[2026/03] 🔥 RLinf supports real-world RL with XSquare Turtle2 dual-arm robot. Doc: RL on XSquare Turtle2 in the RealWorld.
[2026/02] 🔥 RLinf supports supervised fine-tuning of Vision-Language Models. Doc: VLM SFT.
[2026/02] 🔥 RLinf supports DSRL (Diffusion Steering via Reinforcement Learning) for Pi0, which steers a pre-trained diffusion policy by training a lightweight SAC agent in the latent noise space. Doc: DSRL for Pi0.
[2026/02] 🔥 RLinf supports agentic reinforcement learning on rStar2. Doc: rStar2.
[2026/02] 🔥 RLinf supports sim-real co-training for π₀ and π₀.₅. Doc: Sim-Real Co-Training.
[2026/02] 🔥 RLinf officially supports world-model-based reinforcement learning fine-tuning for VLA. Doc: WoVR, paper: WoVR: World Models as Reliable Simulators for Post-Training VLA Policies with RL.
[2026/02] 🔥 RLinf supports reinforcement learning fine-tuning for VLA based on Wan World Model. Doc: RL on Wan World Model.
[2026/02] 🔥 RLinf is now available on PyPI for installation via pip as a library. Doc: Installation as a Library.
[2026/02] 🔥 The technical report for our real-world online learning system, RLinf-USER: A Unified and Extensible System for Real-World Online Policy Learning in Embodied AI, is released. Doc: RLinf-USER.
[2026/02] 🔥 RLinf supports reinforcement learning fine-tuning for Dexbotic. Doc: RL on Dexbotic Model.
[2026/02] 🔥 RLinf supports reinforcement learning with GSEnv for Real2Sim2Real. Doc: RL with GSEnv.
[2026/01] 🔥 RLinf supports reinforcement learning fine-tuning for OpenSora World Model. Doc: RL on OpenSora World Model.
[2026/01] 🔥 RLinf supports reinforcement learning fine-tuning for RoboTwin. Doc: RL on RoboTwin.
[2026/01] 🔥 RLinf supports SAC training for flow matching policy. Doc: SAC-Flow, paper: SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling.

More updates

[2025/12] 🔥 RLinf supports agentic reinforcement learning on Search-R1. Doc: Search-R1.
[2025/12] 🔥 RLinf v0.2-pre is open-sourced. We support real-world RL with Franka. Doc: RL on Franka in the RealWorld.
[2025/12] 🔥 RLinf supports reinforcement learning fine-tuning for RoboCasa. Doc: RL on Robocasa.
[2025/12] 🎉 RLinf official release of v0.1.
[2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for CALVIN. Doc: RL on CALVIN.
[2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for IsaacLab. Doc: RL on IsaacLab.
[2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for GR00T-N1.5. Doc: RL on GR00T-N1.5.
[2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for Metaworld. Doc: RL on Metaworld.
[2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for Behavior 1k. Doc: RL on Behavior 1k.
[2025/11] Added LoRA support for π₀ and π₀.₅.
[2025/10] 🔥 RLinf supports reinforcement learning fine-tuning for π₀ and π₀.₅! Doc: RL on π₀ and π₀.₅ Models, paper: RL fine-tuning for π₀ and π₀.₅ technical report. Reports on πRL by Machine Heart and RoboTech have also been released.
[2025/10] 🔥 RLinf now officially supports online reinforcement learning! Doc: coding_online_rl. The report "The first open-source agent online RL framework RLinf-Online" has also been published.
[2025/10] 🔥 The RLinf algorithm technical report is officially released. Doc: RLinf-VLA, paper: RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training.
[2025/09] 🔥 Our paper RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation is officially released. Doc: RLinf, and the Machine Heart report on RLinf is also published.
[2025/08] RLinf is open-sourced. The formal v0.1 will be released soon.

Key Features

RLinf supports diverse RL training workflows (PPO, GRPO, SAC, and so on) while hiding the complexity of distributed programming. Users can scale RL training to a large number of GPU nodes without modifying code, meeting the growing computational demand of RL training.

The high flexibility allows RLinf to explore more efficient scheduling and execution. The hybrid execution mode for embodied RL achieves up to 2.434× throughput compared to existing frameworks.

Multiple Backend Integrations

FSDP + HuggingFace/SGLang/vLLM: rapid adaptation to new models and algorithms, ideal for beginners and fast prototyping.
Megatron + SGLang/vLLM: optimized for large-scale training, delivering maximum efficiency for expert users with demanding workloads.
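
As a rough illustration of the pairings listed above, the following sketch validates a training/generation backend combination and mirrors the guidance on when to pick each. The identifiers (`SUPPORTED_GENERATION_BACKENDS`, `BackendChoice`, `suggest_training_backend`) and the lowercase backend labels are hypothetical and are not RLinf configuration keys or APIs; treating HuggingFace, SGLang, and vLLM as "generation" backends is also an assumption about how the pairs are meant to be read.

```python
from dataclasses import dataclass

# Pairings as listed in this README. The string labels are hypothetical
# and are NOT RLinf configuration keys.
SUPPORTED_GENERATION_BACKENDS = {
    "fsdp": {"huggingface", "sglang", "vllm"},
    "megatron": {"sglang", "vllm"},
}


@dataclass(frozen=True)
class BackendChoice:
    """A (training, generation) backend pair, validated on construction."""

    training: str
    generation: str

    def __post_init__(self) -> None:
        allowed = SUPPORTED_GENERATION_BACKENDS.get(self.training)
        if allowed is None:
            raise ValueError(f"unknown training backend: {self.training!r}")
        if self.generation not in allowed:
            raise ValueError(
                f"{self.generation!r} is not listed alongside {self.training!r}"
            )


def suggest_training_backend(large_scale: bool) -> str:
    """Follow the README's guidance: Megatron for large-scale runs, FSDP otherwise."""
    return "megatron" if large_scale else "fsdp"


if __name__ == "__main__":
    choice = BackendChoice(suggest_training_backend(large_scale=False), "vllm")
    print(choice)
```

The table-driven check keeps the supported combinations in one place, so adding a new pairing would only require editing the dictionary.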

Examples

Embodied AI

Simulators: ManiSkill ✅, LIBERO ✅, LIBERO-Pro & LIBERO-Plus ✅, RoboTwin ✅, RoboVerse, BEHAVIOR ✅, MetaWorld ✅, IsaacLab ✅, CALVIN ✅, RoboCasa ✅, Franka-Sim ✅, More...
Real-world Robotics: Franka Arm ✅, XSquare Turtle2 ✅, More...
Models (VLA): π₀ ✅, π₀.₅ ✅, OpenVLA ✅, LingBot-VLA ✅, OpenVLA-OFT ✅, GR00T ✅, Dexbotic ✅
Models (VLM): Qwen2.5-VL ✅, Qwen3-VL ✅
Models (World Model): OpenSora ✅, Wan ✅
Models (Custom Models): MLP-Policy ✅, CNN-Policy ✅
Algorithms (RL Algos): GRPO ✅, PPO ✅, DAPO ✅, Reinforce++ ✅, SAC ✅, CrossQ ✅, RLPD ✅, SAC-Flow ✅, DSRL ✅
Algorithms (SFT): Full-parameter SFT ✅, LoRA SFT ✅, VLM SFT ✅, DAgger ✅

Agentic AI

Single-Agent: SearchR1 ✅, rStar2 ✅, Online Coder ✅, Math Reasoning RL ✅
Multi-Agent: WideSeek-R1 ✅

Quick Start

Installation: Refer to our installation guide to install RLinf. We recommend using our provided Docker image (i.e., Installation Method 1), as the environment and dependencies for embodied RL are complex.

Run a simple example: After setting up the environment, users can run a simple embodied RL example with the ManiSkill3 simulator by following this document.

SOTA RL Training Reproduction: RLinf provides end-to-end recipes that reproduce or match state-of-the-art (SOTA) RL results out of the box. Users can directly run our configs and scripts to obtain SOTA performance without custom engineering. Check out our example gallery for more details.
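
Before running any of the steps above, it can help to confirm that RLinf is actually installed in the active Python environment. The sketch below uses only the standard library's `importlib.metadata`. The distribution name `rlinf` is an assumption based on the project name and is not confirmed by this README, and `installed_version` is a hypothetical helper rather than part of RLinf.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "rlinf") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    The default name "rlinf" is an assumption; adjust it if the package is
    published under a different distribution name.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("RLinf was not found in this environment; see the installation guide.")
    else:
        print(f"RLinf {version} is installed.")
```

When working inside the provided Docker image, this check is a quick way to verify that the interpreter in use is the one the image set up.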

Awesome Community Projects with RLinf

We are excited to see a growing ecosystem of projects building on or integrating with RLinf, spanning embodied AI, robotics, and long-horizon agentic systems. Here are some awesome community projects:

i4h-workflows: An RL-based workflow open-sourced by an NVIDIA team and built on the Isaac ecosystem, integrating RLinf for healthcare-oriented embodied intelligence.
pi-StepNFT: Extends RLinf for step-level training and optimization of π-series VLA models.
Dexbotic: A robotics + RL system integrating RLinf for scalable training and deployment of embodied agents.
RoboTwin: A digital twin + robotics platform leveraging RLinf for large-scale embodied RL training.
IsaacLab: Official integration of RLinf within IsaacLab, enabling seamless reinforcement learning workflows on top of NVIDIA Isaac Sim based robotics environments.

💡 Want to feature your project here? Open a PR and we’ll be happy to include it!

Adoption

RLinf is a production-grade, open-source reinforcement learning framework for embodied AI. It is being adopted by leading companies and startups across AI infrastructure and robotics, including AgiBot, X Square Robot, PsiBot, Dexmal, Moore Threads, and D-Robotics.

✨ If your organization is using RLinf, feel free to reach out or submit a PR to be listed here.

CI Test Status

RLinf has comprehensive CI tests for both the core components (via unit tests) and end-to-end RL training workflows of embodied, agent, and reasoning scenarios. Below is the summary of the CI test status of the main branch:

Test Name	Status
unit-tests
agent-reason-e2e-tests
embodied-e2e-tests
scheduler-tests

Contribution Guidelines

We welcome contributions to RLinf. Please read the contribution guide before getting started. We thank all of our contributors and welcome more developers to join us on this open-source project.

Citation and Acknowledgement

If you find RLinf helpful, please cite the paper:

@article{yu2025rlinf,
  title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation},
  author={Yu, Chao and Wang, Yuanqing and Guo, Zhen and Lin, Hao and Xu, Si and Zang, Hongzhi and Zhang, Quanlu and Wu, Yongji and Zhu, Chunyang and Hu, Junhao and others},
  journal={arXiv preprint arXiv:2509.15965},
  year={2025}
}

If you use RL+VLA in RLinf, you can also cite our technical report and empirical study paper:

@article{zang2025rlinf,
  title={RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training},
  author={Zang, Hongzhi and Wei, Mingjie and Xu, Si and Wu, Yongji and Guo, Zhen and Wang, Yuanqing and Lin, Hao and Shi, Liangzhi and Xie, Yuqing and Xu, Zhexuan and others},
  journal={arXiv preprint arXiv:2510.06710},
  year={2025}
}

@article{liu2025can,
  title={What Can {RL} Bring to {VLA} Generalization? An Empirical Study},
  author={Liu, Jijia and Gao, Feng and Wei, Bingwen and Chen, Xinlei and Liao, Qingmin and Wu, Yi and Yu, Chao and Wang, Yu},
  journal={arXiv preprint arXiv:2505.19789},
  year={2025}
}

@article{chen2025pi_,
  title={$\pi_{\texttt{RL}}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models},
  author={Chen, Kang and Liu, Zhihao and Zhang, Tonghe and Guo, Zhen and Xu, Si and Lin, Hao and Zang, Hongzhi and Zhang, Quanlu and Yu, Zhaofei and Fan, Guoliang and others},
  journal={arXiv preprint arXiv:2510.25889},
  year={2025}
}

If you train your policies in the physical world with RLinf, you can cite our paper:

@article{zang2026rlinfuser,
  title={RLinf-USER: A Unified and Extensible System for Real-World Online Policy Learning in Embodied AI}, 
  author={Hongzhi Zang and Shu'ang Yu and Hao Lin and Tianxing Zhou and Zefang Huang and Zhen Guo and Xin Xu and Jiakai Zhou and Yuze Sheng and Shizhe Zhang and Feng Gao and Wenhao Tang and Yufeng Yue and Quanlu Zhang and Xinlei Chen and Chao Yu and Yu Wang},
  year={2026},
  journal={arXiv preprint arXiv:2602.07837},
  url={https://arxiv.org/abs/2602.07837}, 
}

If you use World Model + VLA + RL in RLinf, you can cite our paper:

@article{jiang2026wovr,
  title={WoVR: World Models as Reliable Simulators for Post-Training VLA Policies with RL}, 
  author={Zhennan Jiang and Shangqing Zhou and Yutong Jiang and Zefang Huang and Mingjie Wei and Yuhui Chen and Tianxing Zhou and Zhen Guo and Hao Lin and Quanlu Zhang and Yu Wang and Haoran Li and Chao Yu and Dongbin Zhao},
  year={2026},
  journal={arXiv preprint arXiv:2602.13977},
  url={https://arxiv.org/abs/2602.13977}, 
}

If you use RL-based sim-real co-training in RLinf, you can cite our paper:

@article{shi2026rlinf,
  title={Beyond Imitation: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models},
  author={Shi, Liangzhi and Chen, Shuaihang and Gao, Feng and Chen, Yinuo and Chen, Kang and Zhang, Tonghe and Zhang, Hongzhi and Zhang, Weinan and Yu, Chao and Wang, Yu},
  journal={arXiv preprint arXiv:2602.12628},
  year={2026},
  url={https://arxiv.org/abs/2602.12628},
}

If you use WideSeek-R1 in RLinf, you can cite our paper:

@article{xu2026wideseek,
  title={WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning},
  author={Xu, Zelai and Xu, Zhexuan and Zhang, Ruize and Zhu, Chunyang and Yu, Shi and Liu, Weilin and Zhang, Quanlu and Ding, Wenbo and Yu, Chao and Wang, Yu},
  journal={arXiv preprint arXiv:2602.04634},
  year={2026},
}

Acknowledgements: RLinf has been inspired by, and benefits from, the ideas and tooling of the broader open-source community. In particular, we would like to thank the teams and contributors behind VeRL, AReaL, Megatron-LM, SGLang, and PyTorch Fully Sharded Data Parallel (FSDP). If we have inadvertently missed your project or contribution, please open an issue or a pull request so we can properly credit you.

Contact: We welcome applications from Postdocs, PhD/Master's students, and interns. Join us in shaping the future of RL infrastructure and embodied AI!

Chao Yu: zoeyuchao@gmail.com
Yu Wang: yu-wang@tsinghua.edu.cn
