Towards Robust Zero-Shot Reinforcement Learning

NeurIPS 2025

¹The Chinese University of Hong Kong · ²Tsinghua University · ³Huawei Noah's Ark Lab · ⁴Shanghai Artificial Intelligence Laboratory
*Equal contribution · Corresponding author
Figure: BREEZE framework overview.

BREEZE Framework

Zero-shot reinforcement learning aims to create generalist agents that can adapt to entirely new tasks without retraining—a crucial step toward scalable, autonomous intelligence. However, current methods remain limited by weak expressivity and unstable training.

BREEZE is a forward-backward (FB) based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality through three key designs (a minimal sketch follows the list):

  • Behavior regularization in zero-shot RL policy learning, transforming policy optimization into a stable in-sample learning paradigm.
  • Diffusion-based policy extraction, enabling the generation of high-quality, multimodal action distributions in zero-shot RL settings.
  • Attention-based architectures for representation modeling, capturing the complex relationships underlying the environment dynamics.
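
To make the first two designs concrete, here is a minimal PyTorch sketch of in-sample, behavior-regularized policy extraction with a diffusion policy on top of FB representations. The module names, the advantage-weighted weighting scheme, and all hyperparameters are illustrative assumptions for exposition, not the paper's exact objective.

```python
# Illustrative sketch only: in-sample policy extraction with a diffusion policy
# over FB representations. Names (Denoiser, forward_net, ...) and the AWR-style
# weighting are our assumptions, not necessarily BREEZE's exact losses.
import torch
import torch.nn as nn


class Denoiser(nn.Module):
    """Predicts the noise added to an action, conditioned on (state, task z, timestep)."""

    def __init__(self, state_dim, action_dim, z_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + z_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s, a_noisy, z, t):
        # t: (batch, 1) diffusion timestep scaled to [0, 1]
        return self.net(torch.cat([s, a_noisy, z, t], dim=-1))


def fb_q_value(forward_net, s, a, z):
    # Standard FB critic: Q_z(s, a) = F(s, a, z) . z
    return (forward_net(s, a, z) * z).sum(-1, keepdim=True)


def policy_extraction_loss(denoiser, forward_net, batch, z, alphas_bar, beta=3.0):
    """Advantage-weighted denoising loss: trains only on dataset actions (in-sample),
    upweighting actions the FB critic prefers. `alphas_bar` is a 1-D tensor of
    cumulative DDPM noise-schedule products."""
    s, a = batch["state"], batch["action"]
    with torch.no_grad():
        q = fb_q_value(forward_net, s, a, z)
        v = q.mean()                              # crude baseline; a value net could replace it
        w = torch.exp(beta * (q - v)).clamp(max=100.0)

    # Standard DDPM forward (noising) process applied to the dataset action.
    T = alphas_bar.shape[0]
    t = torch.randint(0, T, (s.shape[0],), device=s.device)
    ab = alphas_bar[t].unsqueeze(-1)
    noise = torch.randn_like(a)
    a_noisy = ab.sqrt() * a + (1.0 - ab).sqrt() * noise

    pred = denoiser(s, a_noisy, z, t.float().unsqueeze(-1) / T)
    return (w * ((pred - noise) ** 2).mean(-1, keepdim=True)).mean()
```

Because the loss is computed only on actions drawn from the dataset, the policy never queries the critic on out-of-distribution actions, which is the usual motivation for in-sample, behavior-regularized extraction.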


Figure: BREEZE's Transformer-like architectures for the forward model F (left) and the backward model B (right).
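
As a rough illustration of the attention-based representation design, the sketch below treats the state, action, and task embedding as separate tokens of a small Transformer encoder; the token layout, pooling, depth, and dimensions are our assumptions rather than the paper's exact architecture.

```python
# Illustrative sketch (assumed token layout and sizes): an attention-based
# forward model F(s, a, z) built from a small Transformer encoder.
import torch
import torch.nn as nn


class AttentionForwardModel(nn.Module):
    def __init__(self, state_dim, action_dim, z_dim, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed_s = nn.Linear(state_dim, d_model)
        self.embed_a = nn.Linear(action_dim, d_model)
        self.embed_z = nn.Linear(z_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, z_dim)   # outputs F(s, a, z) in R^{z_dim}

    def forward(self, s, a, z):
        # One token each for state, action, and task embedding: (batch, 3, d_model)
        tokens = torch.stack([self.embed_s(s), self.embed_a(a), self.embed_z(z)], dim=1)
        h = self.encoder(tokens)                 # full self-attention across the three tokens
        return self.head(h.mean(dim=1))          # pool tokens, project to the embedding space

# A backward model B(s') can reuse the same encoder pattern with a single state token.
```

Self-attention lets the representation mix state, action, and task information in a content-dependent way, rather than through a fixed concatenation followed by an MLP.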

Experiments

We evaluate BREEZE on standard benchmarks including ExORL and D4RL-Franka Kitchen. Our experiments assess zero-shot policy performance, robustness under distribution shifts, and ablation studies on regularization and diffusion components.

Figure: We demonstrate our method's results on four different environments.

BREEZE consistently achieves top or near-top returns across benchmarks, demonstrating strong zero-shot generalization. It enhances the generalization ability of vanilla FB methods, converges faster, and reaches higher performance with smoother, more stable learning curves.


Figure: BREEZE achieves superior zero-shot performance across tasks.

Figure: Results table. BREEZE achieves superior zero-shot performance across tasks.

Our design corrects the distorted \( M^{\pi} \) and \( Q \) distributions of earlier FB frameworks, yielding stable, properly scaled value representations.
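
For context, the standard forward-backward identities from the FB literature (shown here as background; BREEZE's corrected objectives are detailed in the paper) relate the successor measure \( M^{\pi} \), the representations \( F \) and \( B \), and the value \( Q \):

\[
\begin{aligned}
M^{\pi_z}(s_0, a_0, \mathrm{d}s') &\approx F(s_0, a_0, z)^{\top} B(s')\, \rho(\mathrm{d}s'), \\
z_r &= \mathbb{E}_{s \sim \rho}\big[ r(s)\, B(s) \big], \\
Q^{\pi_z}_{r}(s, a) &\approx F(s, a, z_r)^{\top} z_r,
\end{aligned}
\]

where \( \rho \) is the state distribution of the dataset. Distortions in the learned \( F^{\top} B \) factorization therefore propagate directly into the \( Q \) estimates, which is why correcting the \( M^{\pi} \) and \( Q \) distributions matters for stable policy extraction.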


Figure: BREEZE learns more realistic distributions of M values (two left panels) and Q values (two right panels).

Conclusion & Discussion

BREEZE unifies diffusion-based policies, transformer encoders, and behavior-regularized learning to achieve stable, expressive zero-shot reinforcement learning. By mitigating extrapolation errors and enhancing policy expressivity, it delivers consistent generalization across tasks.

The main trade-off is between computational cost and performance: diffusion sampling and expressive architectures improve robustness at the expense of efficiency. Future work will focus on reducing this overhead through lighter generative policies and on exploring theoretical guarantees for behavior-regularized generalization.

BibTeX

@inproceedings{zheng2025breeze,
  title={Towards Robust Zero-Shot Reinforcement Learning},
  author={Kexin Zheng and Lauriane Teyssier and Yinan Zheng and Yu Luo and Xianyuan Zhan},
  booktitle={NeurIPS},
  year={2025}
}