NeurIPS 2025
Zero-shot reinforcement learning aims to create generalist agents that can adapt to entirely new tasks without retraining—a crucial step toward scalable, autonomous intelligence. However, current methods remain limited by weak expressivity and unstable training.
BREEZE is an FB-based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality through three key designs: behavior-regularized learning that keeps policy optimization within the support of the offline data, an expressive diffusion-based policy for action extraction, and transformer-like encoders for the F and B representations.
Figure: BREEZE's transformer-like architecture for F (left) and B (right)
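To ground the design, below is a minimal sketch of the standard FB parameterization that BREEZE builds on; the plain MLP encoders, layer sizes, and embedding dimension d are hypothetical simplifications of the transformer-like networks shown above.

```python
import torch
import torch.nn as nn

class ForwardNet(nn.Module):
    """F(s, a, z) -> R^d: forward embedding of a state-action pair under task z."""
    def __init__(self, s_dim, a_dim, z_dim, d=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim + z_dim, 256), nn.ReLU(),
            nn.Linear(256, d),
        )

    def forward(self, s, a, z):
        return self.net(torch.cat([s, a, z], dim=-1))

class BackwardNet(nn.Module):
    """B(s') -> R^d: backward embedding of a goal/future state."""
    def __init__(self, s_dim, d=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, 256), nn.ReLU(),
            nn.Linear(256, d),
        )

    def forward(self, s):
        return self.net(s)

def successor_measure(F, B, s, a, z, s_future):
    # M^{pi_z}(s, a, s') is approximated by the inner product F(s, a, z)^T B(s').
    return (F(s, a, z) * B(s_future)).sum(dim=-1)

def q_value(F, s, a, z):
    # In the FB framework, Q^{pi_z}(s, a) = F(s, a, z)^T z.
    return (F(s, a, z) * z).sum(dim=-1)
```

The key property is that a single trained pair (F, B) supports every downstream task: swapping the task vector z changes the Q-function without any retraining.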
We evaluate BREEZE on standard benchmarks including ExORL and D4RL-Franka Kitchen. Our experiments assess zero-shot policy performance and robustness under distribution shift, and include ablation studies of the regularization and diffusion components.
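As context for the zero-shot protocol, here is a minimal sketch of task inference in FB-style methods, assuming the standard rule \( z = \mathbb{E}_{s \sim \mathcal{D}}[r(s)\,B(s)] \); the function and tensor names are hypothetical.

```python
import torch

@torch.no_grad()
def infer_task_embedding(B, states, rewards):
    """Estimate the task vector z from reward-labeled samples.

    states:  (N, s_dim) states sampled from the offline dataset
    rewards: (N,) rewards of the downstream task evaluated on those states
    """
    # z = E_{s ~ D}[ r(s) B(s) ]; some implementations additionally
    # rescale z to a fixed norm before conditioning the policy.
    return (rewards.unsqueeze(-1) * B(states)).mean(dim=0)
```

Given z, the agent acts to maximize \( F(s, a, z)^{\top} z \) on the new task, with no gradient updates.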
Figure: Results of our method on four different environments.
BREEZE consistently achieves top or near-top returns across benchmarks, demonstrating strong zero-shot generalization. Compared with vanilla FB methods, it converges faster and reaches higher performance with smoother, more stable learning curves.
Figure: BREEZE achieves superior zero-shot performance across tasks.
Our design corrects the distorted \( M^{\pi} \) and \( Q \) distributions of earlier FB frameworks, yielding stable, properly scaled value representations.
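For reference, under the standard FB parameterization the two quantities in the figure below are

\[
M^{\pi_z}(s, a, \mathrm{d}s') \approx F(s, a, z)^{\top} B(s')\, \rho(\mathrm{d}s'),
\qquad
Q^{\pi_z}(s, a) = F(s, a, z)^{\top} z,
\]

where \( \rho \) is the state distribution of the offline data, so distortion in either \( F \) or \( B \) propagates directly to both the \( M^{\pi} \) and \( Q \) distributions.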
Figure: BREEZE learns more realistic distributions of \( M^{\pi} \) (left two panels) and \( Q \) (right two panels) values.
BREEZE unifies diffusion-based policies, transformer encoders, and behavior-regularized learning to achieve stable, expressive zero-shot reinforcement learning. By mitigating extrapolation errors and enhancing policy expressivity, it delivers consistent generalization across tasks.
The main trade-off is between computational cost and performance: diffusion sampling and expressive architectures improve robustness at the expense of efficiency. Future work will focus on reducing this overhead through lighter generative policies and on exploring theoretical guarantees for behavior-regularized generalization.
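To make the efficiency trade-off concrete, below is a minimal sketch of DDPM-style action sampling for a state- and task-conditioned diffusion policy; the network, noise schedule, and T = 20 steps are illustrative assumptions rather than BREEZE's exact configuration. Each action requires T denoising passes, which is the overhead referred to above.

```python
import torch
import torch.nn as nn

class NoiseModel(nn.Module):
    """eps_theta(a_t, t, s, z): predicts the noise added to a clean action."""
    def __init__(self, s_dim, a_dim, z_dim, T=20):
        super().__init__()
        self.T = T
        self.net = nn.Sequential(
            nn.Linear(a_dim + 1 + s_dim + z_dim, 256), nn.ReLU(),
            nn.Linear(256, a_dim),
        )

    def forward(self, a_t, t, s, z):
        t_emb = t.float().unsqueeze(-1) / self.T  # scalar timestep embedding
        return self.net(torch.cat([a_t, t_emb, s, z], dim=-1))

@torch.no_grad()
def sample_action(model, s, z, a_dim):
    """Reverse diffusion: start from Gaussian noise and denoise step by step."""
    T = model.T
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    a = torch.randn(s.shape[0], a_dim)  # a_T ~ N(0, I)
    for t in reversed(range(T)):
        t_batch = torch.full((s.shape[0],), t)
        eps = model(a, t_batch, s, z)
        # DDPM posterior mean for a_{t-1}
        a = (a - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:  # add noise at every step except the last
            a = a + torch.sqrt(betas[t]) * torch.randn_like(a)
    return a
```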
@inproceedings{zheng2025breeze,
  title={Towards Robust Zero-Shot Reinforcement Learning},
  author={Kexin Zheng and Lauriane Teyssier and Yinan Zheng and Yu Luo and Xianyuan Zhan},
  booktitle={NeurIPS},
  year={2025}
}