Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

¹Mila ²Université de Montréal ³University of Waterloo ⁴Google DeepMind · *Co-first author · ⁺Equal advising

Abstract

While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e., combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally correlated states are properly encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. We formalize this notion by demonstrating how encouraging long-range temporal consistency via successor representations (SR) can facilitate generalization. We then propose a simple yet effective representation learning objective for GCBC, BYOL-γ, which theoretically approximates the successor representation in the finite-MDP case through self-predictive representations, and achieves competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.
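For reference, the successor representation the abstract alludes to has a standard definition in the finite-MDP case (notation ours, matching the usual convention):

```latex
M^{\pi}(s, s') \;=\; \mathbb{E}_{\pi}\!\left[\, \sum_{k=0}^{\infty} \gamma^{k}\, \mathbf{1}\{s_{t+k} = s'\} \;\middle|\; s_t = s \right]
```

Sampling the offset geometrically, $k \sim \mathrm{Geom}(1-\gamma)$ with $P(k) = (1-\gamma)\gamma^{k}$, gives $\mathbb{E}_k\big[\mathbf{1}\{s_{t+k} = s'\}\big] = (1-\gamma)\, M^{\pi}(s, s')$, so predicting a geometrically sampled future state estimates the SR up to the constant $(1-\gamma)$.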

Overview


To learn better policy representations for generalization, we utilize an auxiliary self-predictive objective that predicts a future representation φ(s_{t+k}) via ψ_f(φ(s_t)). We can also predict backwards with a separate predictor ψ_b(φ(s_{t+k})). The target offset is sampled geometrically, k ~ Geom(1 − γ).
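The objective above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the encoder `phi`, predictor `psi_f`, and the cosine-similarity loss form are our simplifications, and the stop-gradient on the target is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_offset(gamma, rng):
    # k ~ Geom(1 - gamma): larger gamma favors longer-range targets.
    return int(rng.geometric(1.0 - gamma))

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def byol_gamma_loss(phi, psi_f, traj, t, gamma, rng):
    """Negative cosine similarity between the forward prediction
    psi_f(phi(s_t)) and the target representation phi(s_{t+k})."""
    k = sample_offset(gamma, rng)
    k = min(k, len(traj) - 1 - t)   # clip the offset to the trajectory end
    online = psi_f(phi(traj[t]))
    target = phi(traj[t + k])       # stop-gradient in practice
    return -cosine_similarity(online, target)

# Toy linear encoder/predictor on a random trajectory.
W_enc = rng.normal(size=(8, 4))
W_pred = rng.normal(size=(4, 4))
phi = lambda s: W_enc.T @ s
psi_f = lambda z: W_pred @ z
traj = rng.normal(size=(20, 8))
loss = byol_gamma_loss(phi, psi_f, traj, t=3, gamma=0.9, rng=rng)
```

A backward predictor ψ_b would mirror this, predicting φ(s_t) from φ(s_{t+k}).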


Representation Visualization


Representation visualizations for different auxiliary losses with BC, showing the cosine similarity between the forward prediction ψ_f applied to every state's representation and the goal's representation φ.


OGBench Results


We find that on OGBench tasks requiring combinatorial generalization, using BYOL-γ as an auxiliary objective leads to improvements over BC, and is competitive or better than other representation learning objectives across different tasks.

Horizon Generalization


We conduct experiments to understand how the success rate changes as an agent has to reach more challenging goals further away from its starting position. We consider the same 5 base evaluation tasks used in our main evaluation, but construct intermediate waypoints as shorter-horizon goals along the shortest path from the start to the final goal. We find that our objective can help generalize to further and more challenging goals.
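The waypoint construction can be sketched on a toy grid maze. This is our hedged illustration, not the paper's evaluation code: we assume a 4-connected grid, a BFS shortest path, and evenly spaced waypoints along it; all names here are hypothetical.

```python
from collections import deque

def shortest_path(grid, start, goal):
    """BFS shortest path on a 4-connected grid; grid[r][c] == 1 is a wall."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    q = deque([start])
    while q:
        cur = q.popleft()
        if cur == goal:
            path = []
            while cur is not None:      # walk parents back to the start
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in parent):
                parent[(nr, nc)] = cur
                q.append((nr, nc))
    return None

def waypoints(path, n):
    """n evenly spaced intermediate goals along the path (last one = goal)."""
    idx = [round(i * (len(path) - 1) / n) for i in range(1, n + 1)]
    return [path[i] for i in idx]

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
path = shortest_path(grid, (0, 0), (2, 3))
goals = waypoints(path, 3)
```

Evaluating the agent on each entry of `goals` in order then yields a curve of success rate versus goal horizon.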


BibTeX

@misc{lawson2025selfpredictiverepresentationscombinatorialgeneralization,
      title={Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning}, 
      author={Daniel Lawson and Adriana Hugessen and Charlotte Cloutier and Glen Berseth and Khimya Khetarpal},
      year={2025},
      eprint={2506.10137},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2506.10137}, 
}