Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

1 Mila   2 Université de Montréal   3 University of Waterloo   4 Google DeepMind   (*Co-first author, +Equal advising)

Abstract

Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in domains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavioral cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e., combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC: if temporally related states are encoded to similar latent representations, the out-of-distribution gap for novel state-goal pairs is reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective, BYOL-γ augmented GCBC, which not only theoretically approximates the successor representation in the finite MDP case without contrastive samples or TD learning, but also achieves competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.
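For intuition, the successor representation of a finite MDP is the (discounted) occupancy of future states, which can be computed in closed form. A minimal NumPy illustration on a toy four-state chain (the chain itself is an invented example, not a task from the paper):

```python
import numpy as np

# Successor representation of a finite MDP under transition matrix P:
#   M = (1 - gamma) * sum_{t>=0} gamma^t P^t = (1 - gamma) * inv(I - gamma P).
# Each row of M is a geometric-discounted distribution over future states.
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])  # simple chain; state 3 is absorbing
gamma = 0.9
M = (1 - gamma) * np.linalg.inv(np.eye(4) - gamma * P)
print(M)          # rows sum to 1: a distribution over future visited states
```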

Overview


To learn better policy representations for generalization, we utilize an auxiliary self-predictive objective that predicts a future representation ϕ(s_{t+k}) via a forward predictor ψ_f(ϕ(s_t)). We can also predict backwards with a separate predictor ψ_b(ϕ(s_{t+k})). The target offset is sampled geometrically, k ∼ Geom(1 − γ).
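A minimal PyTorch sketch of this auxiliary loss, assuming trajectory batches and an EMA target encoder as in BYOL (the function and module names here are illustrative assumptions, not the released implementation):

```python
import torch
import torch.nn.functional as F

def byol_gamma_loss(encoder, target_encoder, psi_f, psi_b, states, gamma=0.99):
    """One step of the self-predictive auxiliary loss (sketch).

    states: (B, T, ...) batch of trajectories.
    target_encoder: EMA copy of encoder; targets get no gradient, as in BYOL.
    """
    B, T = states.shape[0], states.shape[1]
    # k ~ Geom(1 - gamma) with support {1, 2, ...}; PyTorch's Geometric counts
    # failures from 0, so shift by one, then clip so t + k stays in range.
    k = torch.distributions.Geometric(probs=1 - gamma).sample((B,)).long() + 1
    t = torch.randint(0, T - 1, (B,))
    k = torch.minimum(k, T - 1 - t)

    s_t = states[torch.arange(B), t]
    s_tk = states[torch.arange(B), t + k]

    z_t = encoder(s_t)                       # phi(s_t)
    z_tk = encoder(s_tk)                     # phi(s_{t+k})
    with torch.no_grad():                    # stop-gradient targets
        zbar_t = target_encoder(s_t)
        zbar_tk = target_encoder(s_tk)

    # BYOL-style loss: MSE between L2-normalized vectors (= 2 - 2 cos sim).
    fwd = F.mse_loss(F.normalize(psi_f(z_t), dim=-1),
                     F.normalize(zbar_tk, dim=-1))   # predict the future
    bwd = F.mse_loss(F.normalize(psi_b(z_tk), dim=-1),
                     F.normalize(zbar_t, dim=-1))    # predict the past
    return fwd + bwd
```

In practice this loss would be added to the GCBC policy loss with a weighting coefficient; the specific weighting is a tuning choice.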


Representation Visualization


Representation visualizations for different auxiliary losses with BC, showing the cosine similarity between the forward prediction ψ_f(ϕ(s)) from every state s and the goal's representation ϕ(g).
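A sketch of how such a similarity map could be computed, assuming a trained encoder and forward predictor (helper names are hypothetical):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def similarity_map(encoder, psi_f, all_states, goal_state):
    """Cosine similarity between psi_f(phi(s)) for each state s and phi(goal)."""
    preds = F.normalize(psi_f(encoder(all_states)), dim=-1)        # (N, d)
    goal = F.normalize(encoder(goal_state.unsqueeze(0)), dim=-1)   # (1, d)
    return (preds @ goal.T).squeeze(-1)                            # (N,) in [-1, 1]
```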


OGBench Results


We find that on OGBench tasks requiring combinatorial generalization, using BYOL-γ as an auxiliary objective leads to improvements over BC, and is competitive with or better than other representation learning objectives across different tasks.

Horizon Generalization

Figure: success rate on each evaluation task as goals are placed increasingly far from the start.

We conduct experiments to understand how the success rate changes as an agent must reach more challenging goals farther from its starting position. We consider the same five base evaluation tasks used in our main evaluation, but construct intermediate waypoints as shorter-horizon goals along the shortest path from the start to the final goal. We find that our objective helps the agent generalize to farther and more challenging goals.
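A sketch of one way such waypoints could be constructed on a grid-based maze (the BFS helper and the fraction grid are illustrative assumptions, not necessarily the paper's exact protocol):

```python
from collections import deque

def bfs_shortest_path(start, goal, passable):
    """Shortest path on a 4-connected grid; passable(cell) -> bool."""
    queue, parent = deque([start]), {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = [cell]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]  # start -> goal
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if passable(nxt) and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    raise ValueError("goal unreachable")

def intermediate_waypoints(start, goal, passable, fractions=(0.25, 0.5, 0.75)):
    """Waypoints at fixed fractions along the shortest path, used as
    shorter-horizon goals for evaluating horizon generalization."""
    path = bfs_shortest_path(start, goal, passable)
    return [path[int(f * (len(path) - 1))] for f in fractions]
```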


BibTeX

@misc{lawson2025selfpredictiverepresentationscombinatorialgeneralization,
      title={Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning}, 
      author={Daniel Lawson and Adriana Hugessen and Charlotte Cloutier and Glen Berseth and Khimya Khetarpal},
      year={2025},
      eprint={2506.10137},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2506.10137}, 
}