Inference-Time Enhancement of Generative Robot Policies via Predictive World Modeling

Accepted to IEEE Robotics and Automation Letters (RA-L)

Previously titled: Strengthening Generative Robot Policies through Predictive World Modeling

School of Engineering and Applied Sciences, Harvard University

Abstract

We present generative predictive control (GPC), a framework for inference-time enhancement of pretrained behavior-cloning policies. Rather than retraining or fine-tuning, GPC augments a frozen diffusion policy at deployment by coupling it with a predictive world model. Concretely, we train an action-conditioned world model on expert demonstrations and random exploration rollouts to forecast the consequences of action proposals produced by the diffusion policy, then perform lightweight online planning that ranks and refines these proposals via model-based look-ahead. This combination of a generative prior with predictive foresight enables test-time adaptation. Across diverse robotic manipulation tasks (state- and vision-based, in simulation and on real hardware), GPC consistently outperforms standard behavior cloning and compares favorably to other inference-time adaptation baselines.

Keywords: world model; predictive world modeling; diffusion policy; behavior cloning; inference-time adaptation; model-based planning; online planning.
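
The rank-and-refine loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: `sample_proposals`, `world_model`, and `reward` are hypothetical stand-ins (a noisy sampler in place of the frozen diffusion policy, additive dynamics in place of the learned world model, and a negative squared distance to a goal in place of a task reward). The refinement step assumes that "refine" means gradient ascent on the predicted reward, approximated here with central finite differences rather than backpropagation through a learned model.

```python
import numpy as np

def sample_proposals(rng, state, goal, num_proposals, horizon):
    """Hypothetical stand-in for a frozen diffusion policy: noisy steps toward the goal."""
    step = (goal - state) / horizon
    noise = rng.normal(scale=0.5, size=(num_proposals, horizon, state.shape[0]))
    return step + noise  # shape: (num_proposals, horizon, state_dim)

def world_model(state, actions):
    """Hypothetical stand-in for a learned world model: predicts the final state."""
    return state + actions.sum(axis=0)

def reward(predicted_state, goal):
    """Assumed reward: negative squared distance between predicted state and goal."""
    return -float(np.sum((predicted_state - goal) ** 2))

def gpc_rank(state, goal, proposals):
    """Score every proposal with the world model and keep the best one."""
    scores = [reward(world_model(state, a), goal) for a in proposals]
    best = int(np.argmax(scores))
    return proposals[best], scores[best]

def gpc_opt(state, goal, actions, steps=20, lr=0.05, eps=1e-4):
    """Refine one proposal by finite-difference gradient ascent on predicted reward."""
    actions = actions.copy()
    for _ in range(steps):
        grad = np.zeros_like(actions)
        for idx in np.ndindex(actions.shape):
            plus, minus = actions.copy(), actions.copy()
            plus[idx] += eps
            minus[idx] -= eps
            grad[idx] = (reward(world_model(state, plus), goal)
                         - reward(world_model(state, minus), goal)) / (2 * eps)
        actions += lr * grad
    return actions, reward(world_model(state, actions), goal)

# Usage: rank 16 proposals over a 4-step horizon, then refine the winner.
rng = np.random.default_rng(0)
state, goal = np.zeros(2), np.array([1.0, 2.0])
proposals = sample_proposals(rng, state, goal, num_proposals=16, horizon=4)
ranked_actions, ranked_score = gpc_rank(state, goal, proposals)
refined_actions, refined_score = gpc_opt(state, goal, ranked_actions)
print(f"ranked score:  {ranked_score:.4f}")
print(f"refined score: {refined_score:.4f}")
```

In this toy setting the policy is never updated; only the choice among its proposals (ranking) or the proposal itself (refinement) changes at inference time, which is the sense in which the approach leaves the pretrained policy frozen.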

Simulation Evaluation

World Model Prediction

Plain Push-T

World Model Prediction in GPC-RANK.

World Model Prediction in GPC-OPT.

Push-T collided with A

World Model Prediction in GPC-RANK.

World Model Prediction in GPC-OPT.

Push-T collided with A & R

World Model Prediction in GPC-RANK.

World Model Prediction in GPC-OPT.

Real-world Evaluation

All real-world evaluation results are listed below, grouped by task.

Plain Push-T: Baseline (5 out of 10) vs. GPC-RANK (7 out of 10) vs. GPC-OPT (7 out of 10)

Baseline Test 0: Success.

GPC-RANK Test 0: Success.

GPC-OPT Test 0: Success.

Baseline Test 1: Failure.

GPC-RANK Test 1: Success.

GPC-OPT Test 1: Success.

Baseline Test 2: Success.

GPC-RANK Test 2: Success.

GPC-OPT Test 2: Success.

Baseline Test 3: Failure.

GPC-RANK Test 3: Failure.

GPC-OPT Test 3: Success.

Baseline Test 4: Failure.

GPC-RANK Test 4: Success.

GPC-OPT Test 4: Success.

Baseline Test 5: Failure.

GPC-RANK Test 5: Success.

GPC-OPT Test 5: Failure.

Baseline Test 6: Success.

GPC-RANK Test 6: Failure.

GPC-OPT Test 6: Failure.

Baseline Test 7: Failure.

GPC-RANK Test 7: Failure.

GPC-OPT Test 7: Success.

Baseline Test 8: Success.

GPC-RANK Test 8: Success.

GPC-OPT Test 8: Success.

Baseline Test 9: Success.

GPC-RANK Test 9: Success.

GPC-OPT Test 9: Failure.

Push-T collided with A: Baseline (2 out of 5) vs. GPC-RANK (4 out of 5) vs. GPC-OPT (3 out of 5)

Baseline Test 0: Failure.

GPC-RANK Test 0: Success.

GPC-OPT Test 0: Success.

Baseline Test 1: Success.

GPC-RANK Test 1: Success.

GPC-OPT Test 1: Success.

Baseline Test 2: Failure.

GPC-RANK Test 2: Success.

GPC-OPT Test 2: Failure.

Baseline Test 3: Failure.

GPC-RANK Test 3: Failure.

GPC-OPT Test 3: Failure.

Baseline Test 4: Success.

GPC-RANK Test 4: Success.

GPC-OPT Test 4: Success.

Push-T collided with A & R: Baseline (2 out of 5) vs. GPC-RANK (3 out of 5) vs. GPC-OPT (4 out of 5)

Baseline Test 0: Success.

GPC-RANK Test 0: Success.

GPC-OPT Test 0: Success.

Baseline Test 1: Success.

GPC-RANK Test 1: Success.

GPC-OPT Test 1: Success.

Baseline Test 2: Failure.

GPC-RANK Test 2: Success.

GPC-OPT Test 2: Success.

Baseline Test 3: Failure.

GPC-RANK Test 3: Failure.

GPC-OPT Test 3: Failure.

Baseline Test 4: Failure.

GPC-RANK Test 4: Failure.

GPC-OPT Test 4: Success.

Push-T collided with R: Baseline (2 out of 3) vs. GPC-RANK (3 out of 3) vs. GPC-OPT (3 out of 3)

Baseline Test 0: Success.

GPC-RANK Test 0: Success.

GPC-OPT Test 0: Success.

Baseline Test 1: Success.

GPC-RANK Test 1: Success.

GPC-OPT Test 1: Success.

Baseline Test 2: Failure.

GPC-RANK Test 2: Success.

GPC-OPT Test 2: Success.

Clothes Folding: Baseline (3 out of 10) vs. GPC-RANK (7 out of 10)

Baseline Test 0: Failure.

GPC-RANK Test 0: Success.

Baseline Test 1: Success.

GPC-RANK Test 1: Success.

Baseline Test 2: Failure.

GPC-RANK Test 2: Success.

Baseline Test 3: Failure.

GPC-RANK Test 3: Success.

Baseline Test 4: Failure.

GPC-RANK Test 4: Failure.

Baseline Test 5: Failure.

GPC-RANK Test 5: Success.

Baseline Test 6: Success.

GPC-RANK Test 6: Success.

Baseline Test 7: Success.

GPC-RANK Test 7: Success.

Baseline Test 8: Failure.

GPC-RANK Test 8: Failure.

Baseline Test 9: Failure.

GPC-RANK Test 9: Failure.

BibTeX

  @article{qi25ral-gpc,
    title={Inference-Time Enhancement of Generative Robot Policies via Predictive World Modeling},
    note={Previously titled: Strengthening Generative Robot Policies through Predictive World Modeling},
    author={Qi, Han and Yin, Haocheng and Zhu, Aris and Du, Yilun and Yang, Heng},
    journal={IEEE Robotics and Automation Letters (RA-L)},
    year={2025}
  }