Incorporating Task Progress Knowledge for Subgoal Generation in Robotic Manipulation through Image Edits

University of Virginia

Abstract

Understanding the progress of a task allows humans not only to track what has been done but also to better plan future goals. We present TaKSIE, a novel framework that incorporates task progress knowledge into visual subgoal generation for robotic manipulation tasks. We jointly train a recurrent network with a latent diffusion model to generate the next visual subgoal based on the robot's current observation and the input language command. At execution time, the robot leverages a visual progress representation to monitor task progress and adaptively samples the next visual subgoal from the model to guide the manipulation policy. We train and validate our model on simulated and real-world robotic tasks, achieving state-of-the-art performance on the CALVIN manipulation benchmark. We find that incorporating task progress knowledge can improve the robustness of the trained policy to different initial robot poses and to varying movement speeds in the demonstrations.
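The execution-time behavior described above can be illustrated as a simple control loop. This is a minimal, hypothetical sketch, not TaKSIE's implementation: it assumes the progress representation can be reduced to a scalar score in [0, 1] and that a new subgoal is requested once that score crosses a threshold. The callables `step`, `sample_subgoal`, and `progress`, the `threshold` value, and the toy 1-D environment are all illustrative stand-ins for the low-level policy, the subgoal generator, and the progress evaluator.

```python
from typing import Any, Callable, List


def run_with_progress_monitor(
    obs0: Any,
    step: Callable[[Any, Any], Any],
    sample_subgoal: Callable[[Any], Any],
    progress: Callable[[Any, Any], float],
    threshold: float = 0.9,
    max_steps: int = 100,
) -> List[int]:
    """Roll out a subgoal-conditioned policy, resampling the subgoal adaptively.

    Hypothetical interfaces (assumptions, not taken from TaKSIE):
      step(obs, goal)      -> next observation after one policy/environment step
      sample_subgoal(obs)  -> next subgoal (stand-in for the diffusion model)
      progress(obs, goal)  -> scalar progress toward the current subgoal in [0, 1]
    Returns the time steps at which a subgoal was sampled.
    """
    obs = obs0
    goal = sample_subgoal(obs)
    sampled_at = [0]
    for t in range(1, max_steps + 1):
        obs = step(obs, goal)
        # Request a new subgoal only once the current one is judged (nearly)
        # reached, instead of on a fixed time schedule.
        if progress(obs, goal) >= threshold:
            goal = sample_subgoal(obs)
            sampled_at.append(t)
    return sampled_at


# Toy 1-D example: observations are positions, each subgoal lies 5 units ahead,
# and the policy moves one unit per step toward the current subgoal.
def toy_step(obs: int, goal: int) -> int:
    return obs + 1 if obs < goal else obs


def toy_subgoal(obs: int) -> int:
    return obs + 5


def toy_progress(obs: int, goal: int) -> float:
    return max(0.0, 1.0 - abs(goal - obs) / 5.0)


print(run_with_progress_monitor(0, toy_step, toy_subgoal, toy_progress, max_steps=12))
# → [0, 5, 10]
```

Because resampling is triggered by estimated progress rather than elapsed time, a policy that moves more slowly would simply receive its next subgoal later. The threshold rule is only one possible trigger and is an assumption of this sketch.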

Framework

An overview of TaKSIE, a framework that incorporates task progress knowledge (encoded in the Progress Encoder and Progress Evaluation modules) into language-conditioned robotic manipulation, using generated subgoals as conditions for a low-level policy.

Keyframe Selection

Comparison between TaKSIE's ground-truth (GT) subgoal selection strategy (red arrows, two frames on the right) and a fixed-interval subgoal selection strategy (black arrows, two frames at the top).
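For reference, the fixed-interval baseline in this comparison can be sketched in a few lines. This sketch assumes that keyframes are every `interval`-th frame of a demonstration plus its final frame, and that the subgoal for a given frame is the next keyframe after it; the function names and the interval value are illustrative. TaKSIE's own ground-truth selection strategy is not reproduced here.

```python
from bisect import bisect_right
from typing import List


def fixed_interval_keyframes(num_frames: int, interval: int) -> List[int]:
    """Pick every `interval`-th frame of a demonstration, plus the final frame."""
    if num_frames <= 0 or interval <= 0:
        raise ValueError("num_frames and interval must be positive")
    keyframes = list(range(0, num_frames, interval))
    if keyframes[-1] != num_frames - 1:
        keyframes.append(num_frames - 1)  # always keep the final (goal) frame
    return keyframes


def subgoal_index(t: int, keyframes: List[int]) -> int:
    """Return the first keyframe strictly after frame t (the last frame if none)."""
    i = bisect_right(keyframes, t)
    return keyframes[i] if i < len(keyframes) else keyframes[-1]


keys = fixed_interval_keyframes(10, 4)
print(keys)                    # → [0, 4, 8, 9]
print(subgoal_index(5, keys))  # → 8
```

This schedule depends only on frame counts, not on task progress, so two demonstrations of the same motion recorded at different speeds yield keyframes at different stages of the task.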

Rollout Examples

BibTeX


    @misc{kang2024incorporatingtaskprogressknowledge,
      title={Incorporating Task Progress Knowledge for Subgoal Generation in Robotic Manipulation through Image Edits}, 
      author={Xuhui Kang and Yen-Ling Kuo},
      year={2024},
      eprint={2410.11013},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2410.11013}, 
    }