Diverse Human Motion Prediction Guided by Multi-Level Spatial-Temporal Anchors




Abstract

Predicting diverse human motions given a sequence of historical poses has received increasing attention. Despite rapid progress, existing work captures the multi-modal nature of human motions primarily through likelihood-based sampling, where mode collapse has been widely observed. In this paper, we propose a simple yet effective approach that disentangles randomly sampled codes with a deterministic learnable component, named anchors, to promote sample precision and diversity. Anchors are further factorized into spatial anchors and temporal anchors, which provide attractive, interpretable control over spatial-temporal disparity. In principle, our spatial-temporal anchor-based sampling (STARS) can be applied to different motion predictors. Here, we propose an interaction-enhanced spatial-temporal graph convolutional network (IE-STGCN) that encodes prior knowledge of human motions (e.g., spatial locality), into which we incorporate the anchors. Extensive experiments demonstrate that our approach outperforms the state of the art in both stochastic and deterministic prediction, suggesting it as a unified framework for modeling human motions.
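The core idea above can be illustrated with a minimal sketch: each of K diverse predictions combines a shared random latent code with its own deterministic anchor, and each anchor factorizes into a spatial part (varying over joints) and a temporal part (varying over frames). All names, shapes, and sizes below (`K`, `spatial_anchors`, `temporal_anchors`, the tensor layout) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hedged sketch of spatial-temporal anchor-based sampling (STARS).
# Assumed latent layout per sample: (T, J, D) = frames x joints x latent dim.
rng = np.random.default_rng(0)
T, J, D = 10, 22, 8   # assumed sizes: frames, joints, latent dimension
K = 5                 # number of anchors = number of diverse predictions

# Deterministic, learnable anchors (here random stand-ins), factorized:
# a spatial anchor varies over joints and is broadcast over time,
# a temporal anchor varies over frames and is broadcast over joints.
spatial_anchors = rng.normal(size=(K, 1, J, D))
temporal_anchors = rng.normal(size=(K, T, 1, D))

def sample_latents():
    """Combine one likelihood-based random code with each of the K
    deterministic spatial + temporal anchors, via broadcasting."""
    noise = rng.normal(size=(1, T, J, D))              # stochastic part
    return noise + spatial_anchors + temporal_anchors  # -> (K, T, J, D)

latents = sample_latents()
print(latents.shape)  # (5, 10, 22, 8)
```

Because the anchors are deterministic and distinct, the K latents are guaranteed to differ even if the sampled noise collapses to a single mode, which is the intuition behind the diversity claim in the abstract.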



Diverse Predictions

Deterministic Predictions



Talk



Paper


Diverse Human Motion Prediction Guided by Multi-Level Spatial-Temporal Anchors
Sirui Xu, Yu-Xiong Wang*, Liang-Yan Gui*
ECCV 2022 (Oral Presentation)

BibTeX
@inproceedings{xu22stars,
  title     = {Diverse Human Motion Prediction Guided by Multi-Level Spatial-Temporal Anchors},
  author    = {Xu, Sirui and Wang, Yu-Xiong and Gui, Liang-Yan},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2022}
}