InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation

Sirui Xu, Dongting Li, Yucheng Zhang, Xiyan Xu, Qi Long, Ziyin Wang, Yunzhi Lu, Shuchang Dong, Hezi Jiang, Akshat Gupta, Yu-Xiong Wang, Liang-Yan Gui
University of Illinois Urbana-Champaign
CVPR 2025


Abstract

While large-scale human motion capture datasets have advanced human motion generation, modeling and generating dynamic 3D human-object interactions (HOIs) remains challenging due to dataset limitations. Existing datasets often lack extensive, high-quality motion data and annotations, and exhibit artifacts such as contact penetration, floating, and incorrect hand motions. To address these issues, we introduce InterAct, a large-scale 3D HOI benchmark featuring dataset and methodological advancements. First, we consolidate and standardize 21.81 hours of HOI data from diverse sources, enriching it with detailed textual annotations. Second, we propose a unified optimization framework to enhance data quality by reducing artifacts and correcting hand motions. Leveraging the principle of contact invariance, we maintain human-object relationships while introducing motion variations, expanding the dataset to 30.70 hours. Third, we define six benchmarking tasks and develop a unified HOI generative modeling perspective, achieving state-of-the-art performance. Extensive experiments validate the utility of our dataset as a foundational resource for advancing 3D human-object interaction generation. To support continued research in this area, the dataset is publicly available at https://github.com/wzyabcas/InterAct and will be actively maintained.
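One way to read the contact-invariance idea mentioned above is that contact points stay fixed in the object's local frame while the object's pose is varied. The following is a minimal sketch under that assumption, not the authors' implementation: the function names (`rotation_z`, `to_object_frame`, `to_world_frame`, `retarget_contacts`) and the rigid-pose parameterization are hypothetical and chosen only for illustration.

```python
import numpy as np


def rotation_z(theta):
    """Rotation matrix for a rotation of `theta` radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def to_object_frame(points_world, R_obj, t_obj):
    """Express world-space points (N, 3) in the object's local frame."""
    # Column form: p_local = R^T (p_world - t). For row vectors this is (p - t) @ R.
    return (points_world - t_obj) @ R_obj


def to_world_frame(points_local, R_obj, t_obj):
    """Map object-local points (N, 3) back to world space."""
    # Column form: p_world = R p_local + t. For row vectors this is p @ R^T + t.
    return points_local @ R_obj.T + t_obj


def retarget_contacts(contacts_world, R_old, t_old, R_new, t_new):
    """Move contact points so they keep the same object-relative position
    after the object pose changes from (R_old, t_old) to (R_new, t_new)."""
    local = to_object_frame(contacts_world, R_old, t_old)
    return to_world_frame(local, R_new, t_new)


# Hypothetical example: two hand contact points on an object (illustrative values).
contacts = np.array([[0.1, 0.0, 1.0], [-0.1, 0.0, 1.0]])
R_old, t_old = np.eye(3), np.array([0.0, 0.0, 1.0])
# Assumed motion variation: rotate the object 90 degrees and shift it.
R_new, t_new = rotation_z(np.pi / 2), np.array([0.5, 0.0, 1.2])

new_contacts = retarget_contacts(contacts, R_old, t_old, R_new, t_new)
```

In this sketch the retargeted contact points would then serve as targets for adjusting the human pose, for example through an optimization step. How InterAct actually introduces motion variations is described in the paper rather than on this page.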




Interaction Augmentation


[Videos: two examples, each shown Before Augmentation and After Augmentation]



Interaction Correction


Hand Correction


[Videos: two examples, each shown Before Correction and After Correction]


Full-Body Correction


[Videos: two examples, each shown Before Correction and After Correction]




Text-Conditioned Interaction Generation


[Videos: two side-by-side comparisons of HOI-Diff and InterAct (Ours), one per text prompt below]

Text: A person crosses and uncrosses their left leg over their right while seated.

Text: Kick the base of the floorlamp, and set it back down.

Ablation Study on Different Human Representations

SMPL

Joints

Markers (Ours)

Text: An individual positions themselves atop the square table, looking upwards while their feet dangle above the ground.

Visualization of Markers

Text: Kick the base of the floorlamp, and set it back down.

Text: Lift the plasticbox, move the plasticbox, and put down the plasticbox.

Text: Lift the tripod, move the tripod, and put down the tripod.

Text: Lift the largetable above your head, spin it and put the largetable down.


Action-Conditioned Interaction Generation


[Videos: two side-by-side comparisons of HOI-Diff and InterAct (Ours), one per action label below]

Action: Carry

Action: Lift




Object-Conditioned Human Generation


Objects are unseen during training.

1-stage

OMOMO

InterAct (Ours)



Human-Conditioned Object Generation


[Videos: two side-by-side comparisons of 1-stage and InterAct (Ours)]




Interaction Prediction


Generalization across diverse motions and objects


GRAB

BEHAVE

InterCap

HODome



CHAIRS

OMOMO

IMHD


Interaction Imitation