Dexplore: Scalable Neural Control for Dexterous Manipulation from Reference-Scoped Exploration

1University of Illinois Urbana-Champaign     2NVIDIA
Equal Advising
CoRL 2025


Left: A robot hand discovers skills that align with its physical form and the intent of the human demonstrations. Right: A control policy that takes a partial depth image as input and is deployed on a real robot system.


Abstract

Hand-object motion-capture (MoCap) repositories offer large-scale, contact-rich demonstrations and hold promise for scaling dexterous robotic manipulation. Yet demonstration inaccuracies and embodiment gaps between human and robot hands limit the straightforward use of these data. Existing methods adopt a three-stage workflow of retargeting, tracking, and residual correction, which often leaves demonstrations underused and compounds errors across stages. We introduce Dexplore, a unified single-loop optimization that jointly performs retargeting and tracking to learn robot control policies directly from MoCap at scale. Rather than treating demonstrations as ground truth, we use them as soft guidance. From raw trajectories, we derive adaptive spatial scopes, and train with reinforcement learning to keep the policy in-scope while minimizing control effort and accomplishing the task. This unified formulation preserves demonstration intent, enables robot-specific strategies to emerge, improves robustness to noise, and scales to large demonstration corpora. We distill the scaled tracking policy into a vision-based, skill-conditioned generative controller that encodes diverse manipulation skills in a rich latent representation, supporting generalization across objects and real-world deployment. Taken together, these contributions position Dexplore as a principled bridge that transforms imperfect demonstrations into effective training signals for dexterous manipulation.
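The reference-scoped exploration idea can be illustrated with a minimal reward sketch. This is a hypothetical simplification, not the paper's implementation: all names, shapes, and the quadratic penalty are illustrative assumptions. The demonstration defines a spatial scope around the reference trajectory; the reward is neutral inside the scope (so robot-specific strategies can emerge) and penalizes out-of-scope deviation plus control effort.

```python
import numpy as np

def in_scope_reward(keypoints, ref_keypoints, scope_radius, action,
                    effort_weight=0.01):
    """Illustrative reference-scoped reward (hypothetical names and terms).

    keypoints:     current robot keypoint positions, shape (K, 3)
    ref_keypoints: MoCap reference keypoints at this timestep, shape (K, 3)
    scope_radius:  adaptive scope radius per keypoint, shape (K,) --
                   wider where the demonstration is noisy, tighter where
                   it is reliable
    action:        control vector, penalized to minimize effort
    """
    # Distance of each keypoint from its reference
    dist = np.linalg.norm(keypoints - ref_keypoints, axis=-1)
    # Soft guidance: zero penalty inside the scope, quadratic outside it
    violation = np.maximum(dist - scope_radius, 0.0)
    scope_term = -np.sum(violation ** 2)
    # Control-effort penalty
    effort_term = -effort_weight * np.sum(np.asarray(action) ** 2)
    return scope_term + effort_term
```

Because the penalty vanishes anywhere inside the scope, the policy is free to deviate from noisy demonstration details while still respecting their overall intent.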




Demo






Our Retargeting-Free Pipeline vs. Retargeting

Instead of relying on explicit retargeting, Dexplore directly learns from raw MoCap demonstrations.


Original MoCap

AnyTeleop (Retargeting)

Ours (Retargeting-Free)




Comparison with Baselines

We compare Dexplore against DexTrack [1] and AnyTeleop [2] across diverse manipulation tasks.

[1] Liu et al. DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References. ICLR 2025.
[2] Qin et al. AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System. RSS 2023 (retargeting via dex-retargeting).


DexTrack [1]

AnyTeleop [2]

Ours



Adaptable for Different Embodiments

Dexplore is adaptable across different robot hand embodiments, learning manipulation skills that align with each hand's physical form.




Generalization to Unseen Objects

Dexplore generalizes to unseen objects with different sizes and weights (size × 1.5; weight × 1.5³ ≈ × 3.4).




Vision-Based Policy in Simulation

The distilled vision-based controller takes partial depth images as input. Each row shows the RGB rendering, the depth observation, and the depth-converted point cloud.


RGB

Depth

Depth-Converted Point Cloud




Real-World Experiments

We deploy the vision-based policy on a real robot system with a dexterous hand, demonstrating sim-to-real transfer for diverse manipulation tasks.


BibTeX

@inproceedings{xu2025scalable,
  title = {Dexplore: Scalable Neural Control for Dexterous Manipulation from Reference-Scoped Exploration},
  author = {Xu, Sirui and Chao, Yu-Wei and Bian, Liuyu and Mousavian, Arsalan and Wang, Yu-Xiong and Gui, Liang-Yan and Yang, Wei},
  booktitle = {CoRL},
  year = {2025},
}