CVPR 2026

UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos

¹Tsinghua University   ²Shanghai Qizhi Institute   ³Sun Yat-sen University   ⁴The University of North Carolina at Chapel Hill
* Core Contributors    † Project Lead
UniDex teaser figure

Abstract

Dexterous manipulation remains challenging due to the cost of collecting real-robot teleoperation data, the heterogeneity of hand embodiments, and the high dimensionality of control. We present UniDex, a robot foundation suite that couples a large-scale robot-centric dataset with a unified vision-language-action policy and a practical human-data capture setup for universal dexterous hand control. First, we construct UniDex-Dataset, a robot-centric dataset of over 50K trajectories across eight dexterous hands, derived from egocentric human videos. To transform human data into robot-executable trajectories, we employ a human-in-the-loop retargeting procedure that aligns fingertip trajectories while preserving plausible hand-object contacts. Second, we introduce FAAS, a unified action space that maps functionally similar actuators to shared coordinates, enabling cross-hand transfer, and train UniDex-VLA, a 3D VLA policy pretrained on UniDex-Dataset and finetuned with task demonstrations. In addition, we build UniDex-Cap, a portable human-data capture setup for human-robot data co-training that reduces reliance on costly robot demonstrations.

Overview

UniDex-Dataset

UniDex-Dataset focuses on converting large-scale human data into robot data. The key step is human-in-the-loop retargeting: human fingertip trajectories are aligned to robot hands with interactive adjustment so that the resulting robot executions preserve physically plausible contacts. This turns egocentric human manipulation videos into robot-executable dexterous trajectories suitable for large-scale pretraining.
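The core of retargeting is solving an inverse-kinematics problem: find robot joint angles whose fingertips match the human fingertip trajectory. Below is a minimal sketch of that idea on a toy planar two-joint finger with Gauss-Newton iteration; the link lengths, `fingertip`, and `retarget` are all illustrative assumptions, not the paper's actual pipeline (which also handles contact preservation and interactive human adjustment).

```python
import numpy as np

# Toy planar 2-joint finger; link lengths are illustrative, not from the paper.
LINKS = (0.04, 0.03)  # meters

def fingertip(q):
    """Forward kinematics: joint angles -> fingertip (x, y)."""
    a1, a2 = q
    return np.array([
        LINKS[0] * np.cos(a1) + LINKS[1] * np.cos(a1 + a2),
        LINKS[0] * np.sin(a1) + LINKS[1] * np.sin(a1 + a2),
    ])

def retarget(target, q0, iters=50, eps=1e-5):
    """Gauss-Newton IK: joint angles whose fingertip matches `target`."""
    q = np.asarray(q0, dtype=float).copy()
    for _ in range(iters):
        err = fingertip(q) - target
        # Forward-difference Jacobian of the fingertip w.r.t. joint angles.
        J = np.stack([(fingertip(q + eps * e) - fingertip(q)) / eps
                      for e in np.eye(2)], axis=1)
        q -= np.linalg.pinv(J) @ err
    return q

# A "human" fingertip sample (generated so it is reachable by the toy finger).
target = fingertip(np.array([0.6, 0.8]))
q = retarget(target, q0=[0.5, 0.5])
print(np.linalg.norm(fingertip(q) - target) < 1e-4)
```

In practice each human frame would be retargeted this way per fingertip, with a human in the loop adjusting targets when the naive solution breaks hand-object contact.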

Dataset Video

UniDex dataset video montage as animated gif

8 hands, over 50k trajectories, and 9M paired image-pointcloud-action frames.

Human-in-the-Loop Retargeting

Human-in-the-loop retargeting pipeline
From human data to robot data through retargeting and visual alignment.

Human Data

Robot Data

FAAS + UniDex-VLA

FAAS

FAAS mapping

FAAS maps actuators with the same functional role to shared coordinates, enabling transfer across dexterous hands with different kinematics and DoFs.
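One way to realize such a mapping is a fixed vector of functional slots plus a per-hand table from actuator names to slots; hands with fewer DoFs simply leave some slots masked. The slot names, hand layouts, and helper functions below are hypothetical illustrations of the idea, not the actual FAAS definition.

```python
import numpy as np

# Hypothetical canonical slots (functional roles); names are illustrative.
SLOTS = ["thumb_flex", "index_flex", "middle_flex",
         "ring_flex", "pinky_flex", "thumb_abd"]

# Per-hand mapping from actuator name to functional slot (made-up DoF layouts).
HAND_MAPS = {
    "hand_a": {"th_j1": "thumb_flex", "ix_j1": "index_flex", "md_j1": "middle_flex"},
    "hand_b": {"t0": "thumb_flex", "t1": "thumb_abd", "f0": "index_flex",
               "f1": "middle_flex", "f2": "ring_flex", "f3": "pinky_flex"},
}

def to_shared(hand, action):
    """Embed a hand-specific action dict into the shared vector + validity mask."""
    vec = np.zeros(len(SLOTS))
    mask = np.zeros(len(SLOTS), dtype=bool)
    for joint, value in action.items():
        i = SLOTS.index(HAND_MAPS[hand][joint])
        vec[i], mask[i] = value, True
    return vec, mask

def from_shared(hand, vec):
    """Decode the shared vector back to the hand's own actuators."""
    return {joint: vec[SLOTS.index(slot)]
            for joint, slot in HAND_MAPS[hand].items()}
```

A policy trained to output the shared vector can then drive either hand, which is what makes zero-shot cross-hand deployment plausible.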

UniDex-VLA

UniDex-VLA architecture

UniDex-VLA is a 3D vision-language-action policy that takes pointcloud observations, language instructions, and proprioception, and predicts dexterous action chunks in FAAS.
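The policy's input/output contract can be sketched as follows. This is only a placeholder with the stated interface (pointcloud, instruction, proprioception in, action chunk out); the stub ignores the observation content, unlike the real model, and all names and shapes here are assumptions.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Observation:
    pointcloud: np.ndarray  # (N, 3) scene points
    instruction: str        # language command
    proprio: np.ndarray     # (D,) joint state in the shared action space

def predict_chunk(obs: Observation, horizon: int = 16) -> np.ndarray:
    """Placeholder with the policy's I/O contract: returns a (horizon, D)
    action chunk. A real VLA conditions on all three modalities; this stub
    just holds the current pose for `horizon` steps."""
    return np.tile(obs.proprio, (horizon, 1))
```

Predicting a chunk of future actions rather than a single step is a common choice for high-DoF hands, since it yields smoother executed motion.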

Tool-Use Tasks

Cut Bags | Wuji Hand

Insert fingers into scissors and cut a chips bag in a human-like grasp. -- 2x speed

Cut Bags | Wuji Hand

The same scissor manipulation policy transfers to another bag variant. -- 2x speed

Use Mouse | Wuji Hand

Place fingers on the mouse, drag the file, and click to finish the desktop task. -- 1x speed

Water Flowers | Wuji Hand

Lift the spray bottle and press the trigger with the thumb to water flowers. -- 1x speed

Make Coffee | Inspire Hand

Grasp the kettle, move to the dripper, and pour water for coffee making. -- 1x speed

Sweep Objects | Inspire Hand

Grasp a sweeper and sweep tabletop objects into the dustpan. -- 2x speed

Generalization

Cross-Hand Transfer | Wuji Hand

A policy trained on Inspire Hand is deployed zero-shot on the higher-DoF Wuji hand. -- 2x speed

Cross-Hand Transfer | Oymotion Hand

The same policy also transfers zero-shot to the Oymotion hand, which has different kinematics. -- 2x speed

Object Generalization | Inspire Hand

The coffee-making policy transfers to an unseen kettle with different color, size, and geometry. -- 2x speed

UniDex-Cap

UniDex-Cap is a portable human-data capture setup that records synchronized RGB-D streams and hand poses, then converts them into robot-executable trajectories through the same transformation pipeline. The resulting human data supports human-robot data co-training, reducing the amount of costly robot demonstrations required.
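Co-training typically means mixing the two data sources in each training batch. A minimal sketch, assuming a fixed mixing ratio (the 1:3 robot-to-human ratio and function name here are illustrative, not from the paper):

```python
import random

def cotrain_batch(human_data, robot_data, batch_size=8, robot_frac=0.25, seed=0):
    """Sample one mixed batch: a fixed fraction of costly robot demos,
    the rest from converted human-capture data (ratio is illustrative)."""
    rng = random.Random(seed)
    n_robot = max(1, round(batch_size * robot_frac))
    batch = (rng.sample(robot_data, n_robot)
             + rng.sample(human_data, batch_size - n_robot))
    rng.shuffle(batch)
    return batch
```

Keeping even a small fraction of real robot demonstrations in every batch is a common way to anchor the policy to the deployment embodiment while the abundant human data supplies task diversity.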

UniDex-Cap setup

BibTeX

@inproceedings{zhang2026unidex,
  title={UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos},
  author={Zhang, Gu and Xu, Qicheng and Zhang, Haozhe and Ma, Jianhan and He, Long and Bao, Yiming and Ping, Zeyu and Yuan, Zhecheng and Lu, Chenhao and Yuan, Chengbo and Liang, Tianhai and Tian, Xiaoyu and Shao, Maanping and Zhang, Feihong and Ding, Mingyu and Gao, Yang and Zhao, Hao and Zhao, Hang and Xu, Huazhe},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}