CVPR 2026

UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos

1Tsinghua University   2Shanghai Qizhi Institute   3Sun Yat-sen University   4The University of North Carolina at Chapel Hill
* Core Contributors    † Project Lead
UniDex teaser figure

Abstract

Dexterous manipulation remains challenging due to the cost of collecting real-robot teleoperation data, the heterogeneity of hand embodiments, and the high dimensionality of control. We present UniDex, a robot foundation suite that couples a large-scale robot-centric dataset with a unified vision-language-action policy and a practical human-data capture setup for universal dexterous hand control. First, we construct UniDex-Dataset, a robot-centric dataset of over 50K trajectories across eight dexterous hands, derived from egocentric human videos. To transform human data into robot-executable trajectories, we employ a human-in-the-loop retargeting procedure that aligns fingertip trajectories while preserving plausible hand-object contacts. Second, we introduce FAAS, a unified action space that maps functionally similar actuators to shared coordinates, enabling cross-hand transfer, and train UniDex-VLA, a 3D VLA policy pretrained on UniDex-Dataset and finetuned with task demonstrations. Finally, we build UniDex-Cap, a portable setup for human-robot data co-training that reduces reliance on costly robot demonstrations.

Overview

UniDex-Dataset

UniDex-Dataset focuses on converting large-scale human data into robot data. The key step is human-in-the-loop retargeting: human fingertip trajectories are aligned to robot hands with interactive adjustment so that the resulting robot executions preserve physically plausible contacts. This turns egocentric human manipulation videos into robot-executable dexterous trajectories suitable for large-scale pretraining.
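
To make the retargeting step concrete, here is a minimal sketch that frames it as a per-finger optimization: solve for robot joint angles whose fingertip position matches a scaled human fingertip target under joint limits. The two-link forward kinematics, limits, and scale factor are illustrative stand-ins rather than the pipeline's actual hand models; the human-in-the-loop part corresponds to inspecting rollouts and adjusting the scale and targets so contacts stay plausible.

import numpy as np
from scipy.optimize import minimize

# Toy 2-DoF planar finger; a real hand would use its URDF kinematics.
LINKS = np.array([0.05, 0.03])        # segment lengths (m), illustrative
Q_MIN, Q_MAX = 0.0, np.pi / 2         # joint limits (rad), illustrative

def fingertip_fk(q):
    """Fingertip (x, y) of the toy two-link finger at joint angles q."""
    x = LINKS[0] * np.cos(q[0]) + LINKS[1] * np.cos(q[0] + q[1])
    y = LINKS[0] * np.sin(q[0]) + LINKS[1] * np.sin(q[0] + q[1])
    return np.array([x, y])

def retarget(human_tip, scale=1.0):
    """Joint angles whose fingertip best matches the scaled human target."""
    target = scale * np.asarray(human_tip)
    cost = lambda q: np.sum((fingertip_fk(q) - target) ** 2)
    res = minimize(cost, x0=np.array([0.3, 0.3]),
                   bounds=[(Q_MIN, Q_MAX)] * 2)
    return res.x

# One human fingertip sample; in practice an operator tunes `scale` and
# the targets interactively until the executed contacts look right.
q = retarget([0.05, 0.04])
print("joints:", q, "-> fingertip:", fingertip_fk(q))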

Dataset Video

UniDex dataset video montage

8 hands, over 50K trajectories, and 9M paired image-pointcloud-action frames.

Human-in-the-Loop Retargeting

Human-in-the-loop retargeting pipeline
From human data to robot data through retargeting and visual alignment.

Human Data

Robot Data

FAAS + UniDex-VLA

FAAS

FAAS mapping

FAAS maps actuators with the same functional role to shared coordinates, enabling transfer across dexterous hands with different kinematics and DoFs.
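
As an illustration of the idea (the role names, actuator orderings, and joint limits below are invented for this example, not taken from the actual FAAS definition), actuators that serve the same functional role on different hands can be encoded into one shared, limit-normalized coordinate and decoded back per hand:

import numpy as np

ROLES = ["thumb_flex", "index_flex", "middle_flex", "wrist_roll"]

# Per-hand spec: role -> (actuator index, lower limit, upper limit).
HANDS = {
    "hand_a": {"thumb_flex": (0, 0.0, 1.3), "index_flex": (1, 0.0, 1.6),
               "middle_flex": (2, 0.0, 1.6), "wrist_roll": (3, -0.5, 0.5)},
    "hand_b": {"thumb_flex": (2, 0.0, 1.0), "index_flex": (0, 0.0, 1.4),
               "middle_flex": (1, 0.0, 1.4), "wrist_roll": (3, -1.0, 1.0)},
}

def to_shared(hand, q):
    """Encode hand-specific joint angles as shared [0, 1] role coordinates."""
    a = np.zeros(len(ROLES))
    for r, role in enumerate(ROLES):
        idx, lo, hi = HANDS[hand][role]
        a[r] = (q[idx] - lo) / (hi - lo)
    return a

def from_shared(hand, a):
    """Decode shared role coordinates into hand-specific joint targets."""
    q = np.zeros(len(HANDS[hand]))
    for r, role in enumerate(ROLES):
        idx, lo, hi = HANDS[hand][role]
        q[idx] = lo + a[r] * (hi - lo)
    return q

# Cross-hand transfer: an action encoded from hand_a decodes onto hand_b
# even though the two hands order and scale their actuators differently.
q_a = np.array([0.6, 0.8, 0.8, 0.1])
print(from_shared("hand_b", to_shared("hand_a", q_a)))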

UniDex-VLA

UniDex-VLA architecture

UniDex-VLA is a 3D vision-language-action policy that takes pointcloud observations, language instructions, and proprioception, and predicts dexterous action chunks in FAAS.
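
The sketch below is a hypothetical interface, not the released UniDex-VLA architecture: the encoders, dimensions, and chunk horizon are placeholders. It only pins down the input/output contract, i.e. a point cloud, a language embedding, and proprioception go in, and a chunk of H future actions in the shared action space comes out.

import torch
import torch.nn as nn

class PolicySketch(nn.Module):
    """Toy stand-in for a 3D VLA policy; all modules are placeholders."""
    def __init__(self, lang_dim=512, proprio_dim=24,
                 action_dim=12, horizon=16, hidden=256):
        super().__init__()
        self.point_enc = nn.Sequential(        # per-point MLP, then max-pool
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.lang_enc = nn.Linear(lang_dim, hidden)
        self.proprio_enc = nn.Linear(proprio_dim, hidden)
        self.head = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * action_dim))
        self.horizon, self.action_dim = horizon, action_dim

    def forward(self, points, lang, proprio):
        p = self.point_enc(points).max(dim=1).values   # (B, hidden)
        feat = torch.cat([p, self.lang_enc(lang),
                          self.proprio_enc(proprio)], dim=-1)
        return self.head(feat).view(-1, self.horizon, self.action_dim)

policy = PolicySketch()
chunk = policy(torch.randn(1, 1024, 3),   # point cloud observation
               torch.randn(1, 512),       # language instruction embedding
               torch.randn(1, 24))        # proprioception
print(chunk.shape)  # (1, 16, 12): a chunk of 16 actions in the shared space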

Tool-Use Tasks

Cut Bags | Wuji Hand

Insert fingers into the scissors and cut open a bag of chips with a human-like grasp. -- 2x speed

Cut Bags | Wuji Hand

The same scissor manipulation policy transfers to another bag variant. -- 2x speed

Use Mouse | Wuji Hand

Place fingers on the mouse, drag the file, and click to finish the desktop task. -- 1x speed

Water Flowers | Wuji Hand

Lift the spray bottle and press the trigger with the thumb to water flowers. -- 1x speed

Make Coffee | Inspire Hand

Grasp the kettle, move to the dripper, and pour water for coffee making. -- 1x speed

Sweep Objects | Inspire Hand

Grasp a sweeper and sweep tabletop objects into the dustpan. -- 2x speed

Generalization

Cross-Hand Transfer | Wuji Hand

A policy trained on the Inspire Hand is deployed zero-shot on the higher-DoF Wuji Hand. -- 2x speed

Cross-Hand Transfer | Oymotion Hand

The same policy also transfers zero-shot to the Oymotion Hand, which has different kinematics. -- 2x speed

Object Generalization | Inspire Hand

The coffee-making policy transfers to an unseen kettle with a different color, size, and geometry. -- 2x speed

UniDex-Cap

UniDex-Cap is a portable human-data capture setup that records synchronized RGB-D streams and hand poses, then converts them into robot-executable trajectories through the same human-to-robot transformation pipeline used for UniDex-Dataset. The collected human data supports human-robot data co-training, reducing reliance on costly robot demonstrations.

UniDex-Cap setup
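
One practical detail in a rig like this is synchronizing the RGB-D stream with the hand-pose stream before conversion. A minimal sketch, assuming timestamped streams at different rates (the rates and tolerance are illustrative, not UniDex-Cap's actual settings):

import numpy as np

def sync_streams(rgbd_ts, pose_ts, tol=0.02):
    """Pair each RGB-D frame with the nearest-in-time pose sample,
    dropping pairs whose timestamp gap exceeds `tol` seconds."""
    pairs = []
    for i, t in enumerate(rgbd_ts):
        j = int(np.argmin(np.abs(pose_ts - t)))
        if abs(pose_ts[j] - t) <= tol:
            pairs.append((i, j))
    return pairs

rgbd_ts = np.arange(0.0, 1.0, 1 / 30)           # 30 Hz camera
pose_ts = np.arange(0.0, 1.0, 1 / 60) + 0.003   # 60 Hz tracker, small offset
print(len(sync_streams(rgbd_ts, pose_ts)), "synchronized pairs")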

BibTeX

@article{zhang2026unidex,
  title={UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos},
  author={Zhang, Gu and Xu, Qicheng and Zhang, Haozhe and Ma, Jianhan and He, Long and Bao, Yiming and Ping, Zeyu and Yuan, Zhecheng and Lu, Chenhao and Yuan, Chengbo and others},
  journal={arXiv preprint arXiv:2603.22264},
  year={2026}
}