Learning Precise, Contact-Rich Manipulation through Uncalibrated Tactile Skins



Abstract

Visuo-motor policy learning has advanced robotic manipulation, but mastering precise, contact-rich tasks remains challenging due to vision's limitations in reasoning about contacts.

To address this, several efforts have integrated tactile sensors into policy learning. However, many of them rely on optical tactile sensors and are either confined to recognition tasks or require complex dimensionality reduction steps for policy learning. This work explores learning policies with magnetic skin sensors, which are natively low-dimensional, highly sensitive, and cheap to integrate on robotic platforms.

To do this effectively, we present ViSk, a simple framework that uses a transformer-based policy and treats skin sensor data as additional tokens alongside vision-based information. Evaluated across four complex real-world tasks (credit card swiping, plug insertion, USB insertion, and bookshelf retrieval), ViSk significantly outperforms vision-only and prior tactile models. Further analysis reveals that combining tactile and visual modalities enhances policy performance and spatial generalization, achieving an average improvement of 27.5% across tasks.
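The idea of treating skin readings as extra tokens can be illustrated with a minimal sketch. It is not the authors' implementation: the feature sizes, the embedding width, the random linear maps standing in for learned encoders, and all function names are assumptions made for illustration, and the transformer itself is omitted. The sketch only shows each modality being projected to a shared width and joined into one sequence that a transformer-based policy could consume.

```python
import random

EMBED_DIM = 8    # hypothetical shared token width
VISION_DIM = 16  # hypothetical per-camera feature size
SKIN_DIM = 6     # hypothetical per-pad skin reading size


def make_projection(in_dim, out_dim, rng):
    """Random linear map standing in for a learned encoder (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(weights, features):
    """Apply a linear map to one feature vector, producing one token."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def build_token_sequence(image_features, skin_readings, vision_proj, skin_proj):
    """Return a single sequence: vision tokens followed by skin tokens."""
    vision_tokens = [project(vision_proj, f) for f in image_features]
    skin_tokens = [project(skin_proj, r) for r in skin_readings]
    return vision_tokens + skin_tokens


rng = random.Random(0)
vision_proj = make_projection(VISION_DIM, EMBED_DIM, rng)
skin_proj = make_projection(SKIN_DIM, EMBED_DIM, rng)

# Placeholder inputs: two camera feature vectors and two skin pad readings.
image_features = [[rng.random() for _ in range(VISION_DIM)] for _ in range(2)]
skin_readings = [[rng.random() for _ in range(SKIN_DIM)] for _ in range(2)]

tokens = build_token_sequence(image_features, skin_readings,
                              vision_proj, skin_proj)
print(len(tokens), len(tokens[0]))  # prints "4 8"
```

Because every token has the same width after projection, the policy needs no modality-specific fusion step; adding or removing a sensor only changes the sequence length.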

ViSk Figure 1
ViSk Figure 2

Policy Learning for 4 Precise Tasks

The following videos show learnt ViSk policies being rolled out on the robot at 1x speed. We run 10 evaluations at unseen positions for each of 3 seeds, 30 rollouts in total, and report the results in the next section.

Plug Insertion

USB Insertion

Card Swiping

Book Retrieval

Experiment Results

Policy Performance out of 30 rollouts
Comparison across sensors
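
The 30 rollouts reported above correspond to 10 evaluations at unseen positions for each of 3 seeds. A minimal sketch of how such per-seed outcomes could be pooled into a single success count follows. The function name and the outcome values are hypothetical placeholders, not results from this work.

```python
def pooled_success(outcomes_per_seed):
    """Pool boolean rollout outcomes across seeds.

    Returns (successes, total rollouts, success rate).
    """
    total = sum(len(seed) for seed in outcomes_per_seed)
    successes = sum(sum(seed) for seed in outcomes_per_seed)
    return successes, total, successes / total


# Placeholder outcomes for 3 seeds x 10 evaluations (NOT measured data).
placeholder = [
    [True] * 7 + [False] * 3,
    [True] * 8 + [False] * 2,
    [True] * 6 + [False] * 4,
]

successes, total, rate = pooled_success(placeholder)
print(f"{successes}/{total} = {rate:.1%}")  # prints "21/30 = 70.0%"
```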