license: apache-2.0
task_categories:
- robotics
- keypoint-detection
tags:
- robot-manipulation
- vision-language-models
- zero-shot-generalization
- bridge-v2
PEEK VLM-Labeled BRIDGE_v2 dataset
This dataset contains the BRIDGE_v2 dataset in LeRobot format, with paths and masks predicted by the PEEK VLM drawn onto the images. It accompanies the paper PEEK: Guiding and Minimal Image Representations for Zero-Shot Generalization of Robot Manipulation Policies.
PEEK fine-tunes Vision-Language Models (VLMs) to predict a unified point-based intermediate representation for robot manipulation. This representation consists of:
- End-effector paths: specifying what actions to take.
- Task-relevant masks: indicating where to focus.
These annotations are overlaid directly onto robot observations, making the representation policy-agnostic and transferable across architectures. This dataset provides these automatically generated labels for the BRIDGE_v2 dataset, so researchers can train policies on the annotated images directly to improve zero-shot generalization.
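To make the idea of overlaying a path and a mask onto an observation concrete, here is a minimal pure-Python sketch on a toy image. It is an illustration only, not the PEEK implementation: the function name `overlay_path_and_mask`, the choice to black out pixels outside the mask, and the red path color are all assumptions made for this example.

```python
def overlay_path_and_mask(image, path, mask, path_color=(255, 0, 0)):
    """Return a copy of `image` with masked-out pixels blacked and the path drawn.

    image: H x W nested list of (r, g, b) tuples.
    path:  ordered list of (row, col) points.
    mask:  H x W nested list of booleans; True marks task-relevant pixels.
    Hypothetical helper for illustration; not part of the PEEK codebase.
    """
    height, width = len(image), len(image[0])
    # Keep task-relevant pixels and black out everything else (an assumption).
    out = [
        [image[r][c] if mask[r][c] else (0, 0, 0) for c in range(width)]
        for r in range(height)
    ]
    # Draw straight segments between consecutive path points.
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        steps = max(abs(r1 - r0), abs(c1 - c0), 1)
        for i in range(steps + 1):
            r = round(r0 + (r1 - r0) * i / steps)
            c = round(c0 + (c1 - c0) * i / steps)
            if 0 <= r < height and 0 <= c < width:
                out[r][c] = path_color
    return out


# Toy 4x4 gray image; only the left half is marked task-relevant.
image = [[(128, 128, 128)] * 4 for _ in range(4)]
mask = [[c < 2 for c in range(4)] for _ in range(4)]
path = [(0, 0), (3, 3)]  # diagonal path from top-left to bottom-right
annotated = overlay_path_and_mask(image, path, mask)
```

Because the result is an ordinary image, any image-conditioned policy could consume it without architectural changes, which is the sense in which the representation is policy-agnostic. In practice one would likely use NumPy or OpenCV for the drawing instead of nested lists.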
Paper
PEEK: Guiding and Minimal Image Representations for Zero-Shot Generalization of Robot Manipulation Policies
Project Page
https://peek-robot.github.io
Code/GitHub Repository
The main PEEK framework and associated code can be found in the GitHub repository:
https://github.com/peek-robot/peek
Sample Usage
This dataset provides the BRIDGE_v2 dataset labeled with PEEK VLM path and mask labels.
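As a starting point, the sketch below downloads a dataset repository from the Hugging Face Hub and reads its metadata file. It rests on several assumptions: `repo_id` is left as a parameter because this card does not state the Hub identifier, `download_dataset` and `read_dataset_info` are hypothetical helper names, and the `meta/info.json` location assumes a LeRobot v2-style directory layout. `snapshot_download` comes from the `huggingface_hub` package, which must be installed separately.

```python
import json
from pathlib import Path


def download_dataset(repo_id, local_dir):
    """Download a dataset repository from the Hugging Face Hub.

    Requires `pip install huggingface_hub`. `repo_id` is this dataset's
    Hub identifier, which is not stated in this card.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=local_dir)


def read_dataset_info(root):
    """Read `meta/info.json` from a local dataset copy, if present.

    Assumes a LeRobot v2-style layout; returns None when the file is missing.
    """
    info_path = Path(root) / "meta" / "info.json"
    if not info_path.is_file():
        return None
    with info_path.open() as f:
        return json.load(f)
```

After calling `download_dataset` with the correct identifier, passing the returned directory to `read_dataset_info` should show what the copy contains before you wire it into a training pipeline.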
Citation
If you find this dataset useful for your research, please cite the original paper:
@inproceedings{zhang2025peek,
title={PEEK: Guiding and Minimal Image Representations for Zero-Shot Generalization of Robot Manipulation Policies},
author={Jesse Zhang and Marius Memmel and Kevin Kim and Dieter Fox and Jesse Thomason and Fabio Ramos and Erdem Bıyık and Abhishek Gupta and Anqi Li},
booktitle={arXiv:2509.18282},
year={2025},
}