Abstract

Teaser image

Current state-of-the-art methods for panoptic segmentation require an immense amount of annotated training data that is both arduous and expensive to obtain, posing a significant challenge for their widespread adoption. Concurrently, recent breakthroughs in visual representation learning have sparked a paradigm shift, leading to the advent of large foundation models that can be trained with completely unlabeled images. In this work, we propose to leverage such task-agnostic image features to enable few-shot panoptic segmentation by presenting Segmenting Panoptic Information with Nearly 0 labels (SPINO). In detail, our method combines a DINOv2 backbone with lightweight network heads for semantic segmentation and boundary estimation. We show that our approach, albeit trained with only ten annotated images, predicts high-quality pseudo-labels that can be used with any existing panoptic segmentation method. Notably, we demonstrate that SPINO achieves competitive results compared to fully supervised baselines while using less than 0.3% of the ground truth labels, paving the way for learning complex visual recognition tasks by leveraging foundation models. To illustrate its general applicability, we further deploy SPINO on real-world robotic vision systems for both outdoor and indoor environments.
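
The core architectural idea described above is a frozen DINOv2 backbone whose task-agnostic patch features are decoded by two lightweight heads, one for semantic segmentation and one for boundary estimation. The following minimal PyTorch sketch illustrates this setup under our own assumptions; the head architecture, the number of classes, and all variable names are illustrative and do not reflect the released SPINO code.

```python
# Minimal sketch (not the authors' code): lightweight heads on frozen DINOv2
# features for semantic segmentation and boundary estimation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightHead(nn.Module):
    """Small convolutional head that maps patch features to dense logits."""
    def __init__(self, in_dim: int, num_outputs: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_outputs, kernel_size=1),
        )

    def forward(self, feats: torch.Tensor, out_size) -> torch.Tensor:
        logits = self.head(feats)
        # Upsample from patch resolution to the input image resolution.
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)

# Frozen DINOv2 backbone (here ViT-B/14) loaded via torch.hub; only the heads are trained.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()
for p in backbone.parameters():
    p.requires_grad = False

semantic_head = LightweightHead(in_dim=768, num_outputs=19)  # e.g. 19 Cityscapes classes (assumption)
boundary_head = LightweightHead(in_dim=768, num_outputs=1)   # binary boundary map

image = torch.randn(1, 3, 518, 518)  # side length must be a multiple of the 14-px patch size
with torch.no_grad():
    tokens = backbone.forward_features(image)["x_norm_patchtokens"]  # (B, N, C)
B, N, C = tokens.shape
h = w = int(N ** 0.5)
feats = tokens.permute(0, 2, 1).reshape(B, C, h, w)  # patch tokens -> 2D feature map

semantic_logits = semantic_head(feats, image.shape[-2:])
boundary_logits = boundary_head(feats, image.shape[-2:])
```

Because the backbone stays frozen, only the few parameters of the two heads have to be fitted, which is what makes training from roughly ten annotated images feasible.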

Technical Approach

Overview of our approach

Overview of our proposed SPINO approach for few-shot panoptic segmentation. SPINO consists of two learning-based modules for semantic segmentation and boundary estimation that leverage features from the recent foundation model DINOv2. A panoptic fusion scheme combines their outputs using connected component analysis (CCA) and multiple small instance filtering steps, as sketched below. SPINO creates pseudo-labels for a large number of unlabeled images using only k ≈ 10 images with ground truth annotations. These pseudo-labels can then be utilized to train any panoptic segmentation model.
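
The fusion step can be sketched as follows: within every "thing" class, connected component analysis over the non-boundary pixels yields instance candidates, and components below a size threshold are discarded. The snippet below is a simplified illustration of this idea; the function name, thresholds, and filtering criterion are assumptions rather than the exact procedure used by SPINO.

```python
# Minimal sketch (assumption, not the released fusion code): combine a semantic
# prediction and a boundary prediction into instance IDs via connected component
# analysis (CCA), then filter out very small components.
import numpy as np
from scipy import ndimage

def panoptic_fusion(semantic: np.ndarray,
                    boundary: np.ndarray,
                    thing_classes: set,
                    boundary_thresh: float = 0.5,
                    min_size: int = 100):
    """Return (semantic, instance_ids); instance_ids is 0 for stuff or filtered pixels."""
    instance_ids = np.zeros_like(semantic, dtype=np.int32)
    next_id = 1
    for cls in thing_classes:
        # Pixels of this "thing" class that are not on a predicted boundary.
        mask = (semantic == cls) & (boundary < boundary_thresh)
        labeled, num = ndimage.label(mask)  # connected component analysis
        for comp in range(1, num + 1):
            comp_mask = labeled == comp
            if comp_mask.sum() < min_size:  # small instance filtering
                continue
            instance_ids[comp_mask] = next_id
            next_id += 1
    return semantic, instance_ids
```

The resulting semantic and instance maps form the pseudo-labels that are subsequently used to train an off-the-shelf panoptic segmentation network.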

Video

Code

A PyTorch-based software implementation of this project can be found in our GitHub repository. It is released under the GPLv3 license for academic usage. For any commercial purpose, please contact the authors.

Publications

If you find our work useful, please consider citing our paper:

Markus Käppeler, Kürsat Petek, Niclas Vödisch, Wolfram Burgard, and Abhinav Valada
Few-Shot Panoptic Segmentation With Foundation Models
arXiv preprint arXiv:2309.10726, 2023.

(PDF) (BibTeX)

Authors

Markus Käppeler

University of Freiburg

Kürsat Petek

University of Freiburg

Niclas Vödisch

University of Freiburg

Wolfram Burgard

University of Technology Nuremberg

Abhinav Valada

University of Freiburg

Acknowledgment

This work was funded by the German Research Foundation (DFG) Emmy Noether Program, grant No 468878300, and the European Union's Horizon 2020 research and innovation program, grant No 871449 (OpenDR).