GuidedBox: A segmentation-guided box teacher-student approach for weakly supervised road segmentation

Abstract

The importance of road segmentation in remote sensing data cannot be overstated, as it underpins various critical applications such as urban planning, traffic moni- toring, and autonomous driving systems. Automatically labeling objects with pixel- wise segmentation is a labor-intensive task, especially when compared to bound- ing boxes. Many existing weakly supervised instance segmentation methods rely on heuristic losses derived from bounding box priors. However, we hypothesize that box-supervised techniques may yield high-quality segmentation masks, prompting us to investigate whether detectors can effectively learn from these masks while dis- regarding those of low quality. To address this inquiry, we introduce GuidedBox, an end-to-end training framework tailored for robust, weakly supervised instance segmentation. GuidedBox employs a sophisticated teacher model to generate pre- cise masks as pseudo-labels. Recognizing the potentially detrimental effects of noisy masks on training, we propose a mask-aware confidence scoring mechanism to as- sess the quality of pseudo-masks. Additionally, we introduce noise-aware pixel loss and noise-reduced affinity loss functions to optimize the student model with pseudo masks dynamically. Extensive experimentation demonstrates the efficacy of Guided- Box across multiple datasets. Notably, GuidedBox outperforms existing state-of-the- art methods such as SOLOv2, CondInst, and Mask R-CNN on the challenging Mas- sachusetts Roads Dataset, achieving an AP50 score of 0.9231. Furthermore, Guid- edBox shows competitive performance on the SpaceNet and DeepGlobe datasets, highlighting its robustness and adaptability across different remote sensing scenarios. Code has been made available at https://github.com/kaopanboonyuen/GuidedBox.

Publication
In European Journal of Remote Sensing (Taylor & Francis)

Teerapong Panboonyuen
Teerapong Panboonyuen

My research focuses on leveraging advanced machine intelligence techniques, specifically computer vision, to enhance semantic understanding, learning representations, visual recognition, and geospatial data interpretation.

Related