Transformer-Based Decoder Designs

Semantic Segmentation on Remotely Sensed Imagery

Teerapong Panboonyuen

Overview

This work introduces novel decoder designs for the Transformer-based Swin architecture, tailored to semantic segmentation of high-resolution remote sensing images. By strengthening global context modeling while preserving fine-grained spatial detail, our methods outperform conventional CNN-based decoders on multiple aerial benchmarks.

Architectures

[Figure: Architecture Diagram 1]
[Figure: Architecture Diagram 2]

Pretrained Checkpoints

Dataset Structure

corpus_name/
├── train/
├── train_labels/
├── val/
├── val_labels/
├── test/
├── test_labels/
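
Before training, it can help to confirm that a dataset folder follows the layout above. The following is a minimal sketch, not part of the repository: the helper names `pair_images_with_labels` and `check_dataset` are hypothetical, and the sketch assumes that each label file has the same file name as its image, which this README does not state.

```python
from pathlib import Path

# Split names taken from the directory layout shown above.
SPLITS = ("train", "val", "test")


def pair_images_with_labels(root, split):
    """Return sorted (image_path, label_path) pairs for one split.

    Assumption (not confirmed by the README): a label file carries the
    same file name as the image it annotates.
    """
    image_dir = Path(root) / split
    label_dir = Path(root) / f"{split}_labels"
    for directory in (image_dir, label_dir):
        if not directory.is_dir():
            raise FileNotFoundError(f"missing directory: {directory}")

    pairs, missing = [], []
    for image_path in sorted(p for p in image_dir.iterdir() if p.is_file()):
        label_path = label_dir / image_path.name
        if label_path.is_file():
            pairs.append((image_path, label_path))
        else:
            missing.append(image_path.name)

    if missing:
        raise ValueError(f"{split}: no label file for {missing}")
    return pairs


def check_dataset(root):
    """Return the number of image/label pairs found in each split."""
    return {split: len(pair_images_with_labels(root, split)) for split in SPLITS}
```

Calling `check_dataset("corpus_name")` returns a per-split count of image/label pairs, and raises an error as soon as a directory or a label file is missing.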
      

Include our_class_dict.csv, which maps each class name to its RGB label color.

name,r,g,b
Agriculture,255,255,155
Forest,56,168,0
Urban,255,0,0
Water,0,122,255
Miscellaneous,183,140,31
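
To illustrate how this table can be used, the sketch below parses the CSV and converts an RGB label mask into class indices, where a class's index is its row position in the file. It is an illustration only and may differ from what train.py does internally. The function names `load_class_dict` and `rgb_to_index` are hypothetical, and the mask is a small nested list rather than a real image tile.

```python
import csv
import io


def load_class_dict(csv_file):
    """Read rows of name,r,g,b and return (class_names, colors)."""
    names, colors = [], []
    for row in csv.DictReader(csv_file):
        names.append(row["name"])
        colors.append((int(row["r"]), int(row["g"]), int(row["b"])))
    return names, colors


def rgb_to_index(mask_rgb, colors):
    """Map every (r, g, b) pixel of a nested-list mask to its class index.

    A pixel whose color is not listed in the class dictionary raises
    KeyError, which surfaces labeling errors early.
    """
    lookup = {color: index for index, color in enumerate(colors)}
    return [[lookup[tuple(pixel)] for pixel in row] for row in mask_rgb]


# The class dictionary from this README, inlined so the example runs as-is.
CLASS_DICT_CSV = """name,r,g,b
Agriculture,255,255,155
Forest,56,168,0
Urban,255,0,0
Water,0,122,255
Miscellaneous,183,140,31
"""

names, colors = load_class_dict(io.StringIO(CLASS_DICT_CSV))

# A hypothetical 2x2 label mask: Forest, Water / Urban, Agriculture.
mask = [
    [(56, 168, 0), (0, 122, 255)],
    [(255, 0, 0), (255, 255, 155)],
]
print(rgb_to_index(mask, colors))  # [[1, 3], [2, 0]]
```

To read the file from disk, pass `open("our_class_dict.csv", newline="")` instead of the `StringIO` object. Note that `cv2.imread` returns channels in BGR order, so a mask loaded with OpenCV needs its channels reversed before the lookup. For full-size tiles a vectorized NumPy comparison would be much faster than this pure-Python loop.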
      

Sample Results

[Figure: Result Isan]
[Figure: Result North]
[Figure: Result ISPRS Vaihingen]

Installation & Usage

# Install dependencies
pip install tensorflow opencv-python

# Train model
python train.py --dataset corpus_name --model swin_decoder

# Test model
python test.py --dataset corpus_name
      
TensorFlow GPU Setup

Related Publications

Citation

@article{panboonyuen2025transformer,
  title={Transformer-Based Decoder Designs for Semantic Segmentation on Remotely Sensed Images},
  author={Panboonyuen, Teerapong},
  journal={Remote Sensing Letters},
  year={2025},
  note={Under review}
}