Global Convolutional Network

Transformer-Based Decoder Designs for Semantic Segmentation on Remotely Sensed Images

Transformers have demonstrated remarkable accomplishments in several natural language processing (NLP) tasks as well as image processing tasks. Herein, we present a deep-learning (DL) model that is capable of improving the semantic segmentation network in two ways. First, utilizing the pre-training Swin Transformer (SwinTF) under Vision Transformer (ViT) as a backbone, the model weights downstream tasks by joining task layers upon the pretrained encoder. Secondly, decoder designs are applied to our DL network with three decoder designs, U-Net, pyramid scene parsing (PSP) network, and feature pyramid network (FPN), to perform pixel-level segmentation. The results are compared with other image labeling state of the art (SOTA) methods, such as global convolutional network (GCN) and ViT. Extensive experiments show that our Swin Transformer (SwinTF) with decoder designs reached a new state of the art on the Thailand Isan Landsat-8 corpus (89.8% 𝐹1 score), Thailand North Landsat-8 corpus (63.12% 𝐹1 score), and competitive results on ISPRS Vaihingen. Moreover, both our best-proposed methods (SwinTF-PSP and SwinTF-FPN) even outperformed SwinTF with supervised pre-training ViT on the ImageNet-1K in the Thailand, Landsat-8, and ISPRS Vaihingen corpora.

Teerapong Panboonyuen, P. Vateekul, S. Lawawirojwong

2020 In Remote Sesning (Supported by Ratchadapisek Somphot Postdoctoral Fund, 2021–2022)

Transformer-Based Decoder Designs for Semantic Segmentation on Remotely Sensed Images

Semantic Segmentation on Remotely Sensed Images Using Deep Convolutional Encoder-Decoder Neural Network

My PhD thesis focuses on improving semantic segmentation of aerial and satellite images, a crucial task for applications like agriculture planning, map updates, route optimization, and navigation. Current models like the Deep Convolutional Encoder-Decoder (DCED) have limitations in accuracy due to their inability to recover low-level features and the scarcity of training data. To address these issues, I propose a new architecture with five key enhancements, a Global Convolutional Network (GCN) for improved feature extraction, channel attention for selecting discriminative features, domain-specific transfer learning to address data scarcity, Feature Fusion (FF) for capturing low-level details, and Depthwise Atrous Convolution (DA) for refining features. Experiments on Landsat-8 datasets and the ISPRS Vaihingen benchmark showed that my proposed architecture significantly outperforms the baseline models in remote sensing imagery.

Teerapong Panboonyuen

2020 In Chulalongkorn University Thesis Evaluation - Very Good Score (Outstanding Achievement)

Semantic Segmentation on Remotely Sensed Images Using Deep Convolutional Encoder-Decoder Neural Network

Semantic Segmentation on Remotely Sensed Images Using an Enhanced Global Convolutional Network with Channel Attention and Domain Specific Transfer Learning

In the remote sensing domain, it is crucial to complete semantic segmentation on the raster images, e.g., river, building, forest, etc., on raster images. A deep convolutional encoder–decoder (DCED) network is the state-of-the-art semantic segmentation method for remotely sensed images. However, the accuracy is still limited, since the network is not designed for remotely sensed images and the training data in this domain is deficient. In this paper, we aim to propose a novel CNN for semantic segmentation particularly for remote sensing corpora with three main contributions. First, we propose applying a recent CNN called a global convolutional network (GCN), since it can capture different resolutions by extracting multi-scale features from different stages of the network. Additionally, we further enhance the network by improving its backbone using larger numbers of layers, which is suitable for medium resolution remotely sensed images. Second, “channel attention” is presented in our network in order to select the most discriminative filters (features). Third, “domain-specific transfer learning” is introduced to alleviate the scarcity issue by utilizing other remotely sensed corpora with different resolutions as pre-trained data. The experiment was then conducted on two given datasets (i) medium resolution data collected from Landsat-8 satellite and (ii) very high resolution data called the ISPRS Vaihingen Challenge Dataset. The results show that our networks outperformed DCED in terms of 𝐹1 for 17.48% and 2.49% on medium and very high resolution corpora, respectively.

Teerapong Panboonyuen, P. Vateekul, S. Lawawirojwong

2019 In Remote Sesning

Semantic Segmentation on Remotely Sensed Images Using an Enhanced Global Convolutional Network with Channel Attention and Domain Specific Transfer Learning