bkkurbanscapes

The Bangkok Urbanscapes Dataset for Semantic Urban Scene Understanding Using Enhanced Encoder-Decoder with Atrous Depthwise Separable A1 Convolutional Neural Networks

(Accepted at IEEE Access 2022)

KITSAPHON THITISIRIWECH, TEERAPONG PANBOONYUEN, PITTIPOL KANTAVAT, YUJI IWAHORI, BOONSERM KIJSIRIKUL

Paper Link Project Page

Abstract: Semantic segmentation is one of the most widely researched computer vision tasks at present. It plays an essential role in real-world applications, including autonomous driving systems. To further the study of self-driving cars in Thailand, this paper contributes both a proposed method and a proposed dataset. For the method, we present DeepLab-V3-A1 with Xception, an extension of the DeepLab-V3+ architecture. DeepLab-V3-A1 with Xception is enhanced by varying the number of 1×1 convolution layers on the decoder side and by refining the image-classification backbone with a modified Xception model. Experiments were conducted on four datasets: the proposed dataset and three public datasets, i.e., the CamVid, Cityscapes, and IDD datasets. The results show that DeepLab-V3-A1 with Xception performs comparably to the baseline methods on all corpora across metrics such as mean IoU, F1 score, precision, and recall. In addition, DeepLab-V3-A1 with Xception achieves a mean IoU of 78.86% on the validation set of the Cityscapes dataset. For the dataset, we contribute the Bangkok Urbanscapes dataset, covering urban scenes in Southeast Asia. It contains 701 pairs of input images and annotated labels, spanning various driving environments in Bangkok and annotated with eleven semantic classes (Road, Building, Tree, Car, Footpath, Motorcycle, Pole, Person, Trash, Crosswalk, and Misc). We hope that our architecture and dataset will help developers of autonomous driving systems improve performance in cities with traffic and driving conditions similar to Bangkok and elsewhere in Thailand. Our implementation code and dataset are available at https://kaopanboonyuen.github.io/bkkurbanscapes.

Bangkok Urbanscapes Dataset

All images in the Bangkok Urbanscapes dataset have a resolution of 521 × 544 pixels.
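The eleven semantic classes listed in the abstract can be kept in a small lookup table when working with the annotations. A minimal sketch follows; the class names come from the paper, but the integer id assignment here is an assumption for illustration, so check the released label files for the actual encoding.

```python
# Hypothetical integer ids for the eleven Bangkok Urbanscapes classes.
# The class names are from the paper; the id order below is an assumption
# for illustration only -- verify against the released label files.
BKK_CLASSES = [
    "Road", "Building", "Tree", "Car", "Footpath",
    "Motorcycle", "Pole", "Person", "Trash", "Crosswalk", "Misc",
]
CLASS_TO_ID = {name: i for i, name in enumerate(BKK_CLASSES)}
ID_TO_CLASS = {i: name for i, name in enumerate(BKK_CLASSES)}

if __name__ == "__main__":
    print(len(BKK_CLASSES))     # 11 semantic classes
    print(CLASS_TO_ID["Road"])  # 0 under this hypothetical encoding
```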

Download

If you use this dataset, please cite the paper listed at the bottom of this page.

Usage & Data

Refer to requirements.txt to install all Python dependencies. We use Python 3.7 with PyTorch 1.7.1.

Download the official version of the Cityscapes dataset from here; images are resized using the code here.
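The resize step can be sketched as below, assuming Pillow is installed. The target size is illustrative (use whatever resolution the repository's resize code specifies); the one detail worth noting is that label masks should be resized with nearest-neighbour interpolation so class ids are never blended into invalid intermediate values.

```python
from PIL import Image  # assumes Pillow is installed

# Illustrative target size only -- use the resolution specified by the
# repository's own resize code.
TARGET_SIZE = (521, 544)  # (width, height)

def resize_pair(image_path: str, label_path: str,
                out_image: str, out_label: str) -> None:
    # Bilinear for the RGB frame; nearest-neighbour for the label mask,
    # so integer class ids are preserved exactly.
    Image.open(image_path).resize(TARGET_SIZE, Image.BILINEAR).save(out_image)
    Image.open(label_path).resize(TARGET_SIZE, Image.NEAREST).save(out_label)
```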

Model Training

For pre-training models on the Cityscapes dataset, use the scripts in the scripts directory as follows, changing the dataset paths as required.

./scripts/train.sh

Downstream Evaluation

Scripts to perform evaluation (linear or kNN) on selected downstream tasks are given below. Paths to datasets and pre-trained models must be set appropriately. Note that for linear evaluation, a linear layer is fine-tuned on the new dataset, and this training can be time-consuming on a single GPU.

./scripts/eval_linear.sh
./scripts/eval_knn.sh
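The kNN evaluation classifies each validation feature by majority vote over its nearest training features. A minimal NumPy sketch of that idea follows, using made-up synthetic feature vectors; the actual eval_knn.sh operates on features extracted by the pre-trained backbone, and `knn_predict` is a hypothetical helper, not a function from this repository.

```python
import numpy as np

def knn_predict(train_feats, train_labels, test_feats, k=5):
    """Classify each test feature by majority vote among its k nearest
    (Euclidean) training features. Illustrative sketch of kNN evaluation."""
    # Pairwise squared Euclidean distances: |a - b|^2 = |a|^2 - 2ab + |b|^2
    d2 = (
        (test_feats ** 2).sum(1, keepdims=True)
        - 2.0 * test_feats @ train_feats.T
        + (train_feats ** 2).sum(1)
    )
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest
    votes = train_labels[nearest]            # their class labels
    # Majority vote per test sample.
    return np.array([np.bincount(row).argmax() for row in votes])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated synthetic clusters standing in for real features.
    a = rng.normal(0.0, 0.1, size=(50, 8))
    b = rng.normal(3.0, 0.1, size=(50, 8))
    X = np.vstack([a, b])
    y = np.array([0] * 50 + [1] * 50)
    print(knn_predict(X, y, np.vstack([a[:5], b[:5]])))  # [0 0 0 0 0 1 1 1 1 1]
```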

Pretrained Models

Our pre-trained models can be found under releases.

Results

Citation

@article{thitisiriwech2022bangkok,
  title={The Bangkok Urbanscapes Dataset for Semantic Urban Scene Understanding Using Enhanced Encoder-Decoder with Atrous Depthwise Separable A1 Convolutional Neural Networks},
  author={Thitisiriwech, Kitsaphon and Panboonyuen, Teerapong and Kantavat, Pittipol and Iwahori, Yuji and Kijsirikul, Boonserm},
  journal={IEEE Access},
  year={2022},
  publisher={IEEE}
}

Acknowledgements

Our code is based on the TensorFlow and SegmentationModels repositories. We thank the authors for releasing their code. If you use our model, please consider citing these works as well.