2

KAO: Kernel-Adaptive Optimization in Diffusion for Satellite Image

Satellite image inpainting is a crucial task in remote sensing, where accurately restoring missing or occluded regions is essential for robust image analysis. In this paper, we propose KAO, a novel framework that utilizes Kernel-Adaptive Optimization within diffusion models for satellite image inpainting. KAO is specifically designed to address the challenges posed by very high-resolution (VHR) satellite datasets, such as DeepGlobe and the Massachusetts Roads Dataset. Unlike existing methods that rely on preconditioned models requiring extensive retraining or postconditioned models with significant computational overhead, KAO introduces a Latent Space Conditioning approach, optimizing a compact latent space to achieve efficient and accurate inpainting. Furthermore, we incorporate Explicit Propagation into the diffusion process, facilitating forward-backward fusion, which improves the stability and precision of the method. Experimental results demonstrate that KAO sets a new benchmark for VHR satellite image restoration, providing a scalable, high-performance solution that balances the efficiency of preconditioned models with the flexibility of postconditioned models.

Teerapong Panboonyuen

2025 In IEEE Transactions on Geoscience and Remote Sensing (TGRS, Impact Factor 8.6)

KAO: Kernel-Adaptive Optimization in Diffusion for Satellite Image

GuidedBox: A segmentation-guided box teacher-student approach for weakly supervised road segmentation

Road segmentation in remote sensing is crucial for applications like urban planning, traffic monitoring, and autonomous driving. Labeling objects via pixel-wise segmentation is challenging compared to bounding boxes. Existing weakly supervised segmentation methods often rely on heuristic bounding box priors, but we propose that box-supervised techniques can yield better results. Introducing GuidedBox, an end-to-end framework for weakly supervised instance segmentation. GuidedBox uses a teacher model to generate high-quality pseudo-masks and employs a confidence scoring mechanism to filter out noisy masks. We also introduce a noise-aware pixel loss and affinity loss to optimize the student model with pseudo-masks. Our extensive experiments show that GuidedBox outperforms state-of-the-art methods like SOLOv2, CondInst, and Mask R-CNN on the Massachusetts Roads Dataset, achieving an AP50 score of 0.9231. It also shows strong performance on SpaceNet and DeepGlobe datasets, proving its versatility in remote sensing applications. Code has been made available at https://github.com/kaopanboonyuen/GuidedBox.

Teerapong Panboonyuen, C. Charoenphon, C. Satirapod

2025 In European Journal of Remote Sensing (Taylor & Francis)

GuidedBox: A segmentation-guided box teacher-student approach for weakly supervised road segmentation

SatDiff: A Stable Diffusion Framework for Inpainting Very High-Resolution Satellite Imagery

Satellite image inpainting is a critical task in remote sensing, requiring accurate restoration of missing or occluded regions for reliable image analysis. In this paper, we present SatDiff, an advanced inpainting framework based on diffusion models, specifically designed to tackle the challenges posed by very high-resolution (VHR) satellite datasets such as DeepGlobe and the Massachusetts Roads Dataset. Building on insights from our previous work, SatInPaint, we enhance the approach to achieve even higher recall and overall performance. SatDiff introduces a novel Latent Space Conditioning technique that leverages a compact latent space for efficient and precise inpainting. Additionally, we integrate Explicit Propagation into the diffusion process, enabling forward-backward fusion for improved stability and accuracy. Inspired by encoder-decoder architectures like the Segment Anything Model (SAM), SatDiff is seamlessly adaptable to diverse satellite imagery scenarios. By balancing the efficiency of preconditioned models with the flexibility of postconditioned approaches, SatDiff establishes a new benchmark in VHR satellite datasets, offering a scalable and high-performance solution for satellite image restoration. The code for SatDiff is publicly available at https://github.com/kaopanboonyuen/SatDiff.

Teerapong Panboonyuen, C. Charoenphon, C. Satirapod

2025 In IEEE Access (Supported by the Ratchadapisek Somphot Fund for Postdoctoral Fellowship, 2024–2025)

SatDiff: A Stable Diffusion Framework for Inpainting Very High-Resolution Satellite Imagery

CHULA: Custom Heuristic Uncertainty-Guided Loss for Accurate Land Title Deed Segmentation

Accurately segmenting land boundaries from Thai land title deeds is crucial for reliable land management and legal processes, but remains challenging due to low-quality scans, diverse layouts, and complex overlapping elements in documents. Existing methods often struggle with these difficulties, resulting in imprecise delineations that can cause disputes or inefficiencies. To address these issues, we propose CHULA, a novel Custom Heuristic Uncertainty-guided Loss tailored specifically for robust land title deed segmentation. CHULA uniquely combines domain-specific heuristic priors with uncertainty modeling in a unified loss function that effectively guides the model to focus on clearer regions while refining boundaries and suppressing noisy areas. Evaluated on a carefully curated Thai Land Title Deed Dataset, CHULA achieves an impressive 92.4% accuracy, significantly surpassing standard segmentation baselines. Our results highlight the promise of integrating uncertainty and heuristic knowledge to enhance segmentation accuracy in complex, real-world documents. The code is publicly available at https://github.com/kaopanboonyuen/CHULA.

Teerapong Panboonyuen, C. Charoenphon, C. Satirapod

2025 In IEEE Access (Supported by C2F Postdoctoral Funding, 2025–2026)

CHULA: Custom Heuristic Uncertainty-Guided Loss for Accurate Land Title Deed Segmentation

Investigating the use of deep learning-derived weighted mean temperature for GPS-PWVs estimation

GNSS data offers a reliable alternative for estimating Precipitable Water Vapor (PWV), but accurate GPS-PWV determination in tropical climates requires weighted mean temperature (Tm). With traditional measurement methods often unavailable in Thailand, and existing empirical models showing low accuracy, we propose a deep learning approach. Our Bidirectional Learning with Attention (BLA) model incorporates GRUs and an attention mechanism for Tm modeling. Trained on ERA5 data (2017-2021) and evaluated on 2022 data, BLA-Tm achieved 76% improvement over conventional models, reducing biases significantly. Validation with 280 GNSS stations confirmed BLA-Tm’s superior accuracy in GPS-PWV estimation.

C. Charoenphon, Teerapong Panboonyuen, B. Zhang, C. Satirapod

2025 In Journal of Spatial Science (Taylor & Francis)

Investigating the use of deep learning-derived weighted mean temperature for GPS-PWVs estimation

DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation

In this paper, we present a novel end-to-end framework that integrates ResNet and Vision Transformer (ViT) backbones with cutting-edge techniques such as Deformable Convolutions, Retrieval-Augmented Generation, and Conditional Random Fields (CRF). These innovations work together to significantly improve feature representation and Optical Character Recognition (OCR) performance. By replacing the standard convolution layers in the third and fourth blocks with Deformable Convolutions, the framework adapts more flexibly to complex text layouts, while adaptive dropout helps prevent overfitting and enhance generalization. Moreover, incorporating CRFs refines the sequence modeling for more accurate text recognition. Extensive experiments on six benchmark datasets—IC13, IC15, SVT, IIIT5K, SVTP, and CUTE80—demonstrate the framework’s exceptional performance. Our method represents a significant leap forward in OCR technology, addressing challenges in recognizing text with various distortions, fonts, and orientations. The framework has proven not only effective in controlled conditions but also adaptable to more complex, real-world scenarios. The code for this framework is available at https://github.com/kaopanboonyuen/DOTA.

N. Nithisopa, Teerapong Panboonyuen

2025 In 17th International Conference on Knowledge and Smart Technology (KST2025)

DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation

MeViT: A Medium-Resolution Vision Transformer for Semantic Segmentation on Landsat Satellite Imagery for Agriculture in Thailand

In this paper, we present MeViT (Medium-Resolution Vision Transformer), designed for semantic segmentation of Landsat satellite imagery, focusing on key economic crops in Thailand para rubber, corn, and pineapple. MeViT enhances Vision Transformers (ViTs) by integrating medium-resolution multi-branch architectures and revising mixed-scale convolutional feedforward networks (MixCFN) to extract multi-scale local information. Extensive experiments on a public Thailand dataset demonstrate that MeViT outperforms state-of-the-art deep learning methods, achieving a precision of 92.22%, recall of 94.69%, F1 score of 93.44%, and mean IoU of 83.63%. These results highlight MeViT’s effectiveness in accurately segmenting Thai Landsat-8 data.

Teerapong Panboonyuen, C. Charoenphon, C. Satirapod

2023 In Remote Sensing (Supported by Ratchadapisek Somphot Postdoctoral Fund, 2022–2024)

MeViT: A Medium-Resolution Vision Transformer for Semantic Segmentation on Landsat Satellite Imagery for Agriculture in Thailand

Object Detection of Road Assets Using Transformer-Based YOLOX with Feature Pyramid Decoder on Thai Highway Panorama

Detecting objects of varying sizes, like kilometer stones, remains a significant challenge and directly affects the accuracy of object counts. Transformers have shown remarkable success in natural language processing (NLP) and image processing due to their ability to model long-range dependencies. This paper proposes an enhanced YOLO (You Only Look Once) series with two key contributions, (i) We employ a pre-training objective to obtain original visual tokens from image patches of road assets, using a pre-trained Vision Transformer (ViT) backbone, which is then fine-tuned on downstream tasks with additional task layers. (ii) We incorporate Feature Pyramid Network (FPN) decoder designs into our deep learning network to learn the significance of different input features, avoiding issues like feature mismatch and performance degradation that arise from simple summation or concatenation. Our proposed method, Transformer-Based YOLOX with FPN, effectively learns general representations of objects and significantly outperforms state-of-the-art detectors, including YOLOv5S, YOLOv5M, and YOLOv5L. It achieves a 61.5% AP on the Thailand highway corpus, surpassing the current best practice (YOLOv5L) by 2.56% AP on the test-dev dataset.

Teerapong Panboonyuen, C. Charoenphon

2022 In Information

Object Detection of Road Assets Using Transformer-Based YOLOX with Feature Pyramid Decoder on Thai Highway Panorama

A Performance Comparison between GIS-based and Neuron Network Methods for Flood Susceptibility Assessment in Ayutthaya Province

Flooding poses a significant challenge in Thailand due to its complex geography, traditionally addressed through GIS methods like the Flood Risk Assessment Model (FRAM) combined with the Analytical Hierarchy Process (AHP). This study assesses the efficacy of Artificial Neural Networks (ANN) in flood susceptibility mapping, using data from Ayutthaya Province and incorporating 5-fold cross-validation and Stochastic Gradient Descent (SGD) for training. ANN achieved superior performance with precision of 79.90%, recall of 79.04%, F1-score of 79.08%, and accuracy of 79.31%, outperforming the traditional FRAM approach. Notably, ANN identified that only three factors—flow accumulation, elevation, and soil types—were crucial for predicting flood-prone areas. This highlights the potential for ANN to simplify and enhance flood risk assessments. Moreover, the integration of advanced machine learning techniques underscores the evolving capability of AI in addressing complex environmental challenges.

T. Vajeethaveesin, Teerapong Panboonyuen

2022 In Trends in Sciences (Trends Sci. or TiS)

A Performance Comparison between GIS-based and Neuron Network Methods for Flood Susceptibility Assessment in Ayutthaya Province

Enhanced Feature Pyramid Vision Transformer for Semantic Segmentation on Thailand Landsat-8 Corpus

Semantic segmentation on Landsat-8 data is crucial in the integration of diverse data, allowing researchers to achieve more productivity and lower expenses. This research aimed to improve the versatile backbone for dense prediction without convolutions—namely, using the pyramid vision transformer (PRM-VS-TM) to incorporate attention mechanisms across various feature maps. Furthermore, the PRM-VS-TM constructs an end-to-end object detection system without convolutions and uses handcrafted components, such as dense anchors and non-maximum suspension (NMS). The present study was conducted on a private dataset, i.e., the Thailand Landsat-8 challenge. There are three baselines, DeepLab, Swin Transformer (Swin TF), and PRM-VS-TM. Results indicate that the proposed model significantly outperforms all current baselines on the Thailand Landsat-8 corpus, providing F1-scores greater than 80% in almost all categories. Finally, we demonstrate that our model, without utilizing pre-trained settings or any further post-processing, can outperform current state-of-the-art (SOTA) methods for both agriculture and forest classes.

Teerapong Panboonyuen, P. Rakwatin, K. Intarat

2020 In Information

Enhanced Feature Pyramid Vision Transformer for Semantic Segmentation on Thailand Landsat-8 Corpus