My research focuses on Learning Representations—developing cutting-edge algorithms with optimization theory to push AI’s limits. I work with advanced models like GANs and Diffusion Models, leverage Self-Supervised Learning, explore how Adversarial Attacks on Large Language Models (LLMs) could reshape the future of AI.
I am currently a Senior AI Research Scientist at MARS (Motor AI Recognition Solution) and a Postdoctoral Fellow at Chulalongkorn University. I earned my Ph.D. in Computer Engineering from Chulalongkorn University, where I specialized in AI.
Passionate about Cognitive Intelligence and unlocking human potential, I’m also deeply immersed in Geospatial Intelligence, where LLMs uncover groundbreaking insights that reshape how we understand and interact with our world.
Detailed summaries of my academic, industry, and teaching experience can be found in my CV, and get a glimpse into my personal life on my blog and tumblr. Also, feel free to vibe to my music on SoundCloud.
Thai name: ธีรพงศ์ ปานบุญยืน, aka Kao Panboonyuen, or just Kao (เก้า).
PostDoc Fellow in AI, 2025
Chulalongkorn University
PhD in Computer Engineering, 2020
Chulalongkorn University
MEng in Computer Engineering, 2017
Chulalongkorn University
BEng in Computer Engineering, 2015
KMUTNB (Top 1% in University Mathematics)
Pre-Engineering School (PET21), 2012
KMUTNB (Senior High School, 10th - 12th Grade)
Reviewer for International Journals/Conferences:
Forecasting sea surface currents is essential for applications such as maritime navigation, environmental monitoring, and climate analysis, particularly in regions like the Gulf of Thailand and the Andaman Sea. This paper introduces SEA-ViT, an advanced deep learning model that integrates Vision Transformer (ViT) with bidirectional Gated Recurrent Units (GRUs) to capture spatio-temporal covariance for predicting sea surface currents (U, V) using high-frequency radar (HF) data. The name SEA-ViT is derived from Sea Surface Currents Forecasting using Vision Transformer, highlighting the model’s emphasis on ocean dynamics and its use of the ViT architecture to enhance forecasting capabilities. SEA-ViT is designed to unravel complex dependencies by leveraging a rich dataset spanning over 30 years and incorporating ENSO indices (El Niño, La Niña, and neutral phases) to address the intricate relationship between geographic coordinates and climatic variations. This development enhances the predictive capabilities for sea surface currents, supporting the efforts of the Geo-Informatics and Space Technology Development Agency (GISTDA) in Thailand’s maritime regions. The code and pretrained models are available at https://github.com/kaopanboonyuen/gistda-ai-sea-surface-currents.
This paper dives into the cutting-edge world of road asset detection on Thai highways, showcasing a novel approach that combines an upgraded REG model with Generalized Focal Loss. Our focus is on identifying key road elements—like pavilions, pedestrian bridges, information and warning signs, and concrete guardrails—to boost road safety and infrastructure management. While deep learning methods have shown promise, traditional models often struggle with accuracy in tricky conditions, such as cluttered backgrounds and variable lighting. To tackle these issues, we’ve integrated REG with Generalized Focal Loss, enhancing its ability to detect road assets with greater precision. Our results are impressive, the REGx model led the way with a mAP50 of 80.340, mAP50-95 of 60.840, precision of 79.100, recall of 76.680, and an F1-score of 77.870. These findings highlight the REGx model’s superior performance, demonstrating the power of advanced deep learning techniques to improve highway safety and infrastructure maintenance, even in challenging conditions.
In this paper, we present MeViT (Medium-Resolution Vision Transformer), designed for semantic segmentation of Landsat satellite imagery, focusing on key economic crops in Thailand para rubber, corn, and pineapple. MeViT enhances Vision Transformers (ViTs) by integrating medium-resolution multi-branch architectures and revising mixed-scale convolutional feedforward networks (MixCFN) to extract multi-scale local information. Extensive experiments on a public Thailand dataset demonstrate that MeViT outperforms state-of-the-art deep learning methods, achieving a precision of 92.22%, recall of 94.69%, F1 score of 93.44%, and mean IoU of 83.63%. These results highlight MeViT’s effectiveness in accurately segmenting Thai Landsat-8 data.
Evaluating car damages is crucial for the car insurance industry, but current deep learning networks fall short in accuracy due to inadequacies in handling car damage images and producing fine segmentation masks. This paper introduces MARS (Mask Attention Refinement with Sequential quadtree nodes) for instance segmentation of car damages. MARS employs self-attention mechanisms to capture global dependencies within sequential quadtree nodes and a quadtree transformer to recalibrate channel weights, resulting in highly accurate instance masks. Extensive experiments show that MARS significantly outperforms state-of-the-art methods like Mask R-CNN, PointRend, and Mask Transfiner on three popular benchmarks, achieving a +1.3 maskAP improvement with the R50-FPN backbone and +2.3 maskAP with the R101-FPN backbone on the Thai car-damage dataset. Demos are available at https://github.com/kaopanboonyuen/MARS.
Flooding poses a significant challenge in Thailand due to its complex geography, traditionally addressed through GIS methods like the Flood Risk Assessment Model (FRAM) combined with the Analytical Hierarchy Process (AHP). This study assesses the efficacy of Artificial Neural Networks (ANN) in flood susceptibility mapping, using data from Ayutthaya Province and incorporating 5-fold cross-validation and Stochastic Gradient Descent (SGD) for training. ANN achieved superior performance with precision of 79.90%, recall of 79.04%, F1-score of 79.08%, and accuracy of 79.31%, outperforming the traditional FRAM approach. Notably, ANN identified that only three factors—flow accumulation, elevation, and soil types—were crucial for predicting flood-prone areas. This highlights the potential for ANN to simplify and enhance flood risk assessments. Moreover, the integration of advanced machine learning techniques underscores the evolving capability of AI in addressing complex environmental challenges.
My PhD thesis focuses on improving semantic segmentation of aerial and satellite images, a crucial task for applications like agriculture planning, map updates, route optimization, and navigation. Current models like the Deep Convolutional Encoder-Decoder (DCED) have limitations in accuracy due to their inability to recover low-level features and the scarcity of training data. To address these issues, I propose a new architecture with five key enhancements, a Global Convolutional Network (GCN) for improved feature extraction, channel attention for selecting discriminative features, domain-specific transfer learning to address data scarcity, Feature Fusion (FF) for capturing low-level details, and Depthwise Atrous Convolution (DA) for refining features. Experiments on Landsat-8 datasets and the ISPRS Vaihingen benchmark showed that my proposed architecture significantly outperforms the baseline models in remote sensing imagery.
Colorectal cancer is one of the leading causes of cancer death worldwide. As of now, colonoscopy is the most effective screening tool for diagnosing colorectal cancer by searching for polyps which can develop into colon cancer. The drawback of manual colonoscopy process is its high polyp miss rate. Therefore, polyp detection is a crucial issue in the development of colonoscopy application. Despite having high evaluation scores, the recently published methods based on fully convolutional network (FCN) require a very long inferring (testing) time that cannot be applied in a real clinical process due to a large number of parameters in the network. In this paper, we proposed a compressed fully convolutional network by modifying the FCN-8s network, so our network is able to detect and segment polyp from video images within a real-time constraint in a practical screening routine. Furthermore, our customized loss function allows our network to be more robust when compared to the traditional cross-entropy loss function. The experiment was conducted on CVC-EndoSceneStill database which consists of 912 video frames from 36 patients. Our proposed framework has obtained state-of-the-art results while running more than 7 times faster and requiring fewer weight parameters by more than 9 times. The experimental results convey that our system has the potential to support clinicians during the analysis of colonoscopy video by automatically indicating the suspicious polyps locations.
Semantic segmentation of remotely-sensed aerial (or very-high resolution, VHS) images and satellite (or high-resolution, HR) images has numerous application domains, particularly in road extraction, where the segmented objects serve as essential layers in geospatial databases. Despite several efforts to use deep convolutional neural networks (DCNNs) for road extraction from remote sensing images, accuracy remains a challenge. This paper introduces an enhanced DCNN framework specifically designed for road extraction from remote sensing images by incorporating landscape metrics (LMs) and conditional random fields (CRFs). Our framework employs the exponential linear unit (ELU) activation function to improve the DCNN, leading to a higher quantity and more accurate road extraction. Additionally, to minimize false classifications of road objects, we propose a solution based on the integration of LMs. To further refine the extracted roads, a CRF method is incorporated into our framework. Experiments conducted on Massachusetts road aerial imagery and Thailand Earth Observation System (THEOS) satellite imagery datasets demonstrated that our proposed framework outperforms SegNet, a state-of-the-art object segmentation technique, in most cases regarding precision, recall, and F1 score across various types of remote sensing imagery.
In this paper, we introduce an improved deep convolutional encoder-decoder network (DCED) for segmenting road objects from aerial images. Enhancements include the use of ELU (exponential linear unit) instead of ReLU, dataset augmentation with incrementally-rotated images to increase training data by eight times, and the use of landscape metrics to remove false road objects. Tested on the Massachusetts Roads dataset, our method outperformed the SegNet benchmark and other baselines in precision, recall, and F1 scores.
To find relevant content, try searching publications, filtering using the buttons below, or exploring popular topics. A * denotes equal contribution.
*Visiting Faculty - College of Computing, Khon Kaen University
Guest Lecturer and AI Committee Member
NSTDA One Day Camp at Sirindhorn Science Home
2108421 Modern Integrated Survey Technology (MIST) - Chulalongkorn University
CP411701 AI Inspiration Course - Khon Kaen University
The 7th KVIS Invitational Science Fair
Industrial Advisory Board (IAB) - ECE KMUTNB
AI and ML Instructor - Nomklao Kunnathi Demonstration School
Deep Learning Instructor - Thammasat University
Senior Project Advisor - Thammasat University
AI Instructor - Department of Lands, Thailand
Flood Risk Assessment in Ayutthaya Province
The Bangkok Urbanscapes Dataset for Semantic Urban Scene Understanding Using Deep Learning