Seeing Isn't Always Believing: Analysis of Grad-CAM Faithfulness and Localization Reliability in Lung Cancer CT Classification
This study provides a rigorous, model-aware examination of the faithfulness and spatial reliability of Grad-CAM explanations for lung cancer CT image classification across both convolutional neural networks and Vision Transformer architectures. By systematically analyzing localization accuracy, perturbation-based faithfulness, and explanation consistency, the work reveals pronounced architecture-dependent disparities in how well visual explanations align with true diagnostic evidence. While Grad-CAM often produces visually convincing heatmaps for convolutional models, these explanations can be spatially coarse or influenced by spurious correlations, raising concerns about shortcut learning and misleading interpretability. More critically, the analysis demonstrates that transformer-based models, despite strong predictive performance, exhibit a marked degradation in Grad-CAM reliability due to non-local attention mechanisms. Together, these findings underscore a central message: visually appealing explanations do not necessarily imply faithful model reasoning. The work highlights fundamental limitations of saliency-based XAI methods in high-stakes medical imaging and calls for more principled, model-aware interpretability approaches that can support genuinely trustworthy and clinically meaningful AI systems.
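One simple way to quantify the localization accuracy discussed above is to threshold a heatmap and compare it with an annotated lesion mask. The sketch below is illustrative only: the function name `heatmap_iou`, the 0.5 threshold, and the toy values are assumptions, and the study may use different localization metrics.

```python
def heatmap_iou(heatmap, mask, threshold=0.5):
    """IoU between a thresholded saliency heatmap and a binary lesion mask.

    heatmap: 2D list of floats in [0, 1] (e.g. a normalized Grad-CAM map).
    mask:    2D list of 0/1 ints with the same shape (annotated region).
    The 0.5 threshold is an arbitrary illustrative choice.
    """
    intersection = 0
    union = 0
    for heat_row, mask_row in zip(heatmap, mask):
        for heat, truth in zip(heat_row, mask_row):
            salient = heat >= threshold
            lesion = truth == 1
            if salient and lesion:
                intersection += 1
            if salient or lesion:
                union += 1
    # Define IoU as 0.0 when both regions are empty to avoid dividing by zero.
    return intersection / union if union else 0.0


# Toy 3x3 example with hypothetical values.
heatmap = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.1, 0.0, 0.0],
]
mask = [
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
]
score = heatmap_iou(heatmap, mask)
```

In this toy case the heatmap covers the lesion but also spills onto a neighboring cell, so the overlap score is below 1 even though the map looks plausible. A high overlap score alone would still not establish faithfulness, which is why perturbation-based checks are analyzed separately.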
SEA-ViT: Sea Surface Currents Forecasting Using Vision Transformer and GRU-Based Spatio-Temporal Covariance Modeling
Forecasting sea surface currents is essential for applications such as maritime navigation, environmental monitoring, and climate analysis, particularly in regions like the Gulf of Thailand and the Andaman Sea. This paper introduces SEA-ViT, a deep learning model that integrates a Vision Transformer (ViT) with bidirectional Gated Recurrent Units (GRUs) to capture spatio-temporal covariance for predicting sea surface currents (U, V) from high-frequency (HF) radar data. The name SEA-ViT is derived from Sea Surface Currents Forecasting using Vision Transformer, highlighting the model’s emphasis on ocean dynamics and its use of the ViT architecture to enhance forecasting capabilities. SEA-ViT is designed to unravel complex dependencies by leveraging a rich dataset spanning over 30 years and incorporating ENSO indices (El Niño, La Niña, and neutral phases) to address the intricate relationship between geographic coordinates and climatic variations. This development enhances the predictive capabilities for sea surface currents, supporting the efforts of the Geo-Informatics and Space Technology Development Agency (GISTDA) in Thailand’s maritime regions. The code and pretrained models are available at https://github.com/kaopanboonyuen/gistda-ai-sea-surface-currents.
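A Vision Transformer operates on a sequence of patch tokens rather than on a full grid. The sketch below shows one way a two-component current field (U, V) could be split into flat patch tokens. The function name `to_patches`, the patch size, and the toy grids are hypothetical and are not taken from the SEA-ViT code, whose actual preprocessing may differ.

```python
def to_patches(u, v, patch):
    """Split two equally sized 2D grids (U and V) into flat patch tokens.

    Each token interleaves the (u, v) values of one patch x patch block,
    scanned row by row. This layout is an illustrative assumption.
    """
    rows, cols = len(u), len(u[0])
    if rows % patch or cols % patch:
        raise ValueError("grid size must be divisible by patch size")
    tokens = []
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            token = []
            for r in range(r0, r0 + patch):
                for c in range(c0, c0 + patch):
                    token.extend((u[r][c], v[r][c]))
            tokens.append(token)
    return tokens


# Toy 4x4 field with hypothetical values; V is simply the negation of U.
u = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
v = [[-value for value in row] for row in u]
tokens = to_patches(u, v, patch=2)
```

With a 4x4 grid and 2x2 patches this yields four tokens of eight values each. In a model of the kind described, such a token sequence would be computed per time step and the per-step encodings then passed to a recurrent layer such as a bidirectional GRU.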
Real-Time Polyps Segmentation for Colonoscopy Video Frames Using Compressed Fully Convolutional Network
Colorectal cancer is one of the leading causes of cancer death worldwide. Colonoscopy is currently the most effective screening tool for diagnosing colorectal cancer, as it searches for polyps that can develop into colon cancer. The drawback of the manual colonoscopy process is its high polyp miss rate. Polyp detection is therefore a crucial issue in the development of colonoscopy applications. Despite achieving high evaluation scores, recently published methods based on the fully convolutional network (FCN) have a large number of parameters and therefore require an inference (testing) time that is too long for use in a real clinical process. In this paper, we propose a compressed fully convolutional network obtained by modifying the FCN-8s network, so that our network can detect and segment polyps in video images within the real-time constraint of a practical screening routine. Furthermore, our customized loss function makes our network more robust than the traditional cross-entropy loss function does. The experiment was conducted on the CVC-EndoSceneStill database, which consists of 912 video frames from 36 patients. Our proposed framework obtains state-of-the-art results while running more than 7 times faster and using more than 9 times fewer weight parameters. The experimental results indicate that our system has the potential to support clinicians during the analysis of colonoscopy video by automatically indicating the locations of suspicious polyps.