SEA-ViT: Sea Surface Currents Forecasting Using Vision Transformer and GRU-Based Spatio-Temporal Covariance Modeling
Forecasting sea surface currents is essential for applications such as maritime navigation, environmental monitoring, and climate analysis, particularly in regions like the Gulf of Thailand and the Andaman Sea. This paper introduces SEA-ViT, a deep learning model that integrates a Vision Transformer (ViT) with bidirectional Gated Recurrent Units (GRUs) to capture spatio-temporal covariance for predicting sea surface currents (U, V) from high-frequency (HF) radar data. The name SEA-ViT is derived from Sea Surface Currents Forecasting using Vision Transformer, highlighting the model’s emphasis on ocean dynamics and its use of the ViT architecture to enhance forecasting capabilities. SEA-ViT is designed to unravel complex dependencies by leveraging a rich dataset spanning over 30 years and incorporating ENSO indices (El Niño, La Niña, and neutral phases) to address the intricate relationship between geographic coordinates and climatic variations. This development enhances the predictive capabilities for sea surface currents, supporting the efforts of the Geo-Informatics and Space Technology Development Agency (GISTDA) in Thailand’s maritime regions. The code and pretrained models are available at https://github.com/kaopanboonyuen/gistda-ai-sea-surface-currents.
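For readers unfamiliar with the (U, V) notation: U and V conventionally denote the eastward and northward components of the current velocity. The sketch below is not taken from the SEA-ViT code base. It converts one component pair into speed and heading, assuming the heading is measured in degrees clockwise from north and points in the direction the water flows toward. The function name `uv_to_speed_direction` and the sample values are hypothetical.

```python
import math


def uv_to_speed_direction(u, v):
    """Convert eastward (u) and northward (v) components to (speed, heading).

    Assumption: heading is in degrees clockwise from north and points in
    the direction the current flows toward.
    """
    speed = math.hypot(u, v)
    # atan2(u, v) measures the angle from the north axis toward the east axis.
    heading = math.degrees(math.atan2(u, v)) % 360.0
    return speed, heading


if __name__ == "__main__":
    # Illustrative values only: a purely eastward current of 0.5 m/s.
    print(uv_to_speed_direction(0.5, 0.0))
```

A forecasting model can predict U and V directly and apply a conversion like this only for display or navigation purposes, which avoids the discontinuity of angles at 0/360 degrees during training.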
Real-Time Polyps Segmentation for Colonoscopy Video Frames Using Compressed Fully Convolutional Network
Colorectal cancer is one of the leading causes of cancer death worldwide. Colonoscopy is currently the most effective screening tool for diagnosing colorectal cancer, as it searches for polyps that can develop into colon cancer. A drawback of the manual colonoscopy process is its high polyp miss rate. Therefore, polyp detection is a crucial issue in the development of colonoscopy applications. Despite their high evaluation scores, recently published methods based on fully convolutional networks (FCNs) have a large number of parameters and therefore require a very long inference (testing) time, which prevents their use in a real clinical process. In this paper, we propose a compressed fully convolutional network, obtained by modifying the FCN-8s network, that is able to detect and segment polyps from video images within the real-time constraint of a practical screening routine. Furthermore, our customized loss function makes our network more robust than one trained with the traditional cross-entropy loss function. The experiment was conducted on the CVC-EndoSceneStill database, which consists of 912 video frames from 36 patients. Our proposed framework obtains state-of-the-art results while running more than 7 times faster and requiring more than 9 times fewer weight parameters. The experimental results indicate that our system has the potential to support clinicians during the analysis of colonoscopy video by automatically indicating the locations of suspicious polyps.
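The abstract does not describe the customized loss function. For reference, the traditional pixel-wise cross-entropy baseline it is compared against can be sketched as follows for a binary polyp/background mask. This is a minimal illustration, not the authors' implementation; the function name `pixelwise_cross_entropy` and the list-of-lists image representation are assumptions made to keep the example dependency-free.

```python
import math


def pixelwise_cross_entropy(probs, mask, eps=1e-7):
    """Mean binary cross-entropy over all pixels.

    probs: 2D list of predicted polyp probabilities in [0, 1].
    mask:  2D list of ground-truth labels (1 = polyp, 0 = background).
    """
    total, count = 0.0, 0
    for prob_row, mask_row in zip(probs, mask):
        for p, y in zip(prob_row, mask_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


if __name__ == "__main__":
    # Illustrative 2x2 prediction and mask (hypothetical values).
    print(pixelwise_cross_entropy([[0.9, 0.2], [0.1, 0.8]], [[1, 0], [0, 1]]))
```

Because this loss averages over every pixel equally, small polyps contribute little to it, which is one common motivation for replacing or reweighting it in segmentation work.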
Semantic Segmentation On Medium-Resolution Satellite Images Using Deep Convolutional Networks With Remote Sensing Derived Indices
Semantic segmentation is a fundamental task in computer vision and remote sensing imagery. Many applications, such as urban planning, change detection, and environmental monitoring, require accurate segmentation; hence, most segmentation tasks are performed by humans. Currently, with the growth of Deep Convolutional Neural Networks (DCNNs), many works aim to find the network architecture best suited to this task. However, all of these studies are based on very-high-resolution satellite images and, surprisingly, none of them are implemented on medium-resolution satellite images. Moreover, no research has applied geoinformatics knowledge. Therefore, we propose to compare semantic segmentation models, namely FCN, SegNet, and GSN, using medium-resolution images from the Landsat-8 satellite. In addition, we propose a modified SegNet model that can be used with remote sensing derived indices. The results show that SegNet achieves the highest accuracy on the RGB bands of medium-resolution aerial imagery. The overall accuracy of the model increases when the Near Infrared (NIR) and Short-Wave Infrared (SWIR) bands are included. The results also show that our proposed method (our modified SegNet model, named the RGB-IR-IDX-MSN method) outperforms all of the baselines in terms of mean F1 score.
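The abstract does not list which remote sensing derived indices are used. As an illustration only, the sketch below computes the Normalized Difference Vegetation Index (NDVI), a widely used index defined as (NIR - Red) / (NIR + Red), and appends it to a pixel's band values as an extra input channel. The function names, the dictionary keys, and the sample reflectance values are hypothetical and do not come from the paper.

```python
def ndvi(nir, red, eps=1e-12):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 when both bands are ~0."""
    denom = nir + red
    if abs(denom) < eps:
        return 0.0
    return (nir - red) / denom


def add_index_channel(pixel):
    """Append NDVI to a pixel given as a dict of band reflectances.

    Assumed keys: 'red', 'green', 'blue', 'nir', 'swir'.
    Returns a list ordered [R, G, B, NIR, SWIR, NDVI].
    """
    return [
        pixel["red"], pixel["green"], pixel["blue"],
        pixel["nir"], pixel["swir"],
        ndvi(pixel["nir"], pixel["red"]),
    ]


if __name__ == "__main__":
    # Hypothetical reflectances for a vegetated pixel.
    sample = {"red": 0.1, "green": 0.12, "blue": 0.08, "nir": 0.5, "swir": 0.2}
    print(add_index_channel(sample))
```

Stacking an index as an additional channel gives the network a feature that domain experts already use to separate land-cover classes, rather than requiring it to learn the band ratio from raw reflectances.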