MeViT: A Medium-Resolution Vision Transformer for Semantic Segmentation 🛰️

Teerapong Panboonyuen

AGL (Advancing Geoscience Laboratory), Chulalongkorn University

MIT License

MeViT is a Vision Transformer (ViT) model tailored to semantic segmentation of medium-resolution Landsat satellite imagery covering Thai agricultural regions. It classifies crops such as para rubber, corn, and pineapple using a revised MixCFN block that balances multiple depth-wise convolution paths for multi-scale feature extraction.
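To make the idea of parallel depth-wise convolution paths concrete, here is a minimal pure-Python sketch. It is not the MeViT implementation: the function names, the 3x3 and 5x5 kernel sizes, the fixed box filters standing in for learned weights, and the equal-weight averaging of the paths are all assumptions chosen for illustration.

```python
def depthwise_conv2d(x, kernels):
    """Depth-wise convolution: each channel is filtered by its own kernel.

    x is [C][H][W] nested lists, kernels is [C][k][k] with odd k.
    Zero padding and stride 1, so the output has the same shape as x.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        pad = k // 2
        h, w = len(channel), len(channel[0])
        result = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += channel[ii][jj] * kernel[di][dj]
                result[i][j] = acc
        out.append(result)
    return out


def box_kernel(k):
    """Hypothetical stand-in for a learned k x k kernel: a uniform averaging filter."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def multi_scale_mix(x, kernel_sizes=(3, 5)):
    """Run one depth-wise path per kernel size and average the paths equally.

    Equal weighting is an assumption about what "balanced" paths could mean.
    """
    paths = [depthwise_conv2d(x, [box_kernel(k)] * len(x)) for k in kernel_sizes]
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(p[ch][i][j] for p in paths) / len(paths) for j in range(w)]
         for i in range(h)]
        for ch in range(c)
    ]


if __name__ == "__main__":
    feature_map = [[[1.0] * 5 for _ in range(5)]]  # one channel, 5 x 5
    mixed = multi_scale_mix(feature_map)
    print(mixed[0][2][2], mixed[0][0][0])
```

Because every channel is filtered independently, the cost grows linearly with the number of channels, which is why depth-wise paths are a cheap way to add receptive fields of several sizes.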

At AGL (Advancing Geoscience Laboratory), Chulalongkorn University, we focus on developing state-of-the-art AI models for satellite imagery and remote sensing applications. Our research spans Vision Transformers, stable diffusion, and weakly supervised learning for semantic segmentation, inpainting, and temporal forecasting.

Reported segmentation performance:

  • Precision: 92.22%
  • Recall: 94.69%
  • F1 Score: 93.44%
  • Mean IoU: 83.63%
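For readers less familiar with these metrics, the sketch below shows the standard definitions in plain Python. The function names and the toy confusion matrix are illustrative assumptions, and the README does not say how the reported scores were averaged across classes, so this is not a reproduction of the evaluation script.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (any consistent scale, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_iou(confusion):
    """Mean intersection-over-union across classes.

    confusion[i][j] counts pixels of true class i predicted as class j.
    Classes that never occur in truth or prediction are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # The listed F1 score is the harmonic mean of the listed precision and recall.
    print(round(f1_score(92.22, 94.69), 2))  # 93.44
    # Toy two-class confusion matrix (made-up numbers, for illustration only).
    print(mean_iou([[8, 2], [1, 9]]))
```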

📄 Read Full Publication

MeViT Model Overview

🔧 Installation

git clone https://github.com/kaopanboonyuen/MeViT.git
cd MeViT
pip install -r requirements.txt

🚀 Usage

1. Configuration

Modify the config.yaml file to point to your dataset paths.
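Before starting a long training run, it can help to confirm that the configured paths actually exist. The helper below is a hedged sketch that works on an already-parsed configuration dictionary; the key names `train_dir` and `val_dir` are hypothetical placeholders, so substitute whatever keys your copy of config.yaml really uses.

```python
from pathlib import Path


def missing_paths(config, keys=("train_dir", "val_dir")):
    """Return the keys whose configured path is unset or absent on disk.

    `config` is a plain dict (for example, the result of parsing config.yaml).
    The default key names are hypothetical and not taken from the repository.
    """
    missing = []
    for key in keys:
        value = config.get(key)
        if not value or not Path(value).exists():
            missing.append(key)
    return missing


if __name__ == "__main__":
    example = {"train_dir": ".", "val_dir": "path/that/does/not/exist"}
    print(missing_paths(example))
```

An empty list means every checked path was found; anything else names the entries to fix.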

2. Training

python train.py

3. Evaluation

python evaluate.py

4. Inference

python inference.py

🛰️ Dataset

The Landsat satellite data used in this project are not included in the repository. Refer to the documentation for download and preprocessing instructions.

Sample Landsat Imagery

🌐 Project Website

Explore full documentation and updates at: 🔗 MeViT Website

📖 Citation

If you use this work, please cite:

@article{panboonyuen2023mevit,
  title={MeViT: A Medium-Resolution Vision Transformer for Semantic Segmentation on Landsat Satellite Imagery for Agriculture in Thailand},
  author={Panboonyuen, Teerapong and Charoenphon, Chaiyut and Satirapod, Chalermchon},
  journal={Remote Sensing},
  volume={15},
  number={21},
  pages={5124},
  year={2023},
  publisher={MDPI}
}

🖼️ More Visualizations

Visualization 1 Visualization 2 Visualization 3 Visualization 4