MeViT: Medium-Resolution Vision Transformer

MeViT is a Vision Transformer (ViT) model tailored for semantic segmentation of medium-resolution Landsat satellite imagery covering Thai agricultural regions. It segments crops such as para rubber, corn, and pineapple using a revised MixCFN (mixed-scale convolutional feed-forward network) block, which incorporates multiple depth-wise convolution paths to extract multi-scale features while balancing accuracy and efficiency.
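
To make the idea of parallel depth-wise convolution paths concrete, here is a minimal PyTorch sketch of a feed-forward block of that general kind. It is not the MeViT implementation: the class name `MixCFNSketch`, the kernel sizes (3 and 5), the even channel split, and the layer ordering are all assumptions chosen for illustration.

```python
import torch
from torch import nn


class MixCFNSketch(nn.Module):
    """Illustrative feed-forward block with parallel depth-wise convolutions.

    Hypothetical sketch only; the real MeViT block may differ in every detail.
    """

    def __init__(self, dim: int, hidden_dim: int, kernel_sizes=(3, 5)):
        super().__init__()
        if hidden_dim % len(kernel_sizes) != 0:
            raise ValueError("hidden_dim must split evenly across the paths")
        self.fc1 = nn.Linear(dim, hidden_dim)
        chunk = hidden_dim // len(kernel_sizes)
        # groups == channels makes each convolution depth-wise;
        # padding = k // 2 keeps the spatial size unchanged for odd k.
        self.paths = nn.ModuleList(
            [nn.Conv2d(chunk, chunk, k, padding=k // 2, groups=chunk)
             for k in kernel_sizes]
        )
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x: torch.Tensor, height: int, width: int) -> torch.Tensor:
        # x holds a token sequence of shape (batch, height * width, dim).
        batch = x.shape[0]
        x = self.fc1(x)
        # Reshape tokens into a 2-D feature map so convolutions can be applied.
        x = x.transpose(1, 2).reshape(batch, -1, height, width)
        # Each channel group goes through a path with a different kernel size.
        chunks = torch.chunk(x, len(self.paths), dim=1)
        x = torch.cat([conv(c) for conv, c in zip(self.paths, chunks)], dim=1)
        # Back to a token sequence of shape (batch, height * width, hidden_dim).
        x = x.flatten(2).transpose(1, 2)
        return self.fc2(self.act(x))
```

Each path sees the same spatial grid with a different receptive field, and concatenating the results is one simple way to mix scales. The block preserves the input shape, so it can stand in for a plain MLP inside a transformer layer.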

At AGL (Advancing Geoscience Laboratory), Chulalongkorn University, we focus on developing state-of-the-art AI models for satellite imagery and remote sensing applications. Our research spans Vision Transformers, stable diffusion, and weakly supervised learning for semantic segmentation, inpainting, and temporal forecasting.

Precision: 92.22%
Recall: 94.69%
F1 Score: 93.44%
Mean IoU: 83.63%
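
For readers who want to reproduce this kind of metric on their own predictions, the following sketch shows the standard definitions of F1 and mean IoU computed from a confusion matrix. The function names are hypothetical and are not taken from the repository's `evaluate.py`; whether the paper averages its metrics in exactly this way is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_iou(confusion: list[list[int]]) -> list[float]:
    """IoU per class; rows are ground truth, columns are predictions."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # pixels wrongly labeled c
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)
    return ious


def mean_iou(confusion: list[list[int]]) -> float:
    """Unweighted average of the per-class IoU values."""
    ious = per_class_iou(confusion)
    return sum(ious) / len(ious)
```

As a sanity check, `f1_score(0.9222, 0.9469)` evaluates to roughly 0.9344, which agrees with the F1 score listed above.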

📄 Read Full Publication

🔧 Installation

git clone https://github.com/kaopanboonyuen/MeViT.git
cd MeViT
pip install -r requirements.txt

🚀 Usage

1. Configuration

Modify the config.yaml file to point to your dataset paths.
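
Before starting a long training run, it can help to confirm that the paths in the configuration actually exist. The sketch below assumes the configuration has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`) and that it uses keys named `train_dir`, `val_dir`, and `test_dir`. Those key names are hypothetical; check `config.yaml` for the names the repository really uses.

```python
from pathlib import Path


def missing_dataset_paths(
    config: dict,
    keys: tuple = ("train_dir", "val_dir", "test_dir"),  # hypothetical key names
) -> list[str]:
    """Return the keys whose path is unset or does not exist on disk."""
    missing = []
    for key in keys:
        value = config.get(key)
        if not value or not Path(value).exists():
            missing.append(key)
    return missing
```

An empty return value means every listed path was found; otherwise the returned keys show which entries to fix.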

2. Training

python train.py

3. Evaluation

python evaluate.py

4. Inference

python inference.py

🛰️ Dataset

The Landsat satellite data used in this project is not included in the repository. Refer to the documentation for download and preprocessing instructions.

📖 Citation

If you use this work, please cite:

@article{panboonyuen2023mevit,
  title={MeViT: A Medium-Resolution Vision Transformer for Semantic Segmentation on Landsat Satellite Imagery for Agriculture in Thailand},
  author={Panboonyuen, Teerapong and Charoenphon, Chaiyut and Satirapod, Chalermchon},
  journal={Remote Sensing},
  volume={15},
  number={21},
  pages={5124},
  year={2023},
  publisher={MDPI}
}
