KST 2026 · Accepted Paper

Seeing Isn't Always Believing: Evaluating Grad-CAM Faithfulness in Lung Cancer CT Classification

A rigorous, quantitative evaluation of Grad-CAM faithfulness and localization reliability across modern deep learning architectures.

Teerapong Panboonyuen · Chulalongkorn University · MARSAIL Lab · KST 2026
Conference · KST 2026 | Paper · Accepted | Medical Imaging | XAI · Grad-CAM | CT Classification

Fig. 1. Grad-CAM activation maps across CNN and Vision Transformer architectures on lung CT scans.


01 · Motivation

Why faithfulness matters in medical AI

Grad-CAM has become the de facto explainability tool for medical image analysis. But a critical question remains unanswered.

Do Grad-CAM heatmaps truly reflect the model's reasoning, or are we just seeing convincing illusions?

This paper provides the first rigorous, quantitative evaluation of Grad-CAM faithfulness and localization reliability across modern deep learning architectures for lung cancer CT classification. We demonstrate that high accuracy does not imply a trustworthy explanation, and that blind trust in saliency maps can be clinically dangerous.

⚠️ Medical AI does not fail loudly; it fails convincingly. This work shows why explainability must be quantitative, model-aware, and clinically grounded.

Five core advances

Faithfulness-Aware Evaluation

The first framework to quantitatively measure whether the regions Grad-CAM highlights truly drive model decisions in lung cancer CT classification.

Cross-Architecture Analysis

Systematic comparison across CNNs (ResNet, DenseNet, EfficientNet) and Vision Transformers, revealing fundamentally different failure modes.

Quantitative Explanation Metrics

Novel evaluation metrics that go beyond visual inspection, enabling objective comparison of explanation quality.

Shortcut Learning Exposure

Evidence of shortcut learning in DenseNet: the model can appear to explain its predictions correctly while relying on spurious correlations.

Clinical Implications

Practical guidelines for deploying trustworthy medical AI systems where explainability must meet clinical standards.


IQ-OTH/NCCD Lung Cancer CT Dataset

Publicly available, ethically approved, expert-annotated by radiologists and oncologists.

Classes: 3
Class      Description                              Annotation
Normal     No abnormal findings in CT scan          Radiologist verified
Benign     Non-cancerous pulmonary nodule present   Oncologist annotated
Malignant  Cancerous tissue identified              Multi-expert consensus
🔒 All data are de-identified and ethically approved. No patient-identifiable information is included in this repository.

Five architectures, two paradigms

From classical convolutional networks to attention-based Vision Transformers.

Architecture          Type         Parameters  Mechanism
ResNet-50             CNN          25.6M       Residual connections
ResNet-101            CNN          44.5M       Deep residual blocks
DenseNet-161          CNN          28.7M       Dense skip connections
EfficientNet-B0       CNN          5.3M        Compound scaling
ViT-Base-Patch16-224  Transformer  86M         Self-attention over patches

GradFaith-CAM: beyond pretty heatmaps

Three complementary faithfulness metrics that together answer: does the highlighted region actually matter for the prediction?

01

Localization Accuracy

Measures spatial overlap between Grad-CAM activation maps and ground-truth tumor regions annotated by radiologists.
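
One common way to score this kind of spatial overlap is intersection-over-union (IoU) between a thresholded heatmap and the annotated mask. The sketch below is illustrative only and is not taken from the paper's code: the function name `localization_iou`, the min-max normalization, and the 0.5 threshold are all assumptions.

```python
import numpy as np


def localization_iou(cam: np.ndarray, mask: np.ndarray, threshold: float = 0.5) -> float:
    """IoU between a thresholded Grad-CAM heatmap and a binary tumor mask.

    `cam` is a 2-D heatmap; `mask` is a 2-D array whose nonzero pixels mark
    the annotated tumor. The 0.5 threshold is an illustrative assumption.
    """
    if cam.shape != mask.shape:
        raise ValueError("cam and mask must have the same shape")

    cam = cam.astype(np.float64)
    span = cam.max() - cam.min()
    if span == 0:
        # A constant map carries no localization signal; treat it as empty.
        highlighted = np.zeros(cam.shape, dtype=bool)
    else:
        # Min-max normalize to [0, 1], then keep the pixels above the threshold.
        highlighted = (cam - cam.min()) / span >= threshold

    target = mask.astype(bool)
    union = np.logical_or(highlighted, target).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(highlighted, target).sum() / union)
```

A score near 1 means the highlighted region coincides with the annotation; a score near 0 means the map points elsewhere, even if the predicted class is correct.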

02

Perturbation-Based Faithfulness

Quantifies the drop in model confidence when highlighted regions are occluded; a faithful map should cause a significant confidence drop.
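
The occlusion test described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's implementation: `predict_proba` is a hypothetical callable that maps a 2-D image to a vector of class probabilities, and `top_fraction` and `fill_value` are arbitrary example settings.

```python
from typing import Callable

import numpy as np


def confidence_drop(
    predict_proba: Callable[[np.ndarray], np.ndarray],
    image: np.ndarray,
    cam: np.ndarray,
    top_fraction: float = 0.2,
    fill_value: float = 0.0,
) -> float:
    """Confidence lost on the predicted class after occluding the hottest pixels.

    `predict_proba` (hypothetical) maps a 2-D image to class probabilities.
    `top_fraction` and `fill_value` are illustrative assumptions.
    """
    if image.shape != cam.shape:
        raise ValueError("image and cam must have the same shape")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")

    baseline = np.asarray(predict_proba(image), dtype=np.float64)
    target = int(np.argmax(baseline))  # class predicted on the clean image

    # Select the k pixels with the largest heatmap values.
    k = max(1, int(round(top_fraction * cam.size)))
    top_idx = np.argpartition(cam.ravel(), -k)[-k:]
    occlude = np.zeros(cam.size, dtype=bool)
    occlude[top_idx] = True

    # Replace the selected pixels with a constant and re-score the image.
    occluded = np.where(occlude.reshape(cam.shape), fill_value, image)
    perturbed = np.asarray(predict_proba(occluded), dtype=np.float64)
    return float(baseline[target] - perturbed[target])
```

To interpret the number, it helps to compare it with the drop obtained by occluding the same number of randomly chosen pixels: a faithful map should produce a clearly larger drop than the random baseline.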

03

Explanation Consistency

Evaluates stability of activation patterns across random seeds and model re-initializations to measure explanation robustness.
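
One possible way to quantify this stability is the mean pairwise Pearson correlation between the heatmaps that independently trained models produce for the same image. The paper's exact consistency measure is not given on this page, so the choice of Pearson correlation and the name `explanation_consistency` are assumptions made for illustration.

```python
from itertools import combinations
from typing import Sequence

import numpy as np


def explanation_consistency(cams: Sequence[np.ndarray]) -> float:
    """Mean pairwise Pearson correlation between heatmaps of one image.

    Each entry of `cams` is the heatmap from one training run (for example,
    one random seed). All heatmaps are assumed to share the same shape.
    """
    if len(cams) < 2:
        raise ValueError("need at least two heatmaps to compare")

    flat = [np.asarray(c, dtype=np.float64).ravel() for c in cams]
    scores = []
    for a, b in combinations(flat, 2):
        a_c, b_c = a - a.mean(), b - b.mean()
        denom = np.linalg.norm(a_c) * np.linalg.norm(b_c)
        # A constant map has no spatial pattern to compare; score the pair as 0.
        scores.append(float(a_c @ b_c / denom) if denom > 0 else 0.0)
    return float(np.mean(scores))
```

Values close to 1 indicate that retraining leaves the explanation essentially unchanged, while values near 0 or below suggest the heatmap depends more on initialization than on the image.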

Interpretability without faithfulness is just another illusion.


Grad-CAM is NOT uniformly reliable

Our quantitative evaluation reveals systematic failures in saliency-based explanation across all tested architectures.


Reproduce our experiments

All code, configs, and pretrained checkpoints are available in the repository.

Installation
git clone https://github.com/yourusername/GradFaith-CAM.git
cd GradFaith-CAM
pip install -r requirements.txt

Train a model
python experiments/train.py --config configs/resnet.yaml

Evaluate Grad-CAM faithfulness
python experiments/evaluate.py --model resnet50

Visualize explanations
python experiments/visualize.py --image sample.png

Cite this work

If you use this code or findings in your research, please cite:

BibTeX
@inproceedings{panboonyuen2026gradfaithcam,
  title     = {Seeing Isn't Always Believing: Analysis of Grad-CAM
               Faithfulness and Localization Reliability in Lung
               Cancer CT Classification},
  author    = {Panboonyuen, Teerapong},
  booktitle = {Proceedings of the 18th International Conference on
               Knowledge and Smart Technology (KST)},
  year      = {2026}
}

About the researcher

Teerapong Panboonyuen
Chulalongkorn University · MARSAIL Laboratory
Medical Imaging · Explainable AI · Deep Learning · Computer Vision

This research was conducted at Chulalongkorn University and MARSAIL (Motor AI Recognition Solution Artificial Intelligence Laboratory).