Satellite image inpainting is critical in remote sensing applications. We propose KAO, a diffusion-based framework enhanced with Kernel-Adaptive Optimization and a Token Pyramid Transformer (TPT), enabling dynamic kernel modulation in latent space. KAO delivers high-fidelity, structure-aware reconstructions, outperforming existing models such as Stable Diffusion, RePaint, and SatDiff on very high-resolution (VHR) satellite datasets.
This section introduces the mathematical foundation of Kernel-Adaptive Optimization (KAO) and how it enhances diffusion-based inpainting for satellite imagery. Our method modifies the training objective of diffusion models to be spatially adaptive, allowing the model to focus more on complex or ambiguous regions — a crucial property for high-fidelity satellite image restoration.
Diffusion models are trained by gradually corrupting an image \( x_0 \) into noise through a forward stochastic process:
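in the standard DDPM parameterization (written here for concreteness),

\[
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{\alpha_t}\, x_{t-1},\ (1 - \alpha_t)\mathbf{I}\right).
\]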
Here, \( \alpha_t \) controls the noise schedule at time step \( t \). Over many steps, the image becomes pure noise. The model learns to reverse this corruption — i.e., denoise:
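in the usual Gaussian form,

\[
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right).
\]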
The goal is to predict the less-noisy sample \( x_{t-1} \) from the noisier \( x_t \) using the learned parameters \( \theta \); iterating this step from pure noise recovers a clean sample \( x_0 \).
Traditional training minimizes the Kullback-Leibler divergence between the true reverse process and the model's approximation:
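typically written as

\[
\mathcal{L}_{\mathrm{DM}} = \mathbb{E}_{q}\!\left[\, \sum_{t > 1} D_{\mathrm{KL}}\!\left( q(x_{t-1} \mid x_t, x_0) \,\middle\|\, p_\theta(x_{t-1} \mid x_t) \right) \right],
\]

which, under the standard reparameterization, reduces to the simple noise-prediction loss \( \mathbb{E}_{t, x_0, \epsilon}\big[ \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2 \big] \).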
This objective treats all pixels and regions equally. In satellite images, however, some areas (e.g., occluded patches or fine-grained structures) are both harder to reconstruct and more important for downstream analysis.
To address this, we introduce Kernel-Adaptive Optimization, a spatially-aware training scheme. KAO modifies the loss to weight each training example based on the difference between noisy and denoised states:
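one way to write this idea (a sketch consistent with the description here, not necessarily the paper's exact formulation) is a kernel-weighted noise-prediction objective,

\[
\mathcal{L}_{\mathrm{KAO}} = \mathbb{E}_{t, x_0, \epsilon}\!\left[ K(x_t, x_{t-1})\, \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2 \right].
\]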
The kernel \( K(x_t, x_{t-1}) \) acts as an adaptive weight, emphasizing regions with greater uncertainty or semantic change:
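one illustrative choice (again an assumption rather than the exact kernel defined in the paper) is a weight that grows with the step-wise change,

\[
K(x_t, x_{t-1}) = \exp\!\left( \frac{\lVert x_t - x_{t-1} \rVert^2}{2\sigma^2} \right),
\]

where the bandwidth \( \sigma \) controls how strongly large changes are emphasized.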
This formulation encourages the model to learn more from areas where its current predictions deviate most from the target. Intuitively, this means KAO focuses learning capacity on harder regions — such as occluded structures or fine textures — resulting in better inpainting quality, especially in geospatial contexts.
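As a minimal sketch of how such a weight could enter training, the snippet below applies an adaptive kernel weight to a per-sample noise-prediction loss. The helper names, the use of the ground-truth state at \( t-1 \) as a stand-in for the denoised state, and the exponential kernel are illustrative assumptions, not the paper's implementation.

import torch

def kernel_weight(x_t, x_prev, sigma=1.0):
    # Illustrative adaptive weight: grows with the per-sample change between
    # consecutive diffusion states (an assumed form, mirroring the kernel above).
    diff = (x_t - x_prev).flatten(start_dim=1).pow(2).mean(dim=1)
    return torch.exp(diff / (2.0 * sigma ** 2))

def kao_weighted_loss(model, x0, alphas_cumprod, sigma=1.0):
    # One kernel-weighted denoising training step for a batch x0 of shape (B, C, H, W).
    # `model(x_t, t)` is assumed to predict the added noise (epsilon-parameterization).
    alphas_cumprod = alphas_cumprod.to(x0.device)
    b = x0.shape[0]
    t = torch.randint(1, len(alphas_cumprod), (b,), device=x0.device)

    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    a_prev = alphas_cumprod[t - 1].view(b, 1, 1, 1)

    eps = torch.randn_like(x0)
    x_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * eps           # noisy state at step t
    x_prev = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # stand-in for the denoised state at t-1

    eps_pred = model(x_t, t)
    per_sample = (eps - eps_pred).flatten(start_dim=1).pow(2).mean(dim=1)
    w = kernel_weight(x_t, x_prev, sigma).detach()           # adaptive kernel weight
    return (w * per_sample).mean()

Detaching the weight keeps it as a pure reweighting term, so gradients flow only through the noise-prediction error.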
In continuous time, the forward diffusion can be described by a stochastic differential equation (SDE):
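in the standard score-based formulation,

\[
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w,
\]

where \( f(\cdot, t) \) is the drift coefficient, \( g(t) \) the diffusion coefficient, and \( w \) a standard Wiener process.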
And the learned reverse-time process becomes:
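\[
\mathrm{d}x = \left[ f(x, t) - g(t)^{2}\, \nabla_{x} \log p_t(x) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{w},
\]

where \( \bar{w} \) is a reverse-time Wiener process and the score \( \nabla_{x} \log p_t(x) \) is approximated by the learned denoiser, following the standard time-reversal result for diffusion SDEs.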
This SDE-based view supports advanced sampling strategies and allows KAO to extend to continuous-time score-based generative models.
KAO also supports flexible conditioning in latent space. During inference, we blend representations from both the inferred and conditioning paths:
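one way to write this blending, consistent with the description here (the exact operator in the paper may differ), is to combine the conditioning-path latent \( z_t^{\mathrm{cond}} \) and the inferred-path latent \( z_t^{\mathrm{infer}} \) as

\[
\tilde{z}_t = D(m) \odot z_t^{\mathrm{cond}} + \left(1 - D(m)\right) \odot z_t^{\mathrm{infer}},
\]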
where \( D(m) \) is a learned decoder map derived from the binary mask \( m \). Finally, the denoised outputs are blended similarly:
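again as a sketch of the described operation,

\[
\tilde{x}_{t-1} = D(m) \odot x_{t-1}^{\mathrm{cond}} + \left(1 - D(m)\right) \odot x_{t-1}^{\mathrm{infer}},
\]

so that the same mask-derived weighting is applied to the denoised estimates at every step.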
This ensures spatially-aware reconstruction: occluded regions follow the conditional guidance, while unoccluded areas retain the model’s generative diversity.
Each scene below presents a qualitative comparison of inpainting performance across seven models, with one column per model from left to right.
Each row corresponds to a different scene type, ranging from urban and agricultural landscapes to cloud-covered areas. Compare across columns to evaluate each model's ability to restore structural details and textures. KAO consistently produces high-fidelity outputs that preserve spatial layout and align with real-world features, outperforming the other models in restoring occluded regions.
@article{panboonyuen2025kao,
  author    = {Teerapong Panboonyuen},
  title     = {KAO: Kernel-Adaptive Optimization in Diffusion for Satellite Image},
  journal   = {IEEE Transactions on Geoscience and Remote Sensing},
  year      = {2025},
  doi       = {10.1109/TGRS.2025.3621738},
  note      = {Manuscript No. TGRS-2025-06970},
  publisher = {IEEE}
}