Satellite image inpainting is critical in remote sensing applications. We propose KAO, a diffusion-based framework enhanced with Kernel-Adaptive Optimization and a Token Pyramid Transformer (TPT), enabling dynamic kernel modulation in latent space. KAO delivers high-fidelity, structure-aware reconstructions, outperforming existing models such as Stable Diffusion, RePaint, and SatDiff on very high-resolution (VHR) satellite datasets.
This section introduces the mathematical foundation of Kernel-Adaptive Optimization (KAO) and how it enhances diffusion-based inpainting for satellite imagery. Our method modifies the training objective of diffusion models to be spatially adaptive, allowing the model to focus more on complex or ambiguous regions — a crucial property for high-fidelity satellite image restoration.
Diffusion models are trained by gradually corrupting an image \( x_0 \) into noise through a forward stochastic process:
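in the standard DDPM parameterization (written here for concreteness),

\[
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{\alpha_t}\, x_{t-1},\ (1 - \alpha_t)\mathbf{I}\right).
\]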
Here, \( \alpha_t \) controls the noise schedule at time step \( t \). Over many steps, the image becomes pure noise. The model learns to reverse this corruption — i.e., denoise:
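in the usual Gaussian form,

\[
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right).
\]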
The goal is to predict the less-noisy sample \( x_{t-1} \) from the noisier \( x_t \) using the learned parameters \( \theta \); iterating this step from pure noise recovers a clean sample \( x_0 \).
Traditional training minimizes the Kullback-Leibler divergence between the true reverse process and the model's approximation:
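typically written as

\[
\mathcal{L}_{\mathrm{DM}} = \mathbb{E}_{q}\!\left[\, \sum_{t > 1} D_{\mathrm{KL}}\!\left( q(x_{t-1} \mid x_t, x_0) \,\middle\|\, p_\theta(x_{t-1} \mid x_t) \right) \right],
\]

which, under the standard reparameterization, reduces to the simple noise-prediction loss \( \mathbb{E}_{t, x_0, \epsilon}\big[ \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2 \big] \).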
This objective treats all pixels and regions equally. In satellite images, however, some areas (e.g., occluded patches or fine-grained structures) are both harder to reconstruct and more important for downstream analysis.
To address this, we introduce Kernel-Adaptive Optimization, a spatially-aware training scheme. KAO modifies the loss to weight each training example based on the difference between noisy and denoised states:
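one way to write this idea (a sketch consistent with the description here, not necessarily the paper's exact formulation) is a kernel-weighted noise-prediction objective,

\[
\mathcal{L}_{\mathrm{KAO}} = \mathbb{E}_{t, x_0, \epsilon}\!\left[ K(x_t, x_{t-1})\, \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2 \right].
\]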
The kernel \( K(x_t, x_{t-1}) \) acts as an adaptive weight, emphasizing regions with greater uncertainty or semantic change:
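one illustrative choice (again an assumption rather than the exact kernel defined in the paper) is a weight that grows with the step-wise change,

\[
K(x_t, x_{t-1}) = \exp\!\left( \frac{\lVert x_t - x_{t-1} \rVert^2}{2\sigma^2} \right),
\]

where the bandwidth \( \sigma \) controls how strongly large changes are emphasized.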
This formulation encourages the model to learn more from areas where its current predictions deviate most from the target. Intuitively, this means KAO focuses learning capacity on harder regions — such as occluded structures or fine textures — resulting in better inpainting quality, especially in geospatial contexts.
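As a minimal sketch of how such a weight could enter training, the snippet below applies an adaptive kernel weight to a per-sample noise-prediction loss. The helper names, the use of the ground-truth state at \( t-1 \) as a stand-in for the denoised state, and the exponential kernel are illustrative assumptions, not the paper's implementation.

import torch

def kernel_weight(x_t, x_prev, sigma=1.0):
    # Illustrative adaptive weight: grows with the per-sample change between
    # consecutive diffusion states (an assumed form, mirroring the kernel above).
    diff = (x_t - x_prev).flatten(start_dim=1).pow(2).mean(dim=1)
    return torch.exp(diff / (2.0 * sigma ** 2))

def kao_weighted_loss(model, x0, alphas_cumprod, sigma=1.0):
    # One kernel-weighted denoising training step for a batch x0 of shape (B, C, H, W).
    # `model(x_t, t)` is assumed to predict the added noise (epsilon-parameterization).
    alphas_cumprod = alphas_cumprod.to(x0.device)
    b = x0.shape[0]
    t = torch.randint(1, len(alphas_cumprod), (b,), device=x0.device)

    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    a_prev = alphas_cumprod[t - 1].view(b, 1, 1, 1)

    eps = torch.randn_like(x0)
    x_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * eps           # noisy state at step t
    x_prev = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # stand-in for the denoised state at t-1

    eps_pred = model(x_t, t)
    per_sample = (eps - eps_pred).flatten(start_dim=1).pow(2).mean(dim=1)
    w = kernel_weight(x_t, x_prev, sigma).detach()           # adaptive kernel weight
    return (w * per_sample).mean()

Detaching the weight keeps it as a pure reweighting term, so gradients flow only through the noise-prediction error.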
In continuous time, the forward diffusion can be described by a stochastic differential equation (SDE):
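in the standard score-based formulation,

\[
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w,
\]

where \( f(\cdot, t) \) is the drift coefficient, \( g(t) \) the diffusion coefficient, and \( w \) a standard Wiener process.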
And the learned reverse-time process becomes:
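\[
\mathrm{d}x = \left[ f(x, t) - g(t)^{2}\, \nabla_{x} \log p_t(x) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{w},
\]

where \( \bar{w} \) is a reverse-time Wiener process and the score \( \nabla_{x} \log p_t(x) \) is approximated by the learned denoiser, following the standard time-reversal result for diffusion SDEs.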
This SDE-based view supports advanced sampling strategies and allows KAO to extend to continuous-time score-based generative models.
KAO also supports flexible conditioning in latent space. During inference, we blend representations from both the inferred and conditioning paths:
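one way to write this blending, consistent with the description here (the exact operator in the paper may differ), is to combine the conditioning-path latent \( z_t^{\mathrm{cond}} \) and the inferred-path latent \( z_t^{\mathrm{infer}} \) as

\[
\tilde{z}_t = D(m) \odot z_t^{\mathrm{cond}} + \left(1 - D(m)\right) \odot z_t^{\mathrm{infer}},
\]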
where \( D(m) \) is a learned decoder map derived from the binary mask \( m \). Finally, the denoised outputs are blended similarly:
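again as a sketch of the described operation,

\[
\tilde{x}_{t-1} = D(m) \odot x_{t-1}^{\mathrm{cond}} + \left(1 - D(m)\right) \odot x_{t-1}^{\mathrm{infer}},
\]

so that the same mask-derived weighting is applied to the denoised estimates at every step.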
This ensures spatially-aware reconstruction: occluded regions follow the conditional guidance, while unoccluded areas retain the model’s generative diversity.
Each scene below presents a qualitative comparison of inpainting performance across seven models, with one column per model from left to right.
Each row corresponds to a different scene type, ranging from urban and agricultural landscapes to cloud-covered areas. Compare across columns to evaluate each model's ability to restore structural details and textures. KAO consistently produces high-fidelity outputs that preserve spatial layout and align with real-world features, outperforming the other models in restoring occluded regions.
@article{panboonyuen2025kao,
  author    = {Teerapong Panboonyuen},
  title     = {KAO: Kernel-Adaptive Optimization in Diffusion for Satellite Image},
  journal   = {IEEE Transactions on Geoscience and Remote Sensing},
  year      = {2025},
  doi       = {10.1109/TGRS.2025.3621738},
  note      = {Manuscript No. TGRS-2025-06970},
  publisher = {IEEE}
}