✅ Accepted at ACL 2026 Workshop TrustNLP Fast Track

GateKD

Confidence-Gated Closed-Loop Distillation
for Robust Reasoning

GateKD selectively distills reliable reasoning signals from large language models using confidence-aware closed-loop supervision, reducing hallucination transfer while improving logical and symbolic reasoning in compact student models.

+4.9
Logical Reasoning

Large gains on shuffled object tracking benchmarks.

+4.7
Symbolic Reasoning

Robust improvements under severe capacity constraints.

80M
Small Models

Reliable reasoning transfer for compact language models.

Overview

Why Existing Distillation Fails

Existing reasoning distillation methods blindly trust all teacher reasoning trajectories equally. However, even strong LLMs produce hallucinated intermediate reasoning steps, unstable representations, and noisy attention patterns.

❌ Open-Loop Distillation
  • Treats all teacher reasoning as equally reliable
  • Transfers hallucinated supervision
  • Amplifies noisy hidden states
  • Distills unstable attention structures
  • Fails on logical reasoning tasks
✅ GateKD Closed-Loop Supervision
  • Confidence-aware supervision gating
  • Suppresses unreliable reasoning
  • Selective hidden-state alignment
  • Reliability-filtered attention transfer
  • Robust multi-step reasoning transfer
Method

Confidence-Gated Closed-Loop Distillation

GateKD dynamically regulates when and how teacher supervision should be transferred using predictive entropy as a unified reliability signal.
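
As a rough illustration of how predictive entropy could serve as that reliability signal, the sketch below maps the teacher's entropy (normalized by the maximum entropy over the output space) to a per-example confidence weight C(x). This is a minimal PyTorch sketch; the function name, normalization choice, and tensor shapes are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def confidence_gate(teacher_logits: torch.Tensor) -> torch.Tensor:
    """Map the teacher's predictive entropy to a confidence weight in [0, 1].

    teacher_logits: (batch, num_classes) raw logits from the teacher.
    Low entropy (confident teacher) -> weight near 1; high entropy -> near 0.
    """
    probs = F.softmax(teacher_logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)            # (batch,)
    max_entropy = torch.log(torch.tensor(float(teacher_logits.size(-1))))
    return 1.0 - entropy / max_entropy                                    # normalized confidence C(x)
```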

GateKD Architecture
🔒

Confidence-Gated Soft Supervision

Teacher soft labels are weighted by predictive confidence, suppressing unreliable reasoning supervision.

\[ \mathcal{L}_{\text{gate-soft}} = C(x) \cdot \mathrm{CE}\left(p^{T}, p^{S}\right) \]
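
A minimal sketch of this gated soft-label term, reusing the hypothetical confidence_gate helper from the sketch above; the distillation temperature and batch-mean reduction are assumptions.

```python
import torch.nn.functional as F

def gated_soft_loss(teacher_logits, student_logits, temperature: float = 1.0):
    """Confidence-weighted soft-label loss: C(x) * CE(p^T, p^S), averaged over the batch."""
    gate = confidence_gate(teacher_logits)                               # from the sketch above
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    ce = -(p_teacher * log_p_student).sum(dim=-1)                        # per-example cross-entropy
    return (gate * ce).mean()
```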
🧠

Gated Hidden-State Evolution

Intermediate representations are aligned only when teacher reasoning is stable and reliable.

\[ \mathcal{L}_{\text{gate-hidden}} = C(x) \cdot \left\| h^{S} - \phi\left(h^{T}\right) \right\|_2^2 \]
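
A minimal sketch of such gated hidden-state alignment, assuming the projection φ is a learned linear map from teacher width to student width and that the same per-example confidence weight C(x) gates the term.

```python
import torch.nn as nn

class GatedHiddenAlignment(nn.Module):
    """Confidence-gated MSE between student and projected teacher hidden states."""

    def __init__(self, teacher_dim: int, student_dim: int):
        super().__init__()
        self.proj = nn.Linear(teacher_dim, student_dim)   # phi: maps teacher states into student space

    def forward(self, h_student, h_teacher, gate):
        # h_student: (batch, seq, student_dim); h_teacher: (batch, seq, teacher_dim)
        # gate: (batch,) confidence weights, e.g. from confidence_gate above
        diff = h_student - self.proj(h_teacher)
        per_example = diff.pow(2).mean(dim=(1, 2))        # per-example squared alignment error
        return (gate * per_example).mean()
```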
🎯

Reliability-Filtered Attention

Structural reasoning patterns are distilled selectively through confidence-aware attention alignment.

\[ \mathcal{L}_{\text{gate-attn}} = C(x) \cdot \left\| A^{S} - A^{T} \right\|_2^2 \]
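
A corresponding sketch for reliability-filtered attention transfer, assuming the teacher's attention maps have already been matched (or averaged) to the student's head count and sequence length.

```python
def gated_attention_loss(attn_student, attn_teacher, gate):
    """Confidence-gated MSE between student and teacher attention maps.

    attn_*: (batch, heads, seq, seq) attention distributions; the teacher's
    heads are assumed to be already matched or averaged to the student's.
    gate: (batch,) confidence weights.
    """
    per_example = (attn_student - attn_teacher).pow(2).mean(dim=(1, 2, 3))
    return (gate * per_example).mean()
```

In a full training loop, these three gated terms would presumably be added to the student's task loss with tunable weights; the exact weighting is not specified here.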
Results

State-of-the-Art Reasoning Transfer

GateKD consistently outperforms strong open-loop distillation baselines across commonsense, logical, and symbolic reasoning benchmarks.

Model          Method      CSQA   SQA    Logical   Symbolic
T5-small       Mentor-KD   58.6   51.8   72.9      55.2
T5-small       GateKD      61.3   54.6   80.8      60.1
FlanT5-small   Mentor-KD   60.4   53.7   76.4      58.1
FlanT5-small   GateKD      63.2   56.1   83.7      62.5
Qualitative Analysis

Not All Reasoning Trajectories
Should Be Trusted Equally

GateKD suppresses speculative and unstable teacher reasoning, enabling students to internalize more grounded and expertise-aligned inference patterns.

StrategyQA Reasoning

GateKD suppresses speculative reasoning trajectories and prioritizes physically grounded inference.

Object Tracking

Confidence-aware gating stabilizes intermediate reasoning evolution.

Temporal Reasoning

Prevents propagation of early reasoning errors.

Arithmetic Reasoning

Reinforces structured symbolic manipulation skills.

Existing distillation methods blindly trust all teacher reasoning.

GateKD learns when the teacher should be trusted.
Citation

BibTeX

@inproceedings{kao2026gatekd,
  title = {GateKD: Confidence-Gated Closed-Loop Distillation
           for Robust Reasoning},
  author = {Sermsri, K. and Panboonyuen, T.},
  booktitle = {ACL 2026 Workshop TrustNLP Fast Track},
  year = {2026}
}