GateKD: Confidence-Gated Closed-Loop Distillation for Robust Reasoning

Wed, 13 May 2026 00:00:00 +0000

GateKD (Confidence-Gated Closed-Loop Distillation) addresses a critical challenge in reasoning distillation: transferring reliable multi-step reasoning abilities from large language models into smaller student models without propagating hallucinated or noisy supervision. While conventional knowledge distillation methods treat the teacher model as a uniformly reliable oracle, real-world reasoning traces often contain uncertain intermediate steps, unstable rationales, and misleading attention patterns. GateKD reframes reasoning distillation as a confidence-aware closed-loop learning problem in which the teacher dynamically regulates how much information should be transferred during training.

The framework introduces three complementary confidence-gated mechanisms that collectively stabilize reasoning transfer. First, confidence-gated soft supervision selectively distills predictive signals only when the teacher exhibits high certainty. Second, gated hidden-state evolution aligns intermediate latent representations between teacher and student models while suppressing unreliable hidden dynamics. Third, reliability-filtered attention distillation transfers stable reasoning structures while filtering noisy or hallucinated attention maps that could otherwise corrupt student reasoning trajectories.
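As a rough illustration of the first two mechanisms, the sketch below gates a soft-label distillation loss and a hidden-state alignment loss with a per-position confidence mask. The abstract does not specify how confidence is computed, so everything here is an assumption made for illustration: the gate is the teacher's maximum softmax probability compared against a hard `threshold`, and the names `confidence_gate`, `gated_kd_loss`, and `gated_hidden_loss`, the temperature, and the default values are all hypothetical rather than the authors' formulation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_gate(teacher_logits, threshold=0.7):
    """Assumed gate: 1.0 where the teacher's top probability >= threshold, else 0.0."""
    confidence = softmax(teacher_logits).max(axis=-1)
    return (confidence >= threshold).astype(np.float64)

def gated_kd_loss(student_logits, teacher_logits, threshold=0.7, temperature=2.0):
    """KL(teacher || student) averaged only over positions the gate lets through."""
    gate = confidence_gate(teacher_logits, threshold)
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    # Guard against division by zero when every position is gated out.
    return float((gate * kl).sum() / max(gate.sum(), 1.0))

def gated_hidden_loss(student_hidden, teacher_hidden, gate):
    """Mean-squared hidden-state alignment, masked by the same gate."""
    mse = ((student_hidden - teacher_hidden) ** 2).mean(axis=-1)
    return float((gate * mse).sum() / max(gate.sum(), 1.0))

# Two positions: the teacher is confident on the first and uncertain on the second.
teacher_logits = np.array([[5.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
student_logits = np.array([[0.0, 5.0, 0.0], [0.0, 0.0, 0.0]])
gate = confidence_gate(teacher_logits)
loss = gated_kd_loss(student_logits, teacher_logits)
```

Under these assumptions, only the first position contributes to either loss, so whatever the student predicts at the low-confidence position has no effect on the gradient. A soft gate (for example, weighting each position by the confidence value itself) would be an equally plausible reading of the abstract.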

Together, these components establish a closed feedback loop in which teacher confidence continuously modulates supervision strength across the distillation pipeline. Unlike traditional open-loop distillation approaches, GateKD enables the student model to learn reasoning behaviors from selectively trusted intermediate states rather than blindly imitating all teacher outputs. Extensive experiments across commonsense, logical, and symbolic reasoning benchmarks demonstrate consistent improvements over strong open-loop baselines using T5 and Flan-T5 architectures of varying sizes.

Beyond these empirical gains, GateKD points to a broader shift toward confidence-aware reasoning supervision for trustworthy language models. The framework remains robust in low-resource distillation settings, reduces hallucination transfer, and maintains stable reasoning fidelity under noisy supervision. These findings suggest that confidence-gated teacher-student interaction is a promising direction for building scalable, efficient, and reliable reasoning systems that can be deployed in resource-constrained environments.

trustnlp | Teerapong Panboonyuen
