Distilling multi-step reasoning abilities from large language models (LLMs) into compact student models remains challenging due to noisy rationales, hallucinated supervision, and static teacher-student interactions. Existing reasoning distillation methods predominantly operate in an open-loop manner: they implicitly assume uniform teacher reliability and consequently propagate erroneous intermediate reasoning to the student. We propose GateKD, a confidence-gated closed-loop distillation framework that enables robust reasoning transfer by treating the teacher as a dynamic gatekeeper rather than a static oracle. GateKD introduces three complementary mechanisms: (i) confidence-gated soft supervision, which selectively distills reliable predictive signals; (ii) gated hidden-state evolution, which aligns intermediate representations only when teacher confidence is high; and (iii) reliability-filtered attention distillation, which preserves stable reasoning structures while suppressing noisy patterns. Together, these mechanisms form a closed feedback loop in which teacher confidence continuously modulates the distillation process, reducing hallucination transfer and stabilizing student reasoning. Extensive experiments across commonsense, logical, and symbolic reasoning benchmarks, using T5 and Flan-T5 backbones of varying sizes, demonstrate that GateKD consistently outperforms strong open-loop distillation baselines. GateKD shows substantial gains in logical and symbolic reasoning and remains robust in low-resource distillation settings, while ablations show clear degradation when the gating mechanisms are removed. These findings highlight the importance of confidence-aware closed-loop supervision for building reliable and scalable small reasoning models.
GateKD (Confidence-Gated Closed-Loop Distillation) addresses a central challenge in reasoning distillation: transferring reliable multi-step reasoning abilities from large language models into smaller student models without propagating hallucinated or noisy supervision. Conventional knowledge distillation methods treat the teacher model as a uniformly reliable oracle, yet real-world reasoning traces often contain uncertain intermediate steps, unstable rationales, and misleading attention patterns. GateKD reframes reasoning distillation as a confidence-aware closed-loop learning problem in which the teacher dynamically regulates how much supervision is transferred during training.
The framework introduces three complementary confidence-gated mechanisms that collectively stabilize reasoning transfer. First, confidence-gated soft supervision selectively distills predictive signals only when the teacher exhibits high certainty. Second, gated hidden-state evolution aligns intermediate latent representations between teacher and student models while suppressing unreliable hidden dynamics. Third, reliability-filtered attention distillation transfers stable reasoning structures while filtering noisy or hallucinated attention maps that could otherwise corrupt student reasoning trajectories.
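The three mechanisms can be illustrated for a single training example in plain Python. This is a minimal sketch under assumptions the description does not state: teacher confidence is taken as the maximum softmax probability, the gate is a hard threshold `tau`, student hidden states are assumed to be already projected to the teacher's width, and attention-row reliability is measured as one minus normalized entropy with a hypothetical cutoff `rho`. All function names and default values are hypothetical; a practical implementation would operate on batched tensors (for example in PyTorch) rather than Python lists.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def kl_div(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def teacher_confidence(teacher_logits):
    # Assumption: confidence is the teacher's maximum softmax probability.
    return max(softmax(teacher_logits))


def confidence_gate(conf, tau=0.7):
    # Hypothetical hard gate: distill only when confidence reaches tau.
    return 1.0 if conf >= tau else 0.0


def gated_soft_loss(teacher_logits, student_logits, gate, temperature=2.0):
    # (i) Confidence-gated soft supervision: temperature-scaled KL,
    # multiplied by T^2 as in standard logit distillation, then gated.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return gate * temperature ** 2 * kl_div(p_t, p_s)


def gated_hidden_loss(h_teacher, h_student, gate):
    # (ii) Gated hidden-state alignment: mean squared error, assuming the
    # student state was already projected to the teacher's width.
    mse = sum((a - b) ** 2 for a, b in zip(h_teacher, h_student)) / len(h_teacher)
    return gate * mse


def row_reliability(row):
    # Assumption: a sharp (low-entropy) attention row counts as reliable.
    n = len(row)
    if n < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in row if p > 0)
    return 1.0 - entropy / math.log(n)


def filtered_attention_loss(att_teacher, att_student, gate, rho=0.3):
    # (iii) Reliability-filtered attention distillation: keep only teacher
    # rows whose reliability reaches rho, average their KL, then gate.
    kept = [(t, s) for t, s in zip(att_teacher, att_student)
            if row_reliability(t) >= rho]
    if gate == 0.0 or not kept:
        return 0.0
    return gate * sum(kl_div(t, s) for t, s in kept) / len(kept)


if __name__ == "__main__":
    teacher_logits, student_logits = [4.0, 0.5, 0.2], [1.0, 0.8, 0.3]
    gate = confidence_gate(teacher_confidence(teacher_logits))  # 1.0 here
    print(gated_soft_loss(teacher_logits, student_logits, gate))
    print(gated_hidden_loss([0.2, -0.1], [0.0, 0.1], gate))
    print(filtered_attention_loss([[0.9, 0.1], [0.5, 0.5]],
                                  [[0.6, 0.4], [0.5, 0.5]], gate))
```

In this sketch a low-confidence teacher (e.g. logits `[0.3, 0.2, 0.1]`, whose top probability is about 0.37) yields a gate of 0.0, so all three distillation terms vanish for that example, and the uniform attention row `[0.5, 0.5]` is dropped by the reliability filter even when the gate is open.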
Together, these components establish a closed feedback loop in which teacher confidence continuously modulates supervision strength across the distillation pipeline. Unlike traditional open-loop distillation approaches, GateKD lets the student model learn reasoning behaviors from selectively trusted intermediate states rather than blindly imitating every teacher output. Extensive experiments across commonsense, logical, and symbolic reasoning benchmarks, using T5 and Flan-T5 backbones of varying sizes, show consistent improvements over strong open-loop baselines.
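One way to write such a closed-loop objective over a mini-batch is sketched below. It is an illustrative formulation, not the loss stated for GateKD: the confidence measure c_i, the soft gate with threshold τ and sharpness γ, the loss weights λ, the projections W_ℓ, the student-teacher layer pairing set 𝓜, and the attention-reliability cutoff ρ₀ are all assumptions. The gate g_i is recomputed for every example in every batch, so teacher confidence scales all three distillation terms together while the task loss is always applied.

```latex
% Illustrative closed-loop objective (assumed form; t = teacher, s = student)
% T: distillation temperature; A_{m,r}: row r (length n) of a layer-m attention map;
% H: Shannon entropy; \sigma: logistic sigmoid
c_i = \max_k \, p^{t}(k \mid x_i), \qquad
g_i = \sigma\!\left(\frac{c_i - \tau}{\gamma}\right)

\mathcal{L} = \frac{1}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \Big[
  \mathcal{L}_{\mathrm{task}}(x_i)
  + g_i \big( \lambda_{\mathrm{kd}} \mathcal{L}_{\mathrm{kd}}(x_i)
  + \lambda_{\mathrm{hid}} \mathcal{L}_{\mathrm{hid}}(x_i)
  + \lambda_{\mathrm{attn}} \mathcal{L}_{\mathrm{attn}}(x_i) \big) \Big]

\mathcal{L}_{\mathrm{kd}}(x_i) = T^2 \,
  \mathrm{KL}\big( p^{t}_{T}(\cdot \mid x_i) \,\|\, p^{s}_{T}(\cdot \mid x_i) \big)

\mathcal{L}_{\mathrm{hid}}(x_i) = \sum_{(\ell, m) \in \mathcal{M}}
  \big\| W_\ell \, h^{s}_{\ell}(x_i) - h^{t}_{m}(x_i) \big\|_2^2

\mathcal{L}_{\mathrm{attn}}(x_i) = \sum_{(\ell, m) \in \mathcal{M}}
  \sum_{r \,:\, \rho(A^{t}_{m,r}) \ge \rho_0}
  \mathrm{KL}\big( A^{t}_{m,r} \,\|\, A^{s}_{\ell,r} \big),
\qquad
\rho(a) = 1 - \frac{H(a)}{\log n}
```

As γ → 0 the sigmoid gate approaches the hard indicator 𝟙[c_i ≥ τ] (for c_i ≠ τ); a soft gate keeps supervision strength continuous in teacher confidence instead of switching it abruptly on and off.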
Beyond empirical performance gains, GateKD points toward a broader shift to confidence-aware reasoning supervision for trustworthy language models. The framework remains robust in low-resource distillation settings, reduces hallucination transfer, and maintains stable reasoning fidelity under noisy supervision. These findings suggest that confidence-gated teacher-student interaction is a promising direction for building scalable, efficient, and reliable reasoning systems suited to real-world deployment in resource-constrained environments.
