🔄 Under ACL Rolling Review (ARR) May 2026 · EMNLP 2026 Submission

CAREF

Calibration-Aware Regularization
for Explanation Faithfulness
Without Rationale Supervision

By Teerapong Panboonyuen

A parameter-efficient fine-tuning framework that jointly optimises predictive accuracy and explanation faithfulness via a single unified loss, the Sparsity-Calibrated Entropic Divergence (\(\mathcal{L}_{\text{SCED}}\)), while training only 6.43% of the model's parameters.

89.04
Best Avg. Accuracy

CAREF-AQ across COS-E, ECQA, ComVE, e-SNLI benchmarks.

81.00
nBERT Alignment

Best explanation faithfulness score without rationale labels.

6.43%
Trainable Parameters

Higher average accuracy and nBERT than LoRA (15.74% of parameters) and AdaLoRA (4.46%).

The Problem with Faithful Explanations

Language models excel at NLU yet fail to produce faithful explanations, that is, explanations that causally reflect the model's decision process. Standard fine-tuning is agnostic to how a model allocates probability mass, leading to a structural gap between accuracy and explanation quality.

Figure 1. CAREF at a glance: (a) Accuracy vs. trainable parameter budget across variants; (b) nBERT explanation quality per dataset (w/ vs. w/o CAREF); (c) Human evaluation faithfulness scores; (d) Sensitivity of nBERT to α and β on e-SNLI.

❌ Existing Approaches Fail Because…
  • β€’ Rationale-supervised methods require expensive token-level annotations
  • β€’ Post-hoc attribution (attention rollout, IG) operates after trainingβ€”cannot influence representations
  • β€’ Entropy penalties flatten distributions indiscriminately, harming interpretability
  • β€’ Label smoothing redistributes mass uniformly, actively hurting explanation coherence
  • β€’ Sparsemax/Entmax introduces hard sparsity with discontinuous gradients
βœ… CAREF's Key Insight
  • β€’ Single unified loss β„’SCED jointly regulates entropy and sparsity
  • β€’ No rationale supervisionβ€”only task labels required
  • β€’ Fully differentiable, plug-in compatible with any PEFT method
  • β€’ Token-adaptive: penalty concentrates on decision-relevant vocabulary
  • β€’ Architecture-agnostic: works with T5, BART, LLaMA, GPT families

The CAREF Framework

CAREF couples entropy-based calibration with token-level sparsity control through one loss term, without any architectural changes.

\(\mathcal{L}_{\text{CAREF}} = \mathcal{L}_{\text{CE}} + \lambda_{\text{SCED}} \cdot \mathcal{L}_{\text{SCED}} + \lambda_{\text{KL}} \cdot \mathcal{L}_{\text{KL}}\)

Cross-entropy preserves task accuracy; the KL term provides global calibration pressure; \(\mathcal{L}_{\text{SCED}}\) provides local, adaptive token-level regularization.

\(\mathcal{L}_{\text{SCED}} = \sum_{t}\sum_{v=1}^{|\mathcal{V}|} \left|P_{t,v}\log\frac{P_{t,v}}{U_v}\right|^{\alpha} \cdot (1-P_{t,v})^{\beta}\)
α: Entropic Curvature
For α = 1, each term reduces to the standard KL contribution per (t, v) token pair. For α > 1, the penalty becomes super-linear: large deviations from uniform incur disproportionately higher gradients, which combats overconfidence on spurious tokens that harm faithfulness.
β: Adaptive Token Sparsity
The factor \((1-P_{t,v})^{\beta}\) attenuates the penalty for high-probability tokens, concentrating regularization on the tail of the distribution. This discourages spurious vocabulary-wide mass diffusion without disturbing confident, decision-relevant predictions.
Parameters   Recovered Behavior
α=1, β=0     Standard per-token KL divergence from uniform; no sparsity weighting
α>1, β=0     Power-law entropic penalty; large deviations incur super-linearly growing cost
α=1, β>0     Sparsity-weighted KL; focuses pressure on low-probability tail vocabulary
α>1, β>0     Full CAREF regime: adaptive sparsity + non-linear entropic curvature ✅
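
The SCED formula can be evaluated directly on small distributions to see how α and β change the penalty. The following is a minimal sketch, not the authors' implementation: it assumes \(U_v\) is the uniform distribution \(1/|\mathcal{V}|\) (suggested by the "deviations from uniform" wording), and the function name `sced_loss` and the toy distributions are illustrative only.

```python
import math


def sced_loss(probs, alpha, beta):
    """Sum the SCED penalty over steps t and vocabulary entries v.

    probs: list of per-step distributions, each a list of probabilities
           over the same vocabulary (assumed to sum to 1).
    """
    total = 0.0
    for p_t in probs:
        u = 1.0 / len(p_t)  # assumption: U_v is uniform, i.e. 1 / |V|
        for p in p_t:
            if p <= 0.0:
                continue  # p * log(p / u) tends to 0 as p -> 0
            entropic = abs(p * math.log(p / u)) ** alpha
            sparsity = max(0.0, 1.0 - p) ** beta  # clamp guards rounding above 1
            total += entropic * sparsity
    return total


# Toy usage over a 4-token vocabulary (values are made up for illustration).
peaked = [0.7, 0.1, 0.1, 0.1]
uniform = [0.25, 0.25, 0.25, 0.25]

kl_like = sced_loss([peaked], alpha=1.0, beta=0.0)        # alpha=1, beta=0 regime
tail_weighted = sced_loss([peaked], alpha=1.0, beta=1.0)  # alpha=1, beta>0 regime
at_uniform = sced_loss([uniform], alpha=2.0, beta=1.0)    # no deviation from U
print(kl_like, tail_weighted, at_uniform)
```

Because of the absolute value in the formula, the α = 1, β = 0 case sums the magnitudes of the per-token KL contributions, so it is an upper bound on the KL divergence from uniform rather than exactly equal to it whenever some \(P_{t,v} < U_v\).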

State-of-the-Art with 6.43% Parameters

Evaluated on four NLE benchmarks (COS-E, ECQA, ComVE, e-SNLI) with Flan-T5 across 60 FEB splits. CAREF-AQ achieves the best average accuracy (89.04) and the best average nBERT (81.00), ahead of LoRA R=128 (88.57 / 80.58) and AdaLoRA (23.59 / 19.75).

Figure 2. Accuracy vs. parameter budget (left) and nBERT explanation quality per dataset (right). CAREF-AQ (star ★) dominates both axes simultaneously.

Section         Model          M  COS-E  ECQA   ComVE  e-SNLI  Avg    Param%
Ablation (RQ1)  CAREF-BASE     A  85.19  86.29  94.76  88.32   88.64  100%
                               E  74.00  77.51  86.60  79.76   79.47
                CAREF-DEC      A  81.56  85.76  94.73  74.73   84.19  52.23%
                               E  71.53  77.40  87.54  67.89   76.09
                CAREF-AQKV     A  84.30  88.51  94.37  88.26   88.86  19.28%
                               E  73.43  79.29  86.73  81.80   80.31
                CAREF-LAQ      A  84.73  88.93  94.23  88.07   88.99  6.44%
                               E  74.57  79.96  86.57  80.96   80.52
                CAREF-AQ ★     A  84.59  88.93  94.16  88.48   89.04  6.43%
                               E  74.42  80.31  87.29  81.97   81.00
vs. PEFT (RQ3)  AdaLoRA 4.46%  A  43.32  15.16  35.32  0.56    23.59  4.46%
                               E  35.48  12.33  30.74  0.47    19.75
                LoRA R=128     A  84.58  91.77  94.57  83.37   88.57  15.74%
                               E  74.52  82.89  87.73  77.17   80.58
                CAREF-AQ ★     A  84.59  88.93  94.16  88.48   89.04  6.43%
                               E  74.42  80.31  87.29  81.97   81.00

M: metric (A = Accuracy, E = nBERT). Highlighted rows are CAREF variants. Results averaged over 60 splits.
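
As a quick arithmetic check on the table, the Avg column can be recomputed from the four per-dataset scores. This sketch assumes Avg is the unweighted mean over COS-E, ECQA, ComVE, and e-SNLI, which the page does not state explicitly. The scores are copied from the table, and the helper name `average_gaps` is illustrative.

```python
# Per-dataset scores in the order COS-E, ECQA, ComVE, e-SNLI, plus the reported Avg.
REPORTED = {
    ("CAREF-AQ", "A"): ([84.59, 88.93, 94.16, 88.48], 89.04),
    ("CAREF-AQ", "E"): ([74.42, 80.31, 87.29, 81.97], 81.00),
    ("LoRA R=128", "A"): ([84.58, 91.77, 94.57, 83.37], 88.57),
    ("LoRA R=128", "E"): ([74.52, 82.89, 87.73, 77.17], 80.58),
    ("AdaLoRA 4.46%", "A"): ([43.32, 15.16, 35.32, 0.56], 23.59),
    ("AdaLoRA 4.46%", "E"): ([35.48, 12.33, 30.74, 0.47], 19.75),
}


def average_gaps(table):
    """Return |mean(per-dataset scores) - reported Avg| for every row."""
    return {
        key: abs(sum(scores) / len(scores) - avg)
        for key, (scores, avg) in table.items()
    }


gaps = average_gaps(REPORTED)
for (model, metric), gap in gaps.items():
    print(f"{model:14s} {metric}  gap to reported Avg: {gap:.4f}")
```

Under that assumption, every recomputed mean matches the reported Avg to within two-decimal rounding.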

Faithful Explanations, No Rationale Labels

CAREF-AQ consistently produces concise, causally grounded explanations across datasets. Baseline models generate verbose or semantically incoherent justifications.

[Figure: CAREF qualitative comparison, COS-E predictions]

Architecture & Model Overview

CAREF fine-tunes only attention query projections (6.43% of parameters), encouraging sparse and calibrated attention patterns that concentrate on causally relevant input spans.
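
One way to picture this selection is to filter parameters by name and measure the share that stays trainable. The sketch below is an illustration under stated assumptions, not the authors' code. The toy parameter inventory, its naming pattern (`block.<i>.attention.q.weight`), and its sizes are hypothetical, and they do not reproduce Flan-T5 or the 6.43% figure.

```python
import re

# Hypothetical naming pattern for attention query projections.
QUERY_PATTERN = re.compile(r"\.attention\.q\.weight$")


def build_toy_inventory(num_layers=2, d_model=8, d_ff=32):
    """Build a made-up map from parameter name to number of weights."""
    inventory = {}
    for i in range(num_layers):
        for proj in ("q", "k", "v", "o"):
            inventory[f"block.{i}.attention.{proj}.weight"] = d_model * d_model
        inventory[f"block.{i}.ffn.wi.weight"] = d_model * d_ff
        inventory[f"block.{i}.ffn.wo.weight"] = d_ff * d_model
    return inventory


def select_trainable(inventory):
    """Keep only query projections; everything else would be frozen."""
    return {name for name in inventory if QUERY_PATTERN.search(name)}


def trainable_fraction(inventory):
    """Share of all weights that belongs to the selected parameters."""
    trainable = select_trainable(inventory)
    return sum(inventory[name] for name in trainable) / sum(inventory.values())


toy = build_toy_inventory()
print(sorted(select_trainable(toy)))
print(f"trainable share in the toy model: {trainable_fraction(toy):.2%}")
```

In a real framework the same name filter would decide which tensors receive gradients. The exact names depend on the model implementation and should be checked against the actual parameter list.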

[Figure: CAREF qualitative predictions, COS-E and ECQA]

Sample Predictions Across Benchmarks

CAREF-AQ (bottom rows) produces concise, factually precise answers, for example correctly identifying "Atlantic Ocean" and "air conditioning", with tighter, more faithful explanations than the BASE or DEC variants.

0.69
ECQA
Fact-grounded commonsense: highest faithfulness score
0.58
e-SNLI
Well-defined entailment structure aids legibility
0.53
ComVE
World knowledge plausibility discrimination
44
Strong Yes
CAREF yields 44 "Strong Yes" labels on ECQA without rationale supervision
Existing methods require rationale annotations or full fine-tuning.

CAREF improves both accuracy and faithfulness using task labels alone, while training only 6.43% of the parameters.

BibTeX

@article{panboonyuen2026caref,
  title     = {CAREF: Calibration-Aware Regularization for
              Explanation Faithfulness Without Rationale
              Supervision},
  author    = {Panboonyuen, Teerapong},
  journal   = {arXiv preprint arXiv:2605.27835},
  note      = {Under ACL Rolling Review (ARR) May 2026},
  year      = {2026},
  url       = {https://kaopanboonyuen.github.io/CAREF}
}