Stability Diagnostics
Median active cosine similarity between the initial and corrective gradients stayed approximately stable across K values. Surrogate diagnostics showed that the absolute error at the extrapolated point increased with K but remained small. GXPO maintains a rolling buffer of corrective-gradient norms and computes a z-score for the current norm against that buffer before adding the norm to it. If the corrective-gradient norm is unusually large, GXPO treats the lookahead signal as unreliable and falls back to GRPO. KL and clipping diagnostics indicated that GXPO repositioning did not substantially increase clipping. Mean KL penalties were higher for GXPO than for GRPO, especially at larger K, but remained small in absolute terms.
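The rolling-buffer check described above can be sketched roughly as follows. This is a minimal illustration, not GXPO's actual implementation: the class name `CorrectiveNormGate`, the window size, the z-score threshold, the minimum history length, the one-sided test, and the choice to record flagged norms in the buffer are all assumptions made for the example.

```python
from collections import deque
import math


class CorrectiveNormGate:
    """Rolling z-score check on corrective-gradient norms (illustrative only)."""

    def __init__(self, window=50, z_threshold=3.0, min_history=5):
        # window, z_threshold, and min_history are assumed values,
        # not taken from the source.
        self.norms = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_history = min_history

    def z_score(self, norm):
        """Z-score of `norm` against the buffer, which does not yet contain it."""
        n = len(self.norms)
        if n < self.min_history:
            return 0.0  # too little history to judge, so never flag
        mean = sum(self.norms) / n
        var = sum((x - mean) ** 2 for x in self.norms) / n
        std = math.sqrt(var) + 1e-8  # small epsilon avoids division by zero
        return (norm - mean) / std

    def use_lookahead(self, norm):
        """Return True to keep the lookahead update, False to fall back to GRPO."""
        z = self.z_score(norm)    # scored before the buffer sees this norm
        self.norms.append(norm)   # assumption: flagged norms are recorded too
        return z <= self.z_threshold  # one-sided: only large norms are flagged
```

In a training loop, `use_lookahead(norm)` would be called once per step with the current corrective-gradient norm, and a `False` result would select the plain GRPO update for that step. Scoring the norm before appending it keeps an outlier from inflating the very statistics it is being compared against.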