lm_head Gradient Norm

Among evaluated signals, only the lm_head gradient norm produced a sharp spike synchronized with collapse onset. A surge in lm_head gradient norm is interpreted as a certified early-warning indicator rather than a heuristic proxy. The lm_h…

1 sources - 4 claims

Among evaluated signals, only the lm_head gradient norm produced a sharp spike synchronized with collapse onset. A surge in lm_head gradient norm is interpreted as a certified early-warning indicator rather than a heuristic proxy. The lm_head gradient norm lower-bounds empirical Pearson chi-squared divergence at the batch level. The lm_head gradient receives the raw error signal directly, while intermediate layer gradients are filtered through a Jacobian.