Disproportionate Weight Divergence

1 source - 4 claims

Disproportionate Weight Divergence (DWD) is presented as a principled way to understand the instability that sample reuse causes in RLVR (reinforcement learning with verifiable rewards). Under naive reuse, DWD shows up as an abrupt, isolated surge in the relative weight change of the lm_head. The effect is structurally localized to the lm_head rather than spread across the network. Single-use training produces no such surge, which indicates that sample reuse is the direct cause.
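
To make "relative weight change" concrete, here is a minimal sketch of how a per-module diagnostic might be computed. The source does not define the metric, so the formula used here (L2 norm of the update divided by L2 norm of the earlier weights) is an assumption, as are the function names, the toy module names other than lm_head, and the ratio threshold used to call a change disproportionate.

```python
import math
from statistics import median


def relative_weight_change(before, after):
    """Return {name: ||after - before||_2 / ||before||_2} for each module.

    Assumed metric: the source only says "relative weight change".
    Weights are given as flat lists of floats keyed by module name.
    """
    changes = {}
    for name, w0 in before.items():
        w1 = after[name]
        diff_norm = math.sqrt(sum((b - a) ** 2 for a, b in zip(w0, w1)))
        base_norm = math.sqrt(sum(a ** 2 for a in w0))
        changes[name] = diff_norm / base_norm if base_norm > 0 else float("inf")
    return changes


def is_disproportionate(changes, target="lm_head", ratio=10.0):
    """Flag `target` if its change exceeds `ratio` times the median of the rest.

    Both the median baseline and the default ratio are hypothetical choices
    for illustration, not values taken from the source.
    """
    others = [value for name, value in changes.items() if name != target]
    if not others:
        return False
    return changes[target] > ratio * median(others)


# Toy snapshots: two ordinary layers move slightly, lm_head moves a lot.
before = {"layer0": [1.0, 0.0], "layer1": [0.0, 2.0], "lm_head": [3.0, 4.0]}
after = {"layer0": [1.0, 0.01], "layer1": [0.02, 2.0], "lm_head": [3.0, 7.0]}

changes = relative_weight_change(before, after)
flagged = is_disproportionate(changes)
```

Tracked across training steps, a diagnostic of this kind would let the reuse and single-use runs be compared directly: the claim above corresponds to the lm_head value spiking in the reuse run while the other modules, and the single-use run, stay flat.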