Disproportionate Weight Divergence
1 source - 4 claims
Disproportionate Weight Divergence (DWD) is framed as a principled way to understand sample-reuse instability in RLVR (reinforcement learning with verifiable rewards). Under naive sample reuse, DWD appears as an abrupt, isolated surge in the relative weight change of the lm_head. It is explained as structurally localized to the lm_head rather than spread across the model. Single-use training does not produce the lm_head surge, which indicates that sample reuse is the direct cause.
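
The summary does not say how "relative weight change" is computed. The sketch below assumes one common reading: the norm of a module's weight update divided by the norm of its weights before the update. It then compares the lm_head value against the median of the other modules to show what an isolated surge could look like numerically. The function names, the module names other than lm_head, and the toy weights are all hypothetical and do not come from the source.

```python
import math
from statistics import median


def relative_change(before, after):
    """Assumed metric: ||after - before|| / ||before|| over flat lists of floats."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(after, before)))
    base = math.sqrt(sum(b ** 2 for b in before))
    if base == 0.0:
        # Undefined for all-zero weights; report 0 if nothing moved, else inf.
        return 0.0 if diff == 0.0 else float("inf")
    return diff / base


def per_module_change(params_before, params_after):
    """Relative change for every module name present in params_before."""
    return {
        name: relative_change(params_before[name], params_after[name])
        for name in params_before
    }


def divergence_ratio(changes, target="lm_head"):
    """Hypothetical indicator: target module's change over the median of the rest."""
    others = [value for name, value in changes.items() if name != target]
    typical = median(others)
    return float("inf") if typical == 0.0 else changes[target] / typical


# Toy weights (made up for illustration): lm_head moves far more than the rest.
weights_before = {
    "layers.0": [1.0, 0.0],
    "layers.1": [0.0, 2.0],
    "lm_head": [3.0, 4.0],
}
weights_after = {
    "layers.0": [1.0, 0.01],
    "layers.1": [0.02, 2.0],
    "lm_head": [3.0, 5.0],
}

changes = per_module_change(weights_before, weights_after)
ratio = divergence_ratio(changes)
print(changes)
print(ratio)
```

Logging this ratio at every optimizer step for a naive-reuse run and a single-use run would be one way to check the contrast described above. What ratio counts as "disproportionate" is left open here, since the summary gives no threshold.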