Structural Gradient Asymmetry
Theorem 1 bounds the gradient magnitude at intermediate layers relative to the gradient magnitude at lm_head in terms of an architectural constant. Across the measured models, the measured structural constant stays below 0.1 at the median and below 0.5 at the 95th percentile. It remains stable throughout RL training, including at the point of naive-reuse collapse. The paper argues that structural gradient asymmetry may be a general architectural property of pre-norm Transformers rather than a model-specific artifact.
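The two reported thresholds (median below 0.1, 95th percentile below 0.5) can be checked mechanically once a set of measurements is in hand. The sketch below is a minimal illustration, not the paper's measurement procedure. It assumes the quantity of interest can be approximated by the ratio of an intermediate layer's gradient L2 norm to the lm_head gradient L2 norm, and that the 95th percentile is computed by the nearest-rank method. The helper names (`l2_norm`, `gradient_ratio`, `summarize`, `within_reported_range`) are hypothetical, and the sample values are made up for the example.

```python
import math
import statistics


def l2_norm(values):
    """Euclidean norm of a flat list of gradient entries."""
    return math.sqrt(sum(v * v for v in values))


def gradient_ratio(layer_grad, lm_head_grad):
    """Ratio of an intermediate layer's gradient norm to the lm_head gradient norm.

    Assumption: this ratio is used here as a stand-in for the measured
    structural constant; the source does not spell out the exact definition.
    """
    head = l2_norm(lm_head_grad)
    if head == 0.0:
        raise ValueError("lm_head gradient norm is zero; ratio is undefined")
    return l2_norm(layer_grad) / head


def summarize(ratios):
    """Return (median, 95th percentile) using the nearest-rank percentile."""
    if not ratios:
        raise ValueError("need at least one measurement")
    ordered = sorted(ratios)
    median = statistics.median(ordered)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # 1-based nearest rank
    return median, ordered[rank - 1]


def within_reported_range(ratios, median_cap=0.1, p95_cap=0.5):
    """Check the two thresholds quoted in the summary (0.1 and 0.5)."""
    median, p95 = summarize(ratios)
    return median < median_cap and p95 < p95_cap


# Illustrative, made-up measurements (not data from the paper).
sample_ratios = [0.02, 0.04, 0.05, 0.06, 0.08, 0.09, 0.12, 0.2, 0.3, 0.45]
median, p95 = summarize(sample_ratios)
print(median, p95, within_reported_range(sample_ratios))
```

In a real training run the flat lists would be replaced by the framework's gradient tensors, and the same summary could be logged at each checkpoint to see whether the values stay stable over RL training.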