PhaseRotationLayer
PhaseRotationLayer is inserted between the attention block and the FFN without a residual connection. It uses learnable angle vectors that are shared across phases, combined with phase-specific offsets. Because the rotation is orthogonal and applied non-residually, it preserves norm, invertibility, and gradient singular values. Residualizing the rotation worsened performance, which supports the non-residual design choice.
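The properties claimed above can be checked numerically with a minimal sketch. The source does not say how the rotation is parameterized, so this example assumes independent 2D rotations over consecutive feature pairs, with each angle formed as a shared value plus a per-phase offset. The names `phase_rotate`, `phase_unrotate`, `shared_angles`, and `phase_offsets` are hypothetical and are not taken from the source.

```python
import numpy as np

def phase_rotate(x, shared_angles, phase_offsets, phase):
    """Rotate consecutive feature pairs of x (assumed parameterization).

    x:             (..., d) array, d even
    shared_angles: (d // 2,) angles shared by every phase
    phase_offsets: (num_phases, d // 2) per-phase angle offsets
    """
    theta = shared_angles + phase_offsets[phase]
    cos, sin = np.cos(theta), np.sin(theta)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = cos * x_even - sin * x_odd
    out[..., 1::2] = sin * x_even + cos * x_odd
    return out

def phase_unrotate(y, shared_angles, phase_offsets, phase):
    # The inverse of a rotation is the rotation by the negated angle.
    return phase_rotate(y, -shared_angles, -phase_offsets, phase)

rng = np.random.default_rng(0)
d, num_phases = 8, 3
shared = rng.normal(size=d // 2)
offsets = rng.normal(size=(num_phases, d // 2))
x = rng.normal(size=(4, d))

# Non-residual application: y = R x
y = phase_rotate(x, shared, offsets, phase=1)
norm_preserved = np.allclose(np.linalg.norm(y, axis=-1),
                             np.linalg.norm(x, axis=-1))
invertible = np.allclose(phase_unrotate(y, shared, offsets, phase=1), x)

# The map is linear, so rotating the identity yields its matrix (transposed);
# an orthogonal matrix has all singular values equal to 1.
jacobian = phase_rotate(np.eye(d), shared, offsets, phase=1)
unit_singular_values = np.allclose(
    np.linalg.svd(jacobian, compute_uv=False), 1.0)

# Residual variant for contrast: x + R x is generally not norm preserving.
residual_changes_norm = not np.allclose(
    np.linalg.norm(x + y, axis=-1), np.linalg.norm(x, axis=-1))
```

Under this assumed parameterization, the residual variant `x + R x` scales each feature pair by a factor that depends on its angle, so it is no longer an orthogonal map. This only illustrates the structural difference between the two designs; it does not reproduce the reported ablation result.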