PhaseRotationLayer


1 source - 4 claims

PhaseRotationLayer is inserted between attention and the FFN without a residual connection. The layer uses learnable angle vectors that are shared across phases, together with phase-specific offsets. Because the rotation is orthogonal and has no skip path, it preserves activation norms, is exactly invertible, and leaves gradient singular values unchanged (the Jacobian of an orthogonal map has all singular values equal to 1). Residualizing the rotation worsened performance, which supports the non-residual design choice.
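
The properties claimed above can be checked numerically with a small NumPy sketch. This is not the source's implementation. It assumes, purely for illustration, that the rotation acts on consecutive feature pairs, that the effective angle is the shared vector plus the offset of the active phase, and that a phase is selected by an integer index. The names `phase_rotation`, `theta`, `offsets`, and `phase` are hypothetical.

```python
import numpy as np

def phase_rotation(x, theta, offsets, phase):
    """Rotate consecutive feature pairs of x (illustrative sketch only).

    x:       (..., d) activations, d even
    theta:   (d // 2,) angles shared across all phases (assumed layout)
    offsets: (num_phases, d // 2) phase-specific offsets (assumed layout)
    phase:   integer index of the active phase (assumed)
    """
    angle = theta + offsets[phase]          # shared angles + phase offset
    cos, sin = np.cos(angle), np.sin(angle)
    x1, x2 = x[..., 0::2], x[..., 1::2]     # the two coordinates of each pair
    out = np.empty_like(x)
    out[..., 0::2] = cos * x1 - sin * x2
    out[..., 1::2] = sin * x1 + cos * x2
    return out

rng = np.random.default_rng(0)
d, num_phases = 8, 3
theta = rng.normal(size=d // 2)
offsets = rng.normal(size=(num_phases, d // 2))
x = rng.normal(size=(4, d))

# Non-residual use: the layer output replaces x instead of being added to it.
y = phase_rotation(x, theta, offsets, phase=1)

# Norm preservation: per-row norms before and after should match.
norm_gap = np.max(np.abs(np.linalg.norm(y, axis=-1) - np.linalg.norm(x, axis=-1)))

# Invertibility: rotating by the negated angles recovers the input.
x_back = phase_rotation(y, -theta, -offsets, phase=1)

# Jacobian of the layer: applying it to the identity's rows yields R transposed.
R = phase_rotation(np.eye(d), theta, offsets, phase=1).T
sv_rot = np.linalg.svd(R, compute_uv=False)               # non-residual: R
sv_res = np.linalg.svd(np.eye(d) + R, compute_uv=False)   # residual: I + R

print("max norm gap:", norm_gap)
print("inverse recovers x:", np.allclose(x_back, x))
print("singular values, non-residual:", np.round(sv_rot, 6))
print("singular values, residual:    ", np.round(sv_res, 6))
```

In this sketch the non-residual Jacobian has every singular value equal to 1, while the residual form `I + R` has singular values `2 * |cos(angle / 2)|`, which range from 0 to 2 and reach 0 (a non-invertible map) when an angle equals pi. This only illustrates that adding a skip path removes the orthogonality guarantees; it does not establish why the residual variant performed worse in the source's experiments.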