Three-Phase Transformer
1 source - 4 claims
The Three-Phase Transformer (3PT) partitions the residual stream into equal phases, with three phases as the canonical setting. It applies this structured residual-stream geometry to a decoder-only Transformer while retaining standard components such as RoPE, GQA, RMSNorm-style normalization, and SwiGLU. 3PT is not presented as a new positional encoding or embedding trick; the article interprets it as a residual-stream structural prior rather than a replacement for attention or positional encoding.
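To make the idea of equal phases concrete, here is a minimal sketch of splitting one token's residual vector into phases and joining them back. It illustrates only the partitioning step. The function names `split_into_phases` and `merge_phases` are hypothetical, and the use of contiguous channel slices is an assumption: the summary above does not say how channels are assigned to phases or what 3PT does with each phase.

```python
from typing import List, Sequence


def split_into_phases(residual: Sequence[float], n_phases: int = 3) -> List[List[float]]:
    """Split one token's residual vector into n_phases equal slices.

    Assumption: phases are contiguous channel blocks. The source only
    states that the phases are equal in size, not how they are laid out.
    """
    if n_phases <= 0:
        raise ValueError("n_phases must be positive")
    d_model = len(residual)
    if d_model % n_phases != 0:
        raise ValueError(
            f"d_model={d_model} is not divisible by n_phases={n_phases}"
        )
    width = d_model // n_phases
    return [list(residual[i * width:(i + 1) * width]) for i in range(n_phases)]


def merge_phases(phases: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the phases back into a single residual vector."""
    return [value for phase in phases for value in phase]


if __name__ == "__main__":
    # Toy residual vector with d_model = 6, so each of the three phases has width 2.
    residual = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    phases = split_into_phases(residual)
    print(phases)
    print(merge_phases(phases) == residual)
```

One consequence of the equal-phase requirement, under this sketch, is that the model width must be divisible by the number of phases, so the function rejects any width that is not.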