Three-Phase Transformer

Three-Phase Transformer (3PT) applies a structured residual-stream geometry to a decoder-only Transformer while retaining standard components such as RoPE, GQA, RMSNorm-style normalization, and SwiGLU. It partitions the residual stream into equal phases, with three phases as the canonical setting. 3PT is not presented as a new positional encoding or embedding trick; the article interprets it as a residual-stream structural prior rather than a replacement for attention or positional encoding.
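To make the idea of equal phases concrete, here is a minimal sketch of splitting one residual-stream vector into three equal parts and joining them back. The summary above does not say how 3PT lays out its phases, so the contiguous slicing, the helper names `split_into_phases` and `merge_phases`, and the toy width of 12 are all illustrative assumptions rather than the article's method.

```python
from typing import List, Sequence


def split_into_phases(hidden: Sequence[float], n_phases: int = 3) -> List[List[float]]:
    """Split one residual-stream vector into n_phases equal chunks.

    Assumption: contiguous slices are used purely for illustration; the
    source does not specify how 3PT arranges its phases.
    """
    if n_phases <= 0:
        raise ValueError("n_phases must be positive")
    d_model = len(hidden)
    if d_model % n_phases != 0:
        # Equal phases require the model width to be divisible by the phase count.
        raise ValueError(f"d_model={d_model} is not divisible by n_phases={n_phases}")
    width = d_model // n_phases
    return [list(hidden[i * width:(i + 1) * width]) for i in range(n_phases)]


def merge_phases(phases: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the phases back into a single residual-stream vector."""
    return [value for phase in phases for value in phase]


# Hypothetical toy example: a 12-dimensional hidden state, three phases of width 4.
hidden = [float(i) for i in range(12)]
phases = split_into_phases(hidden)
restored = merge_phases(phases)
```

The split is lossless: merging the phases returns the original vector, so the surrounding block (attention, normalization, feed-forward) can still operate on the full width. Any phase-specific structure would sit on top of this bookkeeping.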