Phase-Aligned GQA

Cumulative refinement found phase-aligned heads helpful. Phase-aligned GQA requires that the number of query heads and the number of key-value heads each be divisible by the number of phases. It is only a layout constraint: the attention softmax and the output projection still mix heads globally, so information is not confined to a single phase. The article argues that 3PT's gain comes partly from coordinating a phase-compatible head layout with other phase-aware operations.
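
The divisibility requirement can be illustrated with a minimal sketch. The function name `phase_aligned_layout`, the contiguous assignment of heads to phases, and the example head counts are all assumptions made for illustration; the summary does not say how the article assigns heads to phases. The check that query heads divide evenly among key-value heads is the standard GQA requirement, not something specific to phase alignment.

```python
def phase_aligned_layout(num_q_heads, num_kv_heads, num_phases):
    """Hypothetical helper: validate head counts and assign heads to phases.

    Assumes contiguous blocks of heads per phase, which is an illustrative
    choice and not confirmed by the source.
    """
    # Standard GQA: each KV head serves an equal-sized group of query heads.
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be divisible by num_kv_heads")
    # Phase alignment: both head counts must split evenly across phases.
    if num_q_heads % num_phases != 0 or num_kv_heads % num_phases != 0:
        raise ValueError("head counts must be divisible by num_phases")

    group_size = num_q_heads // num_kv_heads
    kv_per_phase = num_kv_heads // num_phases

    # Phase of each KV head (contiguous blocks, by assumption).
    kv_phase = [kv // kv_per_phase for kv in range(num_kv_heads)]
    # KV head that each query head attends through (standard GQA grouping).
    q_to_kv = [q // group_size for q in range(num_q_heads)]
    # A query head inherits the phase of its KV head, so no group straddles phases.
    q_phase = [kv_phase[kv] for kv in q_to_kv]
    return q_to_kv, q_phase, kv_phase


q_to_kv, q_phase, kv_phase = phase_aligned_layout(12, 6, 3)
print(kv_phase)  # [0, 0, 1, 1, 2, 2]
print(q_phase)   # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

With 12 query heads, 6 key-value heads, and 3 phases, every GQA group falls entirely within one phase. A configuration such as 8 query heads and 4 key-value heads with 3 phases is rejected. The sketch only fixes which heads belong to which phase; it does not restrict the softmax or the output projection, which is consistent with the constraint being purely about layout.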