Head-Property Capacity
A single attention head was sufficient when only one evolving property was tracked. Accuracy collapsed, however, once the number of simultaneously evolving properties exceeded the number of attention heads. The article interprets this failure as interference among the routing signals for the different evolving properties when too few heads are available. Blockwise evaluation does not remove this requirement: the model still needs enough heads to separate the concurrent property-specific routing channels.
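
The capacity claim above is at heart a counting argument, and a small sketch may make it concrete. The Python below is only an illustration under stated assumptions: the function names `assign_properties_to_heads` and `contended_heads`, the example property names, and the round-robin assignment are all hypothetical and are not taken from the article, where any routing would be learned by the model rather than assigned by hand. The sketch shows only that when there are more simultaneously evolving properties than heads, at least one head has to carry more than one routing channel.

```python
from collections import defaultdict


def assign_properties_to_heads(properties, num_heads):
    """Spread evolving properties over heads round-robin.

    Hypothetical illustration only: a trained model learns its own routing;
    this just makes the counting argument visible.
    """
    if num_heads < 1:
        raise ValueError("num_heads must be at least 1")
    heads = defaultdict(list)
    for index, prop in enumerate(properties):
        heads[index % num_heads].append(prop)
    return dict(heads)


def contended_heads(assignment):
    """Return the heads that would carry more than one property's routing signal."""
    return {head: props for head, props in assignment.items() if len(props) > 1}


# One evolving property, one head: no head is shared.
single = assign_properties_to_heads(["color"], num_heads=1)

# Three simultaneously evolving properties, two heads: one head must be shared.
crowded = assign_properties_to_heads(["color", "position", "size"], num_heads=2)

print(contended_heads(single))
print(contended_heads(crowded))
```

Under this toy assignment, a shared head is where the interference described above would be expected to arise. The sketch says nothing about how severe the interference is, only that some sharing is unavoidable once properties outnumber heads.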