Head-Property Capacity

A single attention head was sufficient when only one evolving property was tracked. Accuracy collapsed once the number of simultaneously evolving properties exceeded the number of attention heads. Blockwise evaluation does not remove this requirement: the model still needs enough heads to separate the concurrent, property-specific routing channels. The article interprets a shortage of heads as interference among the routing signals for the different evolving properties.
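
The capacity condition can be illustrated with a small counting sketch. It assumes, purely for illustration, that each evolving property needs its own head as a routing channel and that properties are assigned to heads round-robin. Neither the assignment scheme nor the function and property names come from the article; the sketch only shows why some head must carry more than one property once properties outnumber heads.

```python
from collections import defaultdict


def assign_properties_to_heads(properties, num_heads):
    """Hypothetical round-robin assignment of evolving properties to heads.

    This is an illustrative assumption, not the mechanism described in the
    article: it only makes the counting argument concrete.
    """
    if num_heads < 1:
        raise ValueError("num_heads must be at least 1")
    channels = defaultdict(list)
    for index, prop in enumerate(properties):
        channels[index % num_heads].append(prop)
    return dict(channels)


def shared_heads(properties, num_heads):
    """Return only the heads that would carry more than one property."""
    assignment = assign_properties_to_heads(properties, num_heads)
    return {head: props for head, props in assignment.items() if len(props) > 1}


# Two properties, two heads: every property gets its own channel.
print(shared_heads(["color", "position"], num_heads=2))

# Three properties, two heads: one head must carry two routing signals.
print(shared_heads(["color", "position", "size"], num_heads=2))
```

Under these assumptions, a non-empty result marks the regime the article associates with interference: at least one head has to route signals for two properties at once.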