Gated Delta Network
Frame-level state access matches full-softmax teacher performance, whereas token-level access degrades average quality. Without restricting state updates to clean frames only, noisy denoising intermediates written into the recurrent state…
1 sources - 4 claims
Frame-level state access matches full-softmax teacher performance, whereas token-level access degrades average quality. Without restricting state updates to clean frames only, noisy denoising intermediates written into the recurrent state corrupt accumulated memory in a way that compounds with video length. Without frame-level state access, tokens within the same frame observe different versions of accumulated history depending on processing order, causing an asymmetry that has no counterpart in language modeling. The Gated Delta Network is selected for the inter-frame branch because its forget gate enables selective decay of stale context and its delta rule enables error-correction updates.