Computational Overhead
On ImageNet, the estimated overhead was about 1.02x and the measured multiplier was about 1.03x. Diagonal preconditioner blocks are lightweight, whereas Kronecker-factored and E-KFAC structures increase both memory and update cost. Update frequency is the dominant source of computational cost; the recommended settings are 1-4 preconditioner updates per epoch with 25-50 inner steps each. Chunking lowers peak memory by splitting the preconditioner-update minibatch into smaller pieces and accumulating their contributions to the relaxed objective, which yields the same objective as processing the whole minibatch at once. Overall, LLQR adds memory and compute beyond the optimizer state kept by standard first-order methods.
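To make the structural difference concrete, the following sketch counts the statistics a single preconditioner block would store for a fully connected layer with a d_out x d_in weight matrix. It assumes K-FAC-style Kronecker factors (an input covariance and an output-gradient covariance) and an E-KFAC-style block that keeps the eigenbases of those factors plus one rescaling entry per weight. The function name `preconditioner_entries` is hypothetical, and the counts ignore inverses, damping buffers, and any other state LLQR itself may keep.

```python
def preconditioner_entries(d_in: int, d_out: int, structure: str) -> int:
    """Count stored entries for one d_out x d_in layer block (illustrative, not LLQR's accounting)."""
    if structure == "diagonal":
        # Assumption: one scale per weight entry.
        return d_in * d_out
    if structure == "kronecker":
        # Assumption: K-FAC-style factors A (d_in x d_in) and G (d_out x d_out).
        return d_in * d_in + d_out * d_out
    if structure == "ekfac":
        # Assumption: E-KFAC-style eigenbases of A and G plus a d_out x d_in rescaling.
        return d_in * d_in + d_out * d_out + d_in * d_out
    raise ValueError(f"unknown structure: {structure}")


if __name__ == "__main__":
    for name in ("diagonal", "kronecker", "ekfac"):
        print(name, preconditioner_entries(1024, 1024, name))
```

For a square 1024 x 1024 layer this gives 1,048,576, 2,097,152, and 3,145,728 entries respectively. For non-square layers the Kronecker factors grow with the square of each dimension, so the gap relative to a diagonal block depends on layer shape.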
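One way to read the recommended update frequency is as a schedule: the preconditioner is refreshed a few times per epoch, and each refresh runs a short inner optimization. This sketch assumes evenly spaced refreshes and that the inner-step count applies per refresh; both helper names are hypothetical and not taken from an LLQR implementation.

```python
def preconditioner_update_steps(steps_per_epoch: int, updates_per_epoch: int) -> list[int]:
    """Training-step indices within one epoch at which the preconditioner is refreshed."""
    if not 1 <= updates_per_epoch <= steps_per_epoch:
        raise ValueError("updates_per_epoch must be between 1 and steps_per_epoch")
    # Assumption: refreshes are spaced evenly across the epoch.
    return [i * steps_per_epoch // updates_per_epoch for i in range(updates_per_epoch)]


def inner_steps_per_epoch(updates_per_epoch: int, inner_steps: int) -> int:
    # Assumption: each refresh runs `inner_steps` steps on the preconditioner objective.
    return updates_per_epoch * inner_steps


if __name__ == "__main__":
    print(preconditioner_update_steps(1000, 4))
    print(inner_steps_per_epoch(1, 25), inner_steps_per_epoch(4, 50))
```

With a hypothetical 1,000 training steps per epoch and 4 updates per epoch, refreshes fall at steps 0, 250, 500, and 750. Across the recommended ranges, the inner steps per epoch run from 25 (1 update x 25 steps) to 200 (4 updates x 50 steps), an 8x spread within the recommended settings alone.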
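The equivalence behind chunking holds whenever the relaxed objective is a sum of per-example terms: the total and its gradient can be accumulated chunk by chunk, so only one chunk's intermediates are in memory at a time. Because the actual LLQR relaxed objective is not specified here, this sketch uses a hypothetical stand-in, sum_i ||P g_i - v_i||^2 over per-example vectors g_i and targets v_i; the function names are likewise illustrative.

```python
import numpy as np


def relaxed_objective_and_grad(P, G, V):
    """Hypothetical additive objective sum_i ||P g_i - v_i||^2 and its gradient w.r.t. P."""
    R = G @ P.T - V              # per-example residuals, shape (batch, d)
    loss = float(np.sum(R * R))
    grad = 2.0 * R.T @ G         # shape (d, d), same as P
    return loss, grad


def chunked_objective_and_grad(P, G, V, chunk_size):
    """Accumulate the same sums chunk by chunk; only one chunk's residuals are live at once."""
    total_loss = 0.0
    total_grad = np.zeros_like(P)
    for start in range(0, G.shape[0], chunk_size):
        stop = start + chunk_size
        loss, grad = relaxed_objective_and_grad(P, G[start:stop], V[start:stop])
        total_loss += loss
        total_grad += grad
    return total_loss, total_grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.normal(size=(512, 64))
    V = rng.normal(size=(512, 64))
    P = rng.normal(size=(64, 64))
    full_loss, full_grad = relaxed_objective_and_grad(P, G, V)
    chunk_loss, chunk_grad = chunked_objective_and_grad(P, G, V, chunk_size=128)
    print(np.isclose(full_loss, chunk_loss), np.allclose(full_grad, chunk_grad))
```

If the objective were defined as a mean rather than a sum, each chunk's contribution would need to be divided by the full minibatch size, not the chunk size, to keep the result unchanged.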