Low-Rank Optimizers

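GaLore is a memory-saving optimizer technique for full-parameter training, not a LoRA-style parameter-efficient fine-tuning method: all weights are still updated, and only the optimizer state is compressed. It projects each weight matrix's gradient into a low-rank subspace, performs Adam-style updates there, and projects the resulting update back to the full parameter space.

More generally, projection-based optimizers keep their optimizer states (such as Adam's first and second moments) in a projected space and map updates back to the full parameter space. Low-rank methods rely on the observation that the gradients or updates of large Transformer weight matrices can have low effective rank, so a small subspace can capture much of the update. The memory savings come at a cost: approximation error, because update components outside the subspace are discarded, and the overhead of periodically recomputing (refreshing) the projection.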
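The projected-Adam idea can be sketched for a single weight matrix as follows. This is a minimal NumPy sketch under stated assumptions, not GaLore's reference implementation. The class `GaLoreStyleAdam` and the helpers `full_adam_state_floats` and `projected_state_floats` are hypothetical names. The projector is assumed to be the top-r left singular vectors of the current gradient (left-side projection, which saves the most memory when the matrix has m <= n), refreshed every `refresh_every` steps, with Adam moments kept across refreshes. The two helpers count optimizer-state floats to make the memory trade-off concrete.

```python
import numpy as np


def full_adam_state_floats(m, n):
    """Floats standard Adam keeps for an m x n matrix: first and second moments."""
    return 2 * m * n


def projected_state_floats(m, n, rank):
    """Floats kept by the sketch below: two r x n moments plus the m x r projector."""
    return 2 * rank * n + m * rank


class GaLoreStyleAdam:
    """Hypothetical GaLore-style Adam for ONE m x n weight matrix (a sketch).

    Assumptions, not taken from the source: left-side projection with the
    top-`rank` left singular vectors of the current gradient, refreshed every
    `refresh_every` steps, and Adam moments kept (not reset) across refreshes.
    """

    def __init__(self, shape, rank, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 refresh_every=200):
        m, n = shape
        assert 0 < rank <= min(m, n), "rank must be in [1, min(m, n)]"
        self.rank, self.lr, self.eps = rank, lr, eps
        self.b1, self.b2 = betas
        self.refresh_every = refresh_every
        self.P = None                            # m x r projector
        self.exp_avg = np.zeros((rank, n))       # first moment, projected space
        self.exp_avg_sq = np.zeros((rank, n))    # second moment, projected space
        self.t = 0                               # number of steps taken

    def step(self, W, G):
        """Return W after one update, given its full gradient G (both m x n)."""
        if self.P is None or self.t % self.refresh_every == 0:
            # Projection refresh: this SVD is the periodic overhead.
            U, _, _ = np.linalg.svd(G, full_matrices=False)
            self.P = U[:, : self.rank]
        self.t += 1
        R = self.P.T @ G                         # r x n projected gradient
        self.exp_avg = self.b1 * self.exp_avg + (1 - self.b1) * R
        self.exp_avg_sq = self.b2 * self.exp_avg_sq + (1 - self.b2) * R ** 2
        m_hat = self.exp_avg / (1 - self.b1 ** self.t)      # bias correction
        v_hat = self.exp_avg_sq / (1 - self.b2 ** self.t)
        N = m_hat / (np.sqrt(v_hat) + self.eps)  # Adam direction in r x n space
        return W - self.lr * (self.P @ N)        # project back; all of W changes


if __name__ == "__main__":
    # Toy problem: minimize 0.5 * ||W - W_star||_F^2 with a rank-2 target.
    rng = np.random.default_rng(0)
    W_star = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 16))
    W = np.zeros((8, 16))
    opt = GaLoreStyleAdam(W.shape, rank=2, lr=0.01, refresh_every=50)
    for _ in range(100):
        W = opt.step(W, W - W_star)              # gradient of the toy loss
    print("final loss:", 0.5 * np.sum((W - W_star) ** 2))
```

For a 4096 x 4096 matrix at rank 128, the helpers give 33,554,432 floats for full Adam versus 1,572,864 for the projected state (projector included), roughly a 21x reduction. The price is visible in `step`: components of `G` outside the span of `P` never reach the update, and every refresh costs an SVD of the full gradient.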