Geometry-Aware Optimization
Geometry-aware methods can affect both the convergence speed and the implicit bias of training trajectories. Standard gradient descent is steepest descent under the Euclidean norm, whereas Newton, Gauss-Newton, and natural-gradient methods measure steps with metrics induced by curvature or by a divergence. Scalable preconditioners usually keep computation tractable by imposing block-diagonal or factored structure at the outset. Dense, principled curvature matrices are hard to use directly because the chain rule couples parameters across layers. LLQR places optimizers that can be interpreted as steepest descent under different norms within a common layerwise optimal-control objective, which makes them directly comparable.
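A minimal sketch of the "steepest descent under a norm" view follows. It assumes a hypothetical three-parameter quadratic loss whose parameters are split into two "layers". Under a quadratic norm with ||d||_P^2 = d^T P d, the steepest-descent direction is -P^{-1} g. Choosing P = I recovers gradient descent, and P = H (the Hessian) gives a Newton step. Zeroing the cross-layer entries of H stands in for a generic block-diagonal preconditioner, not for any specific published method. The matrix H, the vector b, the layer partition, and all helper names are illustrative assumptions. The sketch does not implement LLQR.

```python
# Toy comparison of steepest-descent directions under three quadratic norms.
# Hypothetical setup (not from the source): a 3-parameter quadratic loss split into two "layers".


def solve(A, v):
    """Solve A x = v for a small dense A by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [[float(a) for a in row] + [float(v[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def steepest_descent_direction(g, P):
    """Argmin over d of g.d + 0.5 * d^T P d, i.e. d = -P^{-1} g (P symmetric positive definite)."""
    return [-x for x in solve(P, g)]


def block_diagonal(A, blocks):
    """Zero every entry of A whose row and column belong to different blocks (layers)."""
    owner = {i: k for k, idx in enumerate(blocks) for i in idx}
    n = len(A)
    return [[A[i][j] if owner[i] == owner[j] else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical loss f(theta) = 0.5 theta^T H theta - b^T theta, so grad f = H theta - b, Hessian = H.
H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],  # H[1][2] = H[2][1] = 1 couples layer 0 (params 0, 1) to layer 1 (param 2)
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
layers = [[0, 1], [2]]


def loss(theta):
    quad = sum(t * h for t, h in zip(theta, matvec(H, theta)))
    lin = sum(bi * t for bi, t in zip(b, theta))
    return 0.5 * quad - lin


theta0 = [0.0, 0.0, 0.0]
g0 = [h - bi for h, bi in zip(matvec(H, theta0), b)]  # gradient at theta0
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

metrics = {
    "euclidean (gradient descent)": identity,
    "block-diagonal curvature": block_diagonal(H, layers),
    "dense curvature (Newton)": H,
}

theta_star = solve(H, b)  # exact minimizer of the quadratic
after_one_step = {}
for name, P in metrics.items():
    d = steepest_descent_direction(g0, P)
    theta1 = [t + di for t, di in zip(theta0, d)]  # unit step size for every metric
    after_one_step[name] = (theta1, loss(theta1))
    print(f"{name:30s} loss after one step: {loss(theta1):8.4f}")
print(f"{'minimum':30s} loss: {loss(theta_star):8.4f}")
```

Because the toy loss is quadratic, the dense-curvature (Newton) step lands exactly on the minimizer. Dropping the cross-layer entry H[1][2] leaves a gap. The Euclidean step, taken here with unit step size, overshoots and would need a tuned learning rate.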