Empirical Benchmarks

2 sources - 11 claims

The main benchmarks for SHAPE included synthetic functions, Lennard-Jones objectives, phase retrieval, and control trajectory optimization. Baselines were evaluated under matched conditions: the same problem instances, starting points, oracle streams, projection or clipping rules, and total oracle budgets. In the multi-well study, SHAPE achieved a success rate of 0.602 and a best gap of 0.477 over 500 tasks. On phase retrieval, SHAPE's averaged first-order results, in either the full-batch or mini-batch setting, had lower final and best gaps than NAG in the reported table. In the fixed-budget summary, SHAPE improved best-so-far performance and hit rate on several benchmark families. The results support reporting terminal and best-so-far metrics separately.

On IWSLT14 German-to-English translation, LLQR with E-KFAC and NGD improved BLEU from 34.24 to 34.51 (+0.27) at a 1.16x time cost. On ImageNet with ResNet-50, E-KFAC with NGD reached 78.05% top-1 accuracy at about a 1.032x time cost. Architecture transfer was evaluated on PyramidNet-110, VGG-16-BN, and WRN-28-10 using E-KFAC LLQR. The grokking experiments used five modular arithmetic datasets with prime modulus 97; across 10 random seeds, LLQR reduced the number of steps to grok and matched or improved the wall-clock time to grok.
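To make the distinction between terminal and best-so-far metrics concrete, the sketch below computes a final gap, a best gap, and a success (hit) rate from per-task objective traces. The definitions are assumptions for illustration, not the sources' exact ones: gap means f(x_t) - f*, a task counts as a hit when its best gap falls below a tolerance `tol`, and the function names `trajectory_metrics` and `summarize` are hypothetical.

```python
import numpy as np

def trajectory_metrics(values, f_star, tol):
    """Summarize one run from objective values f(x_t), in oracle-call order.

    Assumed (hypothetical) definitions:
      final gap = f(x_T) - f*         (terminal performance)
      best gap  = min_t f(x_t) - f*   (best-so-far performance)
      hit       = best gap <= tol     (task counted as solved)
    """
    gaps = np.asarray(values, dtype=float) - f_star
    best_so_far = np.minimum.accumulate(gaps)  # running minimum of the gap
    return {
        "final_gap": float(gaps[-1]),
        "best_gap": float(best_so_far[-1]),
        "hit": bool(best_so_far[-1] <= tol),
    }

def summarize(runs, f_stars, tol):
    """Average per-task metrics; success rate is the fraction of tasks hit."""
    per_task = [trajectory_metrics(v, fs, tol) for v, fs in zip(runs, f_stars)]
    return {
        "success_rate": float(np.mean([m["hit"] for m in per_task])),
        "mean_final_gap": float(np.mean([m["final_gap"] for m in per_task])),
        "mean_best_gap": float(np.mean([m["best_gap"] for m in per_task])),
    }
```

For example, with traces `[5.0, 1.0, 3.0]` and `[4.0, 2.0, 0.05]`, both with f* = 0 and `tol=0.1`, the first task reaches a gap of 1.0 but ends at 3.0, so its terminal and best-so-far gaps differ; only the second task counts as a hit.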
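The matched-baseline protocol for SHAPE (same instances, starts, oracle streams, projection or clipping, and oracle budgets) can be sketched as a small harness in which every method receives an identically seeded random generator. This is a toy illustration under stated assumptions: the objective f(x) = 0.5 * ||x||^2, the Gaussian gradient noise, the ball projection, and the two gradient-descent step rules are all hypothetical stand-ins, not the sources' benchmarks or methods.

```python
import numpy as np

class CountingOracle:
    """Noisy gradient oracle for the toy objective f(x) = 0.5 * ||x||^2.

    Noise is drawn from the generator passed in, so methods built from
    identically seeded generators see the same noise stream.
    """
    def __init__(self, rng, budget, noise=0.1):
        self.rng, self.budget, self.noise, self.calls = rng, budget, noise, 0

    def grad(self, x):
        if self.calls >= self.budget:
            raise RuntimeError("oracle budget exhausted")
        self.calls += 1
        return x + self.noise * self.rng.standard_normal(x.shape)

def run_matched(methods, task_seed, dim=4, budget=50, radius=2.0):
    """Run each method on one task with a shared start, noise stream, projection, and budget."""
    results = {}
    for name, step in methods.items():
        rng = np.random.default_rng(task_seed)   # same seed for every method
        x = rng.uniform(-1.0, 1.0, size=dim)     # same starting point (norm <= 2)
        oracle = CountingOracle(rng, budget)     # same noise stream and budget
        f = best = 0.5 * float(x @ x)
        while oracle.calls < budget:
            x = step(x, oracle.grad(x))
            norm = np.linalg.norm(x)
            if norm > radius:                    # same projection onto a ball
                x = x * (radius / norm)
            f = 0.5 * float(x @ x)
            best = min(best, f)
        results[name] = {"final": f, "best": best, "calls": oracle.calls}
    return results

# Two hypothetical step rules standing in for a method and a baseline.
methods = {
    "gd_lr0.1": lambda x, g: x - 0.1 * g,
    "gd_lr0.5": lambda x, g: x - 0.5 * g,
}
```

Because each method consumes the same number of oracle calls with the same shapes, the noise it sees is identical, so differences in `final` and `best` reflect the step rule rather than luck in the random stream.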
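The LLQR results pair E-KFAC with natural gradient descent (NGD). The sketch below shows only the standard eigenvalue-corrected K-FAC (E-KFAC) preconditioner for a single linear layer y = W a; it does not implement LLQR, whose details are not given here. It assumes per-example inputs and backpropagated output gradients are available; the function name `ekfac_precondition` and the damping value are hypothetical choices for illustration.

```python
import numpy as np

def ekfac_precondition(acts, grads_out, damping=1e-3):
    """E-KFAC natural-gradient direction for one linear layer y = W a.

    acts:      (n, d_in)  per-example layer inputs a_i
    grads_out: (n, d_out) per-example gradients g_i = dL_i/dy_i
    Returns the preconditioned gradient with the shape of W, (d_out, d_in).
    """
    n = acts.shape[0]
    A = acts.T @ acts / n               # input-side Kronecker factor E[a a^T]
    G = grads_out.T @ grads_out / n     # output-side Kronecker factor E[g g^T]
    _, U_A = np.linalg.eigh(A)          # orthonormal eigenbases of both factors
    _, U_G = np.linalg.eigh(G)
    # Rotate each per-example gradient g_i a_i^T into the Kronecker eigenbasis.
    g_rot = grads_out @ U_G             # (n, d_out)
    a_rot = acts @ U_A                  # (n, d_in)
    per_example = g_rot[:, :, None] * a_rot[:, None, :]  # (n, d_out, d_in)
    # Eigenvalue correction: per-coordinate second moments in the eigenbasis
    # replace K-FAC's products of factor eigenvalues.
    s = (per_example ** 2).mean(axis=0)
    rotated_mean = per_example.mean(axis=0)  # equals U_G^T (dL/dW) U_A
    # Divide by the damped second moments, then rotate back to weight space.
    return U_G @ (rotated_mean / (s + damping)) @ U_A.T
```

An NGD step would then be `W -= lr * ekfac_precondition(a, g)`. If `g` comes from the true labels this uses the empirical Fisher; practical implementations typically sample labels from the model, keep running averages of the factors, and recompute the eigendecompositions only periodically to control the time overhead.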
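For the grokking experiments, a modular arithmetic dataset modulo the prime 97 enumerates every operand pair with its label, and "steps to grok" can be read off a logged validation-accuracy curve. In this sketch the operations in `OPS`, the 0.99 accuracy threshold, and the helper names are hypothetical: the five datasets the source used are not listed, and its exact grokking criterion is not stated.

```python
import numpy as np

P = 97  # prime modulus stated in the source

def modular_dataset(op, exclude_zero_b=False, p=P):
    """Enumerate every pair (a, b) in Z_p x Z_p with label op(a, b) mod p."""
    pairs = [(a, b) for a in range(p) for b in range(p)
             if not (exclude_zero_b and b == 0)]
    X = np.array(pairs, dtype=np.int64)
    y = np.array([op(a, b) % p for a, b in pairs], dtype=np.int64)
    return X, y

# Hypothetical example operations; the source does not list its five datasets.
OPS = {
    "add": (lambda a, b: a + b, False),
    "sub": (lambda a, b: a - b, False),
    "div": (lambda a, b: a * pow(b, -1, P), True),  # b^-1 exists because P is prime
}

def steps_to_grok(steps, val_acc, threshold=0.99):
    """First logged step whose validation accuracy reaches the threshold, else None."""
    for step, acc in zip(steps, val_acc):
        if acc >= threshold:
            return step
    return None
```

A training run would then split `X, y` into train and validation subsets (the split fraction here would be another assumption) and, for each of the 10 seeds, pass the logged steps and validation accuracies to `steps_to_grok`. `pow(b, -1, P)` requires Python 3.8 or later.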