Empirical Performance

2 sources - 8 claims

At 123M on WikiText-103, fixed-horn 3PT reduced both perplexity and bits per byte relative to the matched RoPE-only baseline. In long-horizon 5.5M runs, the best phase-aligned and PhRMS variant reached 13.30% lower perplexity than RoPE-only. On the modern backbone, three-phase structure without RoPE underperformed RoPE alone, whereas three-phase structure combined with RoPE improved on RoPE-only. At matched quality, 3PT converged in fewer steps and less wall-clock time despite a higher per-step cost.

On FashionMNIST, HGF outperformed both MLP and PCN at every tested width in shallow configurations. In deeper configurations, HGF degraded more gracefully than vanilla PCN, although MLP reached the highest accuracy. HGF was slower per sample than MLP but faster than PCN run with 20 inference steps. HGF also showed advantages in online-learning, small-data, and concept-drift settings.
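
The relative perplexity figure and the steps-versus-wall-clock trade-off reported above rest on simple arithmetic, sketched below. This is a minimal illustration, not the sources' evaluation code. The function names and every number in it are hypothetical, and it assumes the 13.30% figure is a relative reduction measured against the RoPE-only baseline.

```python
import math


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fractional reduction of a lower-is-better metric, relative to the baseline."""
    return (baseline - candidate) / baseline


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


def wall_clock_seconds(steps: int, seconds_per_step: float) -> float:
    """Total training time when every step costs the same."""
    return steps * seconds_per_step


# Hypothetical perplexities, chosen only so the arithmetic is easy to follow.
baseline_ppl = 100.0
candidate_ppl = 86.7
print(f"relative perplexity reduction: {relative_reduction(baseline_ppl, candidate_ppl):.2%}")

# Hypothetical step counts and per-step costs: the candidate is 20% slower
# per step but needs fewer steps to reach the same quality.
baseline_time = wall_clock_seconds(steps=10_000, seconds_per_step=1.0)
candidate_time = wall_clock_seconds(steps=6_000, seconds_per_step=1.2)
print(f"baseline: {baseline_time:.0f} s, candidate: {candidate_time:.0f} s")
```

A method with a higher per-step cost still finishes sooner whenever its step count shrinks by more than its per-step cost grows, which is the pattern the matched-quality claim describes.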