Future Extensions
Scaling to large language models beyond the translation setup is left as an open empirical question. Diffusion models and implicit architectures, such as deep equilibrium models (DEQs) and Neural ODEs, are proposed as future applications. The paper identifies alternative divergences, such as Rényi divergences, as a natural extension. LLQR may also be useful as a testbed for studying how optimization geometry changes the learned solutions.