N-Phase Scaling
At 5.5M scale, N=1 achieved the best perplexity among tested phase counts. The stronger claim that three phases are optimal is not supported by the seed sweep. At 123M across three seeds, N=1 and N=3 were statistically indistinguishable. T…
1 sources - 4 claims
At 5.5M scale, N=1 achieved the best perplexity among tested phase counts. The stronger claim that three phases are optimal is not supported by the seed sweep. At 123M across three seeds, N=1 and N=3 were statistically indistinguishable. The N-phase sweep found that three phases are not uniquely optimal.