N-Phase Scaling

At 5.5M scale, N=1 achieved the best perplexity among tested phase counts. The stronger claim that three phases are optimal is not supported by the seed sweep. At 123M across three seeds, N=1 and N=3 were statistically indistinguishable. T…

1 sources - 4 claims

At 5.5M scale, N=1 achieved the best perplexity among tested phase counts. The stronger claim that three phases are optimal is not supported by the seed sweep. At 123M across three seeds, N=1 and N=3 were statistically indistinguishable. The N-phase sweep found that three phases are not uniquely optimal.