Prefix Edit Distance
At step K, the gate computes dK, the mean pairwise prefix edit distance across all trajectory pairs in a group. Each pairwise distance is a normalized Levenshtein distance between the two trajectories' action prefixes. A dK value of 0 means all action prefixes are identical, while a value near 1 means they share almost no actions. This mid-rollout divergence predicted final reward variance best at around K = 10 to K = 15. At K = 15, prefix edit distance achieved a Spearman rho of 0.419 and an AUROC of 0.77.
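A minimal Python sketch of how dK could be computed for one group, under stated assumptions. Actions are hashable tokens. Each pairwise Levenshtein distance is divided by the length of the longer of the two prefixes (a common normalization that keeps values in [0, 1]; the source does not specify which normalization is used). A trajectory shorter than K contributes its whole action list. The function names are hypothetical, and the gate's decision rule is not shown.

```python
from itertools import combinations
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when x == y)
            ))
        prev = curr
    return prev[-1]


def normalized_prefix_distance(a: Sequence[Hashable], b: Sequence[Hashable], k: int) -> float:
    """Levenshtein distance between the first k actions of a and b.

    Assumption: normalized by the longer prefix length, so the result is in [0, 1].
    """
    pa, pb = a[:k], b[:k]
    longest = max(len(pa), len(pb))
    if longest == 0:
        return 0.0  # two empty prefixes are treated as identical
    return levenshtein(pa, pb) / longest


def mean_prefix_divergence(trajectories: Sequence[Sequence[Hashable]], k: int) -> float:
    """dK: mean normalized prefix distance over all unordered trajectory pairs."""
    pairs = list(combinations(trajectories, 2))
    if not pairs:
        return 0.0  # fewer than two trajectories: no divergence to measure
    return sum(normalized_prefix_distance(a, b, k) for a, b in pairs) / len(pairs)
```

For example, with the hypothetical group `[["search", "click", "type", "submit"], ["search", "click", "type", "submit"], ["search", "scroll", "click", "submit"]]` and `k=3`, the two identical trajectories contribute 0, each pair involving the third contributes 2/3, and `mean_prefix_divergence` returns 4/9 (about 0.444). Averaging over all pairs costs O(G² K²) for a group of G trajectories, which stays small at K = 10 to 15.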