Neyman Allocation

1 source - 5 claims

Under stratified sampling, the allocation that minimizes gradient-estimator variance is the Neyman allocation: each phase c is sampled in proportion to N_c times the square root of V_c. In practice, C_c, the success-failure action variance, serves as a computable lower-bound proxy for V_c that preserves the relative ordering of phases and requires no auxiliary model. Keep probabilities are updated every five steps and are floored at 0.1, so that no phase is permanently excluded because of a transient zero estimate. The chunk allocation realized during training closely matches the square-root-V_c-weighted prediction from Theorem 1, which empirically confirms that C_c tracks V_c in practice. The cumulative C_c curve has a knee at approximately 20% of trajectory chunks, which provides a principled heuristic for choosing the budget B in a new domain.
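
The allocation rule and the floored keep probabilities can be illustrated with a minimal Python sketch. It is not the source's implementation. It assumes that N_c is the number of chunks in phase c (as in the classical Neyman allocation), that a keep probability is the allocated sample count divided by N_c, and that the floor is applied by simple clipping. The function names `neyman_allocation`, `keep_probabilities`, and `should_update`, and the example numbers, are hypothetical.

```python
import math

UPDATE_EVERY = 5   # keep probabilities are refreshed every five steps
FLOOR = 0.1        # minimum keep probability for any phase


def neyman_allocation(sizes, variances, budget):
    """Split `budget` samples across phases in proportion to N_c * sqrt(V_c)."""
    weights = [n * math.sqrt(v) for n, v in zip(sizes, variances)]
    total = sum(weights)
    if total == 0:
        # Assumption: with no variance signal, fall back to size-proportional allocation.
        weights = [float(n) for n in sizes]
        total = sum(weights)
    return [budget * w / total for w in weights]


def keep_probabilities(sizes, variances, budget, floor=FLOOR):
    """Assumed mapping: keep probability = allocated samples / N_c, clipped to [floor, 1]."""
    alloc = neyman_allocation(sizes, variances, budget)
    return [min(1.0, max(floor, a / n)) for a, n in zip(alloc, sizes)]


def should_update(step, every=UPDATE_EVERY):
    """True on the steps at which keep probabilities are recomputed."""
    return step % every == 0


# Hypothetical example: three phases of 100 chunks each; in practice the
# variance estimates would be the C_c proxies rather than the true V_c.
sizes = [100, 100, 100]
variance_estimates = [4.0, 1.0, 0.0]
alloc = neyman_allocation(sizes, variance_estimates, budget=60)
probs = keep_probabilities(sizes, variance_estimates, budget=60)
```

In this example the third phase has a zero variance estimate and would receive no samples under the raw allocation; the floor keeps its probability at 0.1. Clipping means the expected number of kept chunks can exceed the nominal budget B slightly, so a stricter implementation might renormalize the unclipped phases.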