Hierarchical Advantage
1 sources - 4 claims
Hierarchical advantage assigns each node a sibling contrast computed from normalized backed-up values. For a trajectory, the hierarchical advantage is the discounted sum of the sibling contrasts along its path, with discount alpha = 0.7. The final advantage combines the GRPO advantage and the hierarchical advantage with weight lambda = 0.5. In the InfoTree workflow, the GRPO and hierarchical advantages are computed before the policy update.
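The computation described above can be illustrated with a minimal Python sketch. Only alpha = 0.7 and lambda = 0.5 come from the text; everything else is an assumption made for illustration: the Node class, the choice of "node value minus the mean of its sibling group" as the sibling contrast, the alpha**depth weighting counted from the root, and the additive combination A_grpo + lambda * A_hier. The source does not specify these forms, so the actual InfoTree definitions may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ALPHA = 0.7   # discount along the path (value from the text)
LAMBDA = 0.5  # weight on the hierarchical term (value from the text)


@dataclass(eq=False)
class Node:
    """Hypothetical tree node; `value` is assumed to be an already
    normalized backed-up value."""
    value: float
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add_child(self, value: float) -> "Node":
        child = Node(value=value, parent=self)
        self.children.append(child)
        return child


def sibling_contrast(node: Node) -> float:
    """Assumed form: node value minus the mean value of its sibling group
    (the group includes the node itself). The root has no siblings."""
    if node.parent is None:
        return 0.0
    group = node.parent.children
    return node.value - sum(c.value for c in group) / len(group)


def hierarchical_advantage(leaf: Node, alpha: float = ALPHA) -> float:
    """Sum sibling contrasts along the root-to-leaf path.
    Assumed weighting: the node at depth d below the root's children
    is scaled by alpha**d."""
    path = []
    node = leaf
    while node is not None and node.parent is not None:
        path.append(node)
        node = node.parent
    path.reverse()  # order from shallowest to deepest
    return sum(alpha ** d * sibling_contrast(n) for d, n in enumerate(path))


def combined_advantage(a_grpo: float, a_hier: float, lam: float = LAMBDA) -> float:
    """Assumed additive combination of the two advantage terms."""
    return a_grpo + lam * a_hier


# Small example tree with made-up values.
root = Node(value=0.0)
a = root.add_child(0.8)
b = root.add_child(0.2)
a1 = a.add_child(0.9)
a2 = a.add_child(0.5)

a_hier = hierarchical_advantage(a1)            # 0.3 + 0.7 * 0.2
a_final = combined_advantage(1.0, a_hier)      # GRPO advantage 1.0 is made up
```

Under these assumptions, a trajectory that beats its siblings at shallow branch points receives more credit than one that only differs near the leaves, since deeper contrasts are scaled down by alpha.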