Hierarchical Advantage

Hierarchical advantage assigns a sibling contrast to each node in the tree. The contrast compares the node's normalized backed-up value with the values of its siblings. For a trajectory, the hierarchical advantage is the sum of the sibling contrasts of the nodes along its path, discounted with α = 0.7. The final advantage combines the GRPO advantage and the hierarchical advantage using λ = 0.5. The InfoTree workflow computes this combined GRPO-plus-hierarchical advantage before the policy update.
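The Python sketch below shows the three steps: per-node sibling contrast, the discounted sum along a path, and the combination with GRPO. It is an illustration under stated assumptions, not InfoTree's implementation. All names (`Node`, `backed_up_value`, `sibling_contrast`, `hierarchical_advantage`, `grpo_advantages`, `combined_advantages`) are hypothetical. The text does not specify the following details, so these choices are assumptions:

- A node's backed-up value is the mean leaf reward in its subtree.
- "Normalized" means z-scored against all children of the same parent.
- The contrast at depth t+1 is weighted by α^t.
- The combination is additive, A = A_GRPO + λ·A_hier, which is one reading of "GRPO plus hierarchical advantage".
- All leaves of one tree form the GRPO group.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean, pstdev

ALPHA = 0.7   # path discount (value stated in the text)
LAMBDA = 0.5  # weight on the hierarchical term (value stated in the text)
EPS = 1e-8    # guards against division by zero when all siblings tie


@dataclass
class Node:
    """Hypothetical rollout-tree node; only leaves carry a reward."""
    reward: float | None = None
    children: list[Node] = field(default_factory=list)


def leaves(node: Node):
    """Yield every leaf in the subtree rooted at `node`."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def backed_up_value(node: Node) -> float:
    # Assumption: backed-up value = mean leaf reward in the subtree.
    return mean(leaf.reward for leaf in leaves(node))


def sibling_contrast(parent: Node, child: Node) -> float:
    # Assumption: "normalized" = z-score against all children of `parent`.
    values = [backed_up_value(c) for c in parent.children]
    return (backed_up_value(child) - mean(values)) / (pstdev(values) + EPS)


def root_to_leaf_paths(node: Node) -> list[list[Node]]:
    """Every root-to-leaf path; each path is one trajectory."""
    if not node.children:
        return [[node]]
    return [[node] + p for c in node.children for p in root_to_leaf_paths(c)]


def hierarchical_advantage(path: list[Node], alpha: float = ALPHA) -> float:
    # Assumption: the contrast at depth t+1 is weighted by alpha**t,
    # so decisions near the root count most.
    return sum(
        alpha ** t * sibling_contrast(parent, child)
        for t, (parent, child) in enumerate(zip(path, path[1:]))
    )


def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantage: each reward z-scored within the group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + EPS) for r in rewards]


def combined_advantages(root: Node, alpha: float = ALPHA,
                        lam: float = LAMBDA) -> list[float]:
    # Assumption: A = A_GRPO + lam * A_hier, with all leaves of one tree
    # forming the GRPO group.
    paths = root_to_leaf_paths(root)
    a_grpo = grpo_advantages([p[-1].reward for p in paths])
    return [g + lam * hierarchical_advantage(p, alpha)
            for g, p in zip(a_grpo, paths)]


if __name__ == "__main__":
    # Two branches; only the first branch ever succeeds.
    root = Node(children=[
        Node(children=[Node(reward=1.0), Node(reward=0.0)]),
        Node(children=[Node(reward=0.0), Node(reward=0.0)]),
    ])
    paths = root_to_leaf_paths(root)
    print([round(hierarchical_advantage(p), 3) for p in paths])
    # [1.7, 0.3, -1.0, -1.0]
    print([round(a, 3) for a in combined_advantages(root)])
    # [2.582, -0.427, -1.077, -1.077]
```

The example tree shows the intended effect. The two failing leaves in the losing branch get a lower final advantage (-1.077) than the failing leaf in the winning branch (-0.427). Plain GRPO would give all three failing leaves the same value (about -0.577). For brevity, the sketch recomputes `backed_up_value` on every call, which is quadratic in tree size. A practical version would cache the values in a single post-order pass.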