Differential TD

1 source - 5 claims

Differential TD replaces the discounted TD error with an error that subtracts the current average-reward estimate. It is designed for the average-reward setting and was introduced for off-policy average-reward policy evaluation. It also incorporates a centering effect similar to reference-state centering. Local-clock differential TD is stable for every positive eta under the cited Wan et al. conditions. Global-clock differential TD, by contrast, does not converge for every positive eta.
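
To make the modified error concrete, here is a minimal tabular sketch in Python. It assumes the commonly stated form of the update, in which eta scales the step size of the average-reward estimate relative to the value step size alpha. The function names, the two-state chain, and the default parameters are hypothetical and chosen only for illustration. The sketch uses a constant step size, so it does not address the local-clock versus global-clock distinction.

```python
def differential_td_error(reward, avg_reward, v_next, v_curr):
    """Differential TD error: the reward minus the current average-reward
    estimate, plus the undiscounted difference of successive values."""
    return reward - avg_reward + v_next - v_curr


def differential_td_step(values, avg_reward, s, r, s_next,
                         alpha=0.05, eta=1.0, rho=1.0):
    """One tabular update (assumed form).

    alpha: value step size (assumption)
    eta:   positive scale on the average-reward step size (assumption)
    rho:   importance-sampling ratio; 1.0 means on-policy (assumption)
    """
    delta = differential_td_error(r, avg_reward, values[s_next], values[s])
    values[s] += alpha * rho * delta
    avg_reward += eta * alpha * rho * delta
    return avg_reward, delta


def run_two_state_chain(steps=2000, alpha=0.05, eta=1.0):
    """Hypothetical deterministic chain: 0 -> 1 pays 0, 1 -> 0 pays 2,
    so the true average reward is 1."""
    values = [0.0, 0.0]
    avg_reward = 0.0
    s = 0
    for _ in range(steps):
        s_next = 1 - s
        r = 2.0 if s == 1 else 0.0
        avg_reward, _ = differential_td_step(
            values, avg_reward, s, r, s_next, alpha=alpha, eta=eta)
        s = s_next
    return values, avg_reward


if __name__ == "__main__":
    values, avg_reward = run_two_state_chain()
    print("average-reward estimate:", avg_reward)
    print("value difference v[1] - v[0]:", values[1] - values[0])
```

On this toy chain the average-reward estimate approaches 1, and the learned values are determined only up to an additive constant, so only the difference v[1] - v[0] is meaningful.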