Two-Q Architecture

The heuristic 2Qh variant, assembled without retraining by combining a dense-trained and a sparse-trained single-Q agent, matches or outperforms the best single-Q agent in most conditions. The two-Q agent selects between two Q-matrices based on whether the duration of the most recently completed blank interval exceeds a threshold, encoding a coarse local-density classifier. Q-plus specializes in escaping sparse rear regions by performing far more upwind search before returning downwind. Q-minus specializes in preventing overshooting by initiating downwind return much sooner and exhibiting a more prominent initial surge. The performance benefit of the two-Q architecture arises from the functional complementarity of the two programs, not from jointly optimized training.
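
The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `select_q`, `greedy_action`, `act`, and `THRESHOLD`, the toy matrix values, and greedy action choice are all hypothetical. The mapping of long blanks to Q-plus and short blanks to Q-minus is also an assumption, inferred from Q-plus being described as the sparse-region specialist.

```python
import numpy as np

# Hypothetical toy Q-matrices (rows: states, columns: actions).
# In the 2Qh variant these would come from two separately trained
# single-Q agents, with no joint retraining.
q_plus = np.array([[0.1, 0.9],
                   [0.8, 0.2]])
q_minus = np.array([[0.7, 0.3],
                    [0.4, 0.6]])

# Hypothetical blank-duration threshold, in arbitrary time units.
THRESHOLD = 5.0


def select_q(q_plus, q_minus, last_blank, threshold):
    """Pick a Q-matrix from the most recently completed blank interval.

    Assumption: a blank longer than the threshold is read as "sparse"
    and routed to q_plus; otherwise q_minus is used.
    """
    return q_plus if last_blank > threshold else q_minus


def greedy_action(q, state):
    """Return the index of the highest-valued action in the given state."""
    return int(np.argmax(q[state]))


def act(state, last_blank):
    """Choose an action with whichever Q-matrix the blank duration selects."""
    q = select_q(q_plus, q_minus, last_blank, THRESHOLD)
    return greedy_action(q, state)
```

In this sketch the threshold test is the only coupling between the two programs: each Q-matrix stays exactly as it was trained, consistent with the claim that the benefit comes from complementarity and not from joint optimization.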