Clock-State Q-Learning

1 source - 7 claims

The hypothesis driving this study is that plume recovery is the dominant challenge in turbulent olfactory search, and that the time elapsed since the last odor detection (the blank time) captures the most relevant information for solving it. Accordingly, the clock-state Q-learning agent uses only elapsed blank time as its internal state and discards all other odor history.

The state is a scalar clock that resets to zero upon odor detection and increments by one on each blank step. Because the clock simply counts upward once the plume is lost, the Q-matrix encodes a fixed, deterministic sequence of moves in response to plume loss.

Success rates for the clock-state agent are at or above 90% in all environments and approach 100% in denser plume conditions. The agent substantially outperforms the optimized cast-and-surge heuristic in all four sparsity environments. The best single-Q agent from an ensemble of 20 training runs approaches the performance of the quasi-optimal Bayesian POMDP agent despite using far simpler memory. Policies vary widely across the 20 training runs; this variability reflects a broad manifold of near-equivalent local optima rather than training instability.
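
The clock-state mechanism described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the action set, the clock cap `MAX_CLOCK`, the learning rate, and the discount factor are all hypothetical choices made for illustration, and the update shown is the standard tabular Q-learning rule rather than anything confirmed by the source.

```python
# Illustrative sketch only; names and values below are assumptions, not the study's.
ACTIONS = ("upwind", "downwind", "crosswind_left", "crosswind_right")  # hypothetical action set
MAX_CLOCK = 5  # hypothetical cap so the Q-table stays finite


def make_q_table(max_clock=MAX_CLOCK, n_actions=len(ACTIONS)):
    """One row per clock value (0..max_clock), one column per action."""
    return [[0.0] * n_actions for _ in range(max_clock + 1)]


def update_clock(clock, odor_detected, max_clock=MAX_CLOCK):
    """Reset to zero on detection; otherwise count one more blank step (capped)."""
    if odor_detected:
        return 0
    return min(clock + 1, max_clock)


def greedy_action(q, clock):
    """Index of the highest-valued action for the current clock value."""
    row = q[clock]
    return max(range(len(row)), key=row.__getitem__)


def q_update(q, clock, action, reward, next_clock, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update (alpha and gamma are assumed values)."""
    target = reward + gamma * max(q[next_clock])
    q[clock][action] += alpha * (target - q[clock][action])


def recovery_sequence(q, n_blank_steps):
    """Greedy moves taken over consecutive blank steps after the plume is lost."""
    clock = 0  # clock value at the last detection
    moves = []
    for _ in range(n_blank_steps):
        clock = update_clock(clock, odor_detected=False)
        moves.append(ACTIONS[greedy_action(q, clock)])
    return moves
```

Calling `recovery_sequence(q, n)` twice on the same table returns the same list of moves, because a greedy policy over this state depends only on the clock value and the clock follows the same path (1, 2, 3, ...) after every plume loss. This is the sense in which a Q-matrix over a clock state amounts to a fixed recovery routine.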