SM Clock Locking
With the SM clock locked at 780 MHz and a sequence length of 1024, every tested architecture drew substantially less power while losing less than 1% of throughput. Raising the SM clock from 1590 MHz to the effective ceiling of about 1830 MHz increased power draw without improving decode throughput. Because decode is paced by memory traffic rather than arithmetic, locking the SM clock lower cuts the power spent on underused compute cores, while the memory clock, and therefore HBM bandwidth, is left unchanged. The tested H200 did not sustain a requested 1980 MHz lock; it clamped near 1830 MHz instead. The H200's idle power floor also limits how far underclocking can reduce total GPU power.
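A simplified roofline model shows why a lower SM clock can leave memory-paced decode throughput intact. In this model, a decode step takes as long as the slower of two things: its compute time, which scales with the SM clock, and its HBM transfer time, which does not. The sketch below illustrates that reasoning under stated assumptions and is not a measurement method. The per-step FLOPs, bytes moved, FLOPs-per-cycle figure, and bandwidth are hypothetical round numbers, and all class and function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class DecodeStep:
    flops: float        # arithmetic work per decode step (hypothetical)
    bytes_moved: float  # HBM traffic per decode step (hypothetical)


@dataclass
class GpuModel:
    flops_per_cycle: float  # peak FLOPs per SM-clock cycle, all SMs combined (hypothetical)
    hbm_bytes_per_s: float  # HBM bandwidth; set by the memory clock, not the SM clock


def memory_time_s(step: DecodeStep, gpu: GpuModel) -> float:
    return step.bytes_moved / gpu.hbm_bytes_per_s


def compute_time_s(step: DecodeStep, gpu: GpuModel, sm_clock_mhz: float) -> float:
    return step.flops / (gpu.flops_per_cycle * sm_clock_mhz * 1e6)


def step_time_s(step: DecodeStep, gpu: GpuModel, sm_clock_mhz: float) -> float:
    # Idealized roofline: perfect overlap, so the slower resource sets the pace.
    return max(compute_time_s(step, gpu, sm_clock_mhz), memory_time_s(step, gpu))


def lowest_unthrottled_clock_mhz(step: DecodeStep, gpu: GpuModel) -> float:
    # SM clock at which compute time exactly equals memory time.
    return step.flops / (gpu.flops_per_cycle * memory_time_s(step, gpu)) / 1e6


if __name__ == "__main__":
    step = DecodeStep(flops=5e12, bytes_moved=4.8e10)            # made-up workload
    gpu = GpuModel(flops_per_cycle=1e6, hbm_bytes_per_s=4.8e12)  # made-up GPU
    for mhz in (400, 780, 1590, 1830):
        print(f"{mhz:>5} MHz: {step_time_s(step, gpu, mhz) * 1e3:.2f} ms per step")
    print(f"compute becomes the limit below ~{lowest_unthrottled_clock_mhz(step, gpu):.0f} MHz")
```

With these made-up figures, every clock at or above 500 MHz yields the same 10.00 ms step, so 780, 1590, and 1830 MHz decode equally fast, and only 400 MHz (12.50 ms) is slower. Real kernels overlap compute and memory imperfectly, so the clock at which a measured slowdown begins is likely to sit above this idealized threshold.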
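The H200 clamping result suggests verifying the clock the GPU actually runs at rather than trusting the requested lock. The minimal Python sketch below assumes the standard `nvidia-smi` CLI. In that CLI, `-lgc MIN,MAX` (`--lock-gpu-clocks`) requests a locked GPU (graphics/SM) clock range, `-rgc` (`--reset-gpu-clocks`) undoes the lock, and `--query-gpu` reports `clocks.sm`, `clocks.mem`, and `power.draw`. The helper names and the 15 MHz tolerance are hypothetical choices for this example, locking clocks typically requires root privileges, and the sample output values are made up.

```python
import csv
import io


def lock_sm_clock_cmd(gpu_index: int, mhz: int) -> list[str]:
    # Equal MIN and MAX request a fixed clock; the driver may still clamp it.
    return ["nvidia-smi", "-i", str(gpu_index), "-lgc", f"{mhz},{mhz}"]


def reset_sm_clock_cmd(gpu_index: int) -> list[str]:
    # Returns the GPU to its default clock management.
    return ["nvidia-smi", "-i", str(gpu_index), "-rgc"]


# One CSV line per GPU, no header or units, e.g. "0, 1830, 3201, 612.35" (made-up values).
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,clocks.sm,clocks.mem,power.draw",
    "--format=csv,noheader,nounits",
]


def parse_query(output: str) -> list[dict]:
    rows = []
    for record in csv.reader(io.StringIO(output)):
        if not record:
            continue  # skip blank lines
        index, sm, mem, power = (field.strip() for field in record)
        rows.append({
            "index": int(index),
            "sm_mhz": float(sm),
            "mem_mhz": float(mem),
            "power_w": float(power),  # some boards report "[N/A]", which float() rejects
        })
    return rows


def is_clamped(requested_mhz: float, observed_mhz: float, tolerance_mhz: float = 15.0) -> bool:
    # Hypothetical tolerance: small dips count as noise, larger gaps as a clamp.
    return observed_mhz < requested_mhz - tolerance_mhz
```

In practice, each argument list would be passed to `subprocess.run(..., check=True, capture_output=True, text=True)`, and `QUERY_CMD` would be sampled while decode is running. Suppose a 1980 MHz lock is requested and the query reports about 1830 MHz. In that case, `is_clamped(1980, 1830)` returns `True`, matching the H200 behavior described above. Logging `clocks.mem` alongside `clocks.sm` also confirms that the SM lock left the memory clock unchanged.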