Batch Size
Increasing batch size from 1 to 32 reduced energy per token by more than 20x through weight-loading amortization. Batch size affected energy per token more than DVFS or architecture choice in the experiments. At batch size 32 and sequence length 4096, optimal clocks and energy savings varied materially by architecture. The batch-size sweep is used to argue that even high request concurrency does not make power caps effective for decode on the tested GPU.
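The amortization argument can be illustrated with a toy model. This is a minimal sketch, not the method used in the experiments: it assumes each decode step pays a fixed energy cost to load the weights once, plus a per-sequence cost, and emits one token per sequence in the batch. The function name `energy_per_token` and both constants are hypothetical, unitless values chosen only for illustration; they are not measurements.

```python
def energy_per_token(batch_size, weight_load_energy, per_sequence_energy):
    """Toy model of decode energy per generated token.

    Assumes one decode step loads the weights once (fixed cost) and
    produces one token for each sequence in the batch (per-sequence cost).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    step_energy = weight_load_energy + batch_size * per_sequence_energy
    return step_energy / batch_size


# Hypothetical, unitless values for illustration only; not measured data.
WEIGHT_LOAD_ENERGY = 100.0
PER_SEQUENCE_ENERGY = 1.0

for b in (1, 2, 4, 8, 16, 32):
    e = energy_per_token(b, WEIGHT_LOAD_ENERGY, PER_SEQUENCE_ENERGY)
    print(f"batch={b:2d}  energy/token={e:.3f}")

# Ratio of per-token energy at batch size 1 versus batch size 32.
ratio = (
    energy_per_token(1, WEIGHT_LOAD_ENERGY, PER_SEQUENCE_ENERGY)
    / energy_per_token(32, WEIGHT_LOAD_ENERGY, PER_SEQUENCE_ENERGY)
)
print(f"reduction from batch 1 to 32: {ratio:.1f}x")
```

In this model the reduction approaches the batch size itself as the per-sequence cost shrinks relative to the weight-loading cost, so a reduction above 20x at batch size 32 is plausible only when weight loading dominates the energy of a step.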