Accuracy Preservation
Numerical equivalence against padded PyTorch SDPA is verified across all ImageNet validation samples: top-1 predictions match exactly, and the maximum absolute logit difference stays below 0.007 for every pruning method tested. On ImageNet-1K validation with DeiT-Small, the Triton backend delivers higher throughput than padded PyTorch at the same accuracy. At 25% Threshold-L2 pruning, the pipeline reaches 81.7% top-1 accuracy and exceeds unpruned DeiT-S throughput while losing 0.5 percentage points of accuracy. At 50% pruning, throughput reaches 5,187 images per second with 78.7% top-1 accuracy.
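The equivalence criterion described here (identical top-1 predictions plus a bound on the maximum absolute logit difference) can be illustrated with a minimal sketch. This is not the original evaluation code: the function name `compare_logits`, the plain-list inputs, and the toy values are assumptions made for illustration; only the 0.007 bound comes from the text.

```python
def compare_logits(reference, candidate):
    """Compare two batches of logits (lists of per-sample rows).

    Returns (top1_match, max_abs_diff): whether every sample has the same
    argmax class in both batches, and the largest elementwise |difference|.
    """
    if len(reference) != len(candidate):
        raise ValueError("batches must contain the same number of samples")

    top1_match = True
    max_abs_diff = 0.0
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            raise ValueError("rows must contain the same number of classes")
        # Index of the highest logit is the top-1 prediction.
        ref_top1 = max(range(len(ref_row)), key=ref_row.__getitem__)
        cand_top1 = max(range(len(cand_row)), key=cand_row.__getitem__)
        top1_match = top1_match and (ref_top1 == cand_top1)
        for r, c in zip(ref_row, cand_row):
            max_abs_diff = max(max_abs_diff, abs(r - c))
    return top1_match, max_abs_diff


# Toy logits standing in for the reference (padded) and candidate backends.
reference = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
candidate = [[2.004, 0.497, -1.001], [0.098, 3.005, 0.2]]

top1_match, max_abs_diff = compare_logits(reference, candidate)
LOGIT_TOLERANCE = 0.007  # bound reported in the text
print(top1_match, max_abs_diff < LOGIT_TOLERANCE)
```

In a real evaluation the two inputs would be the logits produced by each backend on the same validation batches, and the maximum would be accumulated across the whole validation set rather than a single batch.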