Performance Evaluation

On Vicuna-7B with greedy chain decoding, CATS reached 3.0598 accepted tokens on average and a 3.18x speedup. Combined with EAGLE tree decoding, CATS reached 3.7050 accepted tokens and a 3.71x speedup on Vicuna-7B under greedy decoding. On LLaMA2-7B with greedy chain decoding, CATS achieved 4.6491 accepted tokens and a 4.65x speedup. CATS retains its speed advantage on larger models and across other model families. Across the evaluated benchmarks and decoding settings, CATS reported up to a 5.08x wall-clock speedup with no observed degradation in generation quality.

A separate clustering analysis found that self-reported satisfaction and actual performance did not always align: some team members reported high satisfaction despite low performance. Self-perceived satisfaction is therefore not a reliable substitute for objective evaluation. Structured debriefing and objective feedback are proposed to calibrate perceptions against actual outcomes.
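The CATS results above are reported as accepted tokens per step, a metric the source does not define. The sketch below assumes the standard greedy verification rule from speculative decoding. The target model checks a chain of draft tokens in one pass and keeps the longest prefix that matches its own greedy choices. It then appends one token of its own. This is a generic illustration, not the CATS implementation. The names `greedy_verify`, `mean_tokens_per_step`, and `EXAMPLE_STEPS` are hypothetical, and whether the reported figures count the target's extra token is likewise an assumption.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating generic greedy speculative verification.
# This is NOT the CATS implementation; it only shows what "accepted tokens" can mean.


def greedy_verify(draft: Sequence[int], target_greedy: Sequence[int]) -> Tuple[List[int], int]:
    """Verify a chain of draft tokens against the target model's greedy choices.

    draft: the k token ids proposed by the draft model.
    target_greedy: the target model's argmax token at each of the k + 1 positions,
        assumed to come from a single forward pass over prefix + draft.
    Returns (tokens committed this step, number of draft tokens accepted).
    """
    if len(target_greedy) != len(draft) + 1:
        raise ValueError("target_greedy must have exactly len(draft) + 1 entries")
    accepted = 0
    for d, t in zip(draft, target_greedy):
        if d != t:
            break  # the first mismatch ends the accepted prefix
        accepted += 1
    # Keep the matching prefix, then one token from the target: a correction
    # after a mismatch, or a "bonus" token if every draft token was accepted.
    committed = list(draft[:accepted]) + [target_greedy[accepted]]
    return committed, accepted


def mean_tokens_per_step(steps: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> Tuple[float, float]:
    """Average (draft tokens accepted, tokens committed) over verification steps."""
    if not steps:
        raise ValueError("need at least one step")
    total_accepted = 0
    total_committed = 0
    for draft, target_greedy in steps:
        committed, accepted = greedy_verify(draft, target_greedy)
        total_accepted += accepted
        total_committed += len(committed)
    return total_accepted / len(steps), total_committed / len(steps)


# Toy token ids, not real model outputs.
EXAMPLE_STEPS = [
    ([5, 7, 9, 2], [5, 7, 9, 2, 4]),  # all 4 draft tokens match, plus a bonus token
    ([5, 8, 1, 3], [5, 7, 0, 0, 0]),  # only the first draft token matches
]

if __name__ == "__main__":
    print(mean_tokens_per_step(EXAMPLE_STEPS))  # (2.5, 3.5)
```

Under this convention, each step commits exactly one more token than it accepts from the draft, so the mean committed count is always the mean accepted count plus one. It therefore matters which of the two quantities a reported "accepted tokens" figure refers to.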