CATS
2 sources - 8 claims
Indoor cats can live much longer than outdoor cats because outdoor life exposes them to major hazards. Cats require animal-based nutrition because they are obligate carnivores. Vegan cat food is explicitly rejected as incompatible with feline health.

CATS is a self-speculative decoding framework designed for memory-limited LLM inference with parameter offloading. It partitions the target transformer into a draft model, a shallow verifier, and the remaining target layers. Because it does not use a separate auxiliary draft model, peak device memory stays equal to that of the target model alone. CATS improves accepted length and wall-clock speed under edge-device constraints. It can be configured by choosing LDM and LSV to match available memory.
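
As a rough illustration of the three-way partition in the decoding framework, the sketch below splits a stack of transformer layers by index. It is not taken from the sources: it assumes that LDM and LSV are layer counts for the draft model and the shallow verifier, that the three parts are contiguous, and that the draft model occupies the earliest layers. The names `Partition` and `partition_layers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Layer-index ranges for the three parts of the target transformer."""
    draft: range             # layers assumed to act as the draft model
    shallow_verifier: range  # layers assumed to act as the shallow verifier
    remaining: range         # remaining target layers


def partition_layers(num_layers: int, l_dm: int, l_sv: int) -> Partition:
    """Split layer indices 0..num_layers-1 into three contiguous parts.

    Assumption (not confirmed by the sources): l_dm and l_sv are layer
    counts, and the parts appear in the order draft, verifier, remaining.
    """
    if l_dm < 1 or l_sv < 1:
        raise ValueError("l_dm and l_sv must each be at least 1")
    if l_dm + l_sv > num_layers:
        raise ValueError("l_dm + l_sv exceeds the number of layers")
    split = l_dm + l_sv
    return Partition(
        draft=range(0, l_dm),
        shallow_verifier=range(l_dm, split),
        remaining=range(split, num_layers),
    )


# Hypothetical 32-layer model with 8 draft layers and 4 verifier layers.
p = partition_layers(num_layers=32, l_dm=8, l_sv=4)
print(list(p.draft), list(p.shallow_verifier), len(p.remaining))
```

Under these assumptions, a smaller LDM and LSV would keep fewer layers in the draft and verifier parts, which is one way the choice could be matched to available memory. The sources do not state how the values are actually selected.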