KV Cache


Under construction

Definition

XXXXXXXXX

French

XXXXXXXXX

English

KV Cache

A KV cache stores intermediate key (K) and value (V) computations for reuse during inference (after training), which results in a substantial speed-up when generating text. The downsides are that it adds complexity to the code, increases memory requirements, and cannot be used during training. However, the inference speed-ups are often well worth the trade-offs in code complexity and memory when using LLMs in production.
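To make the memory cost concrete, here is a rough estimate of cache size, assuming a standard decoder that keeps one key tensor and one value tensor per layer. The function name `kv_cache_bytes` and the model dimensions in the example are hypothetical, chosen only for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Approximate KV cache size in bytes.

    The leading factor 2 counts one tensor for keys and one for values
    in every layer. bytes_per_value=2 assumes 16-bit storage.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model: 32 layers, 32 KV heads of dimension 128,
# one sequence of 4096 tokens stored in 16-bit precision.
example_bytes = kv_cache_bytes(n_layers=32, n_kv_heads=32,
                               head_dim=128, seq_len=4096)
example_gib = example_bytes / 2**30
print(f"{example_gib:.1f} GiB")  # prints "2.0 GiB"
```

The size grows linearly with sequence length and batch size, which is why long contexts and large batches tend to be limited by cache memory rather than by model weights alone.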
A KV cache is a technique used in transformer models to improve inference efficiency by storing the key (K) and value (V) states of previously computed tokens. This allows the model to avoid redundant computations when generating new tokens, thereby reducing the time and compute required for inference.
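The mechanism can be sketched in a few lines of plain Python. This is a minimal single-head illustration, not the implementation of any particular library: the class `KVCache`, the helper functions, and the toy weights are all hypothetical. It shows that projecting only the newest token and reusing the cached keys and values gives the same attention output as recomputing every key and value at each step.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w / total * v[j] for w, v in zip(weights, values))
            for j in range(dim)]


class KVCache:
    """Stores the key and value vectors of tokens already processed."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def step_with_cache(x, Wq, Wk, Wv, cache):
    """Project only the newest token, then attend over the cache."""
    cache.append(matvec(Wk, x), matvec(Wv, x))
    return attend(matvec(Wq, x), cache.keys, cache.values)


def step_without_cache(xs, Wq, Wk, Wv):
    """Recompute keys and values for every token seen so far."""
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)


# Toy 2-dimensional weights and token embeddings (illustrative values).
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, -0.5], [0.25, 1.0]]
Wv = [[1.0, 2.0], [0.0, -1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache()
cached_outputs = []
recomputed_outputs = []
for t, x in enumerate(tokens):
    cached_outputs.append(step_with_cache(x, Wq, Wk, Wv, cache))
    recomputed_outputs.append(step_without_cache(tokens[: t + 1], Wq, Wk, Wv))
```

In this sketch the cached path performs two projections per generated token, whereas the uncached path repeats the projections for every earlier token at every step. The price is that `cache.keys` and `cache.values` grow by one entry per token.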



Source

Source: huggingface

Contributors: wiki