KV Cache

Under construction

Definition

XXXXXXXXX

French

XXXXXXXXX

English

KV Cache

A KV cache is a technique used in transformer models to improve inference efficiency by storing the intermediate key (K) and value (V) states of previously processed tokens. Reusing these cached states lets the model avoid redundant computation when generating each new token, which results in a substantial speed-up during inference (after training).

The downside of a KV cache is that it adds complexity to the code, increases memory requirements, and cannot be used during training. However, the inference speed-ups are often well worth these trade-offs in code complexity and memory when using LLMs in production.
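
As a rough sketch of the mechanism, the following Python (NumPy) example implements KV caching for a single attention head during autoregressive decoding. All names and values here (d_model, decode_step, the random projection matrices) are illustrative assumptions, not taken from the cited sources.

    # Minimal KV-cache sketch for one attention head (illustrative only).
    import numpy as np

    d_model = 8                          # toy embedding size (assumption)
    rng = np.random.default_rng(0)

    # Random matrices standing in for the trained Q/K/V projections.
    W_q = rng.standard_normal((d_model, d_model))
    W_k = rng.standard_normal((d_model, d_model))
    W_v = rng.standard_normal((d_model, d_model))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    # The cache grows by one (K, V) row per generated token.
    cached_k = np.empty((0, d_model))
    cached_v = np.empty((0, d_model))

    def decode_step(x):
        """Attend from the newest token over all cached positions."""
        global cached_k, cached_v
        q = x @ W_q
        # Compute K and V only for the new token; earlier rows are
        # reused from the cache instead of being recomputed.
        cached_k = np.vstack([cached_k, x @ W_k])
        cached_v = np.vstack([cached_v, x @ W_v])
        # Scores of the newest query against every cached key.
        scores = softmax(cached_k @ q / np.sqrt(d_model))
        return scores @ cached_v

    for step in range(5):
        x = rng.standard_normal(d_model)   # stand-in for a token embedding
        out = decode_step(x)
        print(step, out.shape)             # attends over step + 1 positions

Each decode step computes K and V only for the newest token and appends them to the cache, so the per-step cost grows with the current sequence length rather than recomputing the attention inputs for all previous tokens from scratch.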



Sources

Source: Arxiv (https://arxiv.org/html/2407.18003v1)

Source: Hugging Face (https://huggingface.co/blog/not-lain/kv-caching)

Source: The Large Language Model Playbook (https://cyrilzakka.github.io/llm-playbook/nested/kv-cache.html)

Contributors: Arianne Arel, wiki