KV Cache

Under construction

Definition

XXXXXXXXX

French

XXXXXXXXX

English

KV Cache

A KV cache is a technique used in transformer models to improve inference efficiency by storing the intermediate key (K) and value (V) states of previously processed tokens. Reusing these cached states lets the model avoid redundant computation when generating each new token, which results in a substantial speed-up during inference (after training).

The downside of a KV cache is that it adds complexity to the code, increases memory requirements, and cannot be used during training. However, the inference speed-ups are often well worth these trade-offs in code complexity and memory when using LLMs in production.
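
As a rough sketch of the mechanism, the following Python (NumPy) example implements KV caching for a single attention head during autoregressive decoding. All names and values here (d_model, decode_step, the random projection matrices) are illustrative assumptions, not taken from the cited sources.

    # Minimal KV-cache sketch for one attention head (illustrative only).
    import numpy as np

    d_model = 8                          # toy embedding size (assumption)
    rng = np.random.default_rng(0)

    # Random matrices standing in for the trained Q/K/V projections.
    W_q = rng.standard_normal((d_model, d_model))
    W_k = rng.standard_normal((d_model, d_model))
    W_v = rng.standard_normal((d_model, d_model))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    # The cache grows by one (K, V) row per generated token.
    cached_k = np.empty((0, d_model))
    cached_v = np.empty((0, d_model))

    def decode_step(x):
        """Attend from the newest token over all cached positions."""
        global cached_k, cached_v
        q = x @ W_q
        # Compute K and V only for the new token; earlier rows are
        # reused from the cache instead of being recomputed.
        cached_k = np.vstack([cached_k, x @ W_k])
        cached_v = np.vstack([cached_v, x @ W_v])
        # Scores of the newest query against every cached key.
        scores = softmax(cached_k @ q / np.sqrt(d_model))
        return scores @ cached_v

    for step in range(5):
        x = rng.standard_normal(d_model)   # stand-in for a token embedding
        out = decode_step(x)
        print(step, out.shape)             # attends over step + 1 positions

Each decode step computes K and V only for the newest token and appends them to the cache, so the per-step cost grows with the current sequence length rather than recomputing the attention inputs for all previous tokens from scratch.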



Sources

Source: Arxiv (https://arxiv.org/html/2407.18003v1)

Source: Hugging Face (https://huggingface.co/blog/not-lain/kv-caching)

Source: The Large Language Model Playbook (https://cyrilzakka.github.io/llm-playbook/nested/kv-cache.html)

Contributors: Arianne Arel, wiki