Attention clairsemée

== Definition ==
An attention mechanism that reduces the computational cost of standard (full) attention by computing attention scores over only a subset of query-key pairs rather than all of them. Sparse attention is a promising direction for efficient long-context modeling in language models, improving efficiency while aiming to maintain model capabilities.
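To make the pattern concrete, here is a minimal NumPy sketch of one common sparse scheme, sliding-window attention, in which each query attends only to keys within a fixed distance. The function name and parameters are illustrative and are not drawn from the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def sliding_window_attention(Q, K, V, window=4):
    """Sparse attention sketch: each query attends only to keys
    within `window` positions, instead of the full sequence."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # (n, n) scaled dot products
    # Sparsity mask: keep a local band of half-width `window` per query.
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(mask, scores, -np.inf)      # masked pairs contribute 0
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (n, d) attended values

# Example: 16 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))
out = sliding_window_attention(X, X, X, window=2)
</syntaxhighlight>

Note that this sketch still materializes the full score matrix and merely masks it; efficient implementations compute only the permitted positions, which is where the savings in time and memory come from.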


== French ==
'''attention clairsemée'''


'''attention parcimonieuse'''


'''attention creuse'''


'''attention clairsemée native'''


== English ==
'''sparse attention'''


'''native sparse attention'''
<!-- Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention offers a promising direction for improving efficiency while maintaining model capabilities. We present NSA, a Natively trainable Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling. NSA employs a dynamic hierarchical sparse strategy, combining coarse-grained token compression with fine-grained token selection to preserve both global context awareness and local precision. Our approach advances sparse attention design with two key innovations: (1) We achieve substantial speedups through arithmetic intensity-balanced algorithm design, with implementation optimizations for modern hardware. (2) We enable end-to-end training, reducing pretraining computation without sacrificing model performance.-->
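The commented abstract above describes NSA's hierarchical strategy: coarse-grained token compression combined with fine-grained token selection. The following NumPy sketch illustrates that two-stage idea under simplifying assumptions of ours (mean-pooled key blocks as the compression, per-query top-k block selection); it is not the paper's hardware-aligned, natively trainable implementation.

<syntaxhighlight lang="python">
import numpy as np

def two_stage_sparse_attention(Q, K, V, block=4, top_k=2):
    """Illustrative NSA-style attention: score compressed key blocks,
    then attend exactly over tokens of the top-k selected blocks."""
    n, d = Q.shape
    n_blocks = n // block                          # ragged tail dropped for brevity
    # Coarse stage ("token compression"): mean-pool each block of keys.
    K_blk = K[:n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)
    out = np.zeros_like(Q)
    for i in range(n):
        # Rank blocks with the compressed keys, keep the top-k ("token selection").
        blk_scores = Q[i] @ K_blk.T / np.sqrt(d)
        keep = np.argsort(blk_scores)[-top_k:]
        # Fine stage: exact attention restricted to the selected blocks.
        cols = np.concatenate([np.arange(b * block, (b + 1) * block) for b in keep])
        s = Q[i] @ K[cols].T / np.sqrt(d)
        w = np.exp(s - s.max())
        out[i] = (w / w.sum()) @ V[cols]
    return out

# Example: each of 16 queries attends to 2 selected blocks of 4 tokens
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))
out = two_stage_sparse_attention(X, X, X, block=4, top_k=2)
</syntaxhighlight>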
== Sources ==
[https://espace.etsmtl.ca/id/eprint/3299/ Aroosa Hameed (2023) - attention clairsemée]


[https://fr.wikipedia.org/wiki/Attention_(apprentissage_automatique) Wikipedia - attention clairsemée]


[https://arxiv.org/abs/2502.11089 arXiv - Native Sparse Attention]


[https://aarnphm.xyz/thoughts/papers/DeepSeek_V3_2.pdf DeepSeek - sparse attention]


[[Catégorie:vocabulary]]
[[Catégorie:Publication]]
