"MiniMax-M1": difference between revisions

Latest revision as of 09:04, 5 August 2025

Definition

It is an open-weight, large-scale hybrid-attention reasoning model, built on a hybrid mixture-of-experts (MoE) architecture combined with a lightning attention mechanism. It is trained with large-scale reinforcement learning on diverse problems. MiniMax-M1 is particularly well suited to complex tasks that require processing long inputs and extended reasoning.

See also: attention mechanism

Additional notes

Note: MiniMax-M1 is unrelated to the minimax algorithm.

French

MiniMax-M1

English

MiniMax-M1

Sources

Source: arXiv

Source: Hugging Face

Source: MiniMax-M1

@@ Line 1: / Line 1: @@
-==under construction==
+== Definition ==
+It is an ''open-weight'', large-scale hybrid-attention '''[[raisonnement|reasoning]]''' model, built on a hybrid '''[[mixture d'experts|mixture-of-experts]]''' architecture combined with a '''lightning attention mechanism'''. It is trained with large-scale '''[[apprentissage par renforcement|reinforcement learning]]''' on diverse problems. MiniMax-M1 is particularly well suited to complex tasks that require processing long inputs and extended reasoning.
-== Definition ==
+See also: '''[[mécanisme d'attention|attention mechanism]]'''
+== Additional notes ==
+Note: MiniMax-M1 is unrelated to the '''[[algorithme minimax|minimax algorithm]]'''.
 == French ==
@@ Line 9: / Line 12: @@
 == English ==
 '''MiniMax-M1'''
+<!--It is an open-weight, large-scale hybrid-attention reasoning model, powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. It is trained using large-scale reinforcement learning (RL) on diverse problems. M1 is particularly suitable for complex tasks that require processing long inputs and extensive thinking.-->
+== Sources ==
+[https://arxiv.org/abs/2506.13585 Source: arXiv]
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. Consistent with MiniMax-Text-01, the M1 model natively supports a context length of 1 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism in MiniMax-M1 enables efficient scaling of test-time compute –
+[https://huggingface.co/MiniMaxAI/MiniMax-M1-80k Source: Hugging Face]
-== Source ==
+[https://minimax-m1.com/ Source: MiniMax-M1]
-[https://huggingface.co/MiniMaxAI/MiniMax-M1-80k Source: Hugging Face]
+[[Catégorie:GRAND_LEXIQUE_FRANÇAIS]]
-[[Catégorie:vocabulary]]
+[[Catégorie:ENGLISH]]
