ThinkAct

==en construction==

== Définition ==
Architecture de raisonnement ''vision-langage-action'' (VLA) qui entraîne un '''[[grand modèle de langues multimodal]]''' à générer des plans de raisonnement guidés par des récompenses visuelles alignées sur l'action.


== Français ==
'''ThinkAct'''

== Anglais ==
'''ThinkAct'''


''A reasoning vision-language-action framework that trains a multimodal large language model to generate embodied reasoning plans guided by reinforcing action-aligned visual rewards based on goal completion and trajectory consistency.''

A dual-system framework that enables robots to "think before acting": a multimodal large language model performs high-level reasoning and visual latent planning, while a downstream action model handles low-level execution. The approach addresses a key limitation of current vision-language-action models, which map inputs directly to actions without explicit planning and therefore struggle with complex, multi-step tasks. ThinkAct uses reinforcement learning with action-aligned visual rewards to train the reasoning model to generate plans that guide downstream action execution.
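Le principe à deux systèmes décrit ci-dessus (un raisonneur lent qui produit un plan, une politique rapide qui exécute des actions guidées par ce plan, et une récompense visuelle combinant complétion du but et cohérence de trajectoire) peut s'esquisser ainsi en Python. Il s'agit d'une illustration jouet sous hypothèses : les fonctions <code>slow_reasoner</code>, <code>fast_policy</code> et <code>visual_reward</code> sont des substituts hypothétiques, pas le modèle ThinkAct réel.

```python
import math
import random

def slow_reasoner(instruction: str, dim: int = 4) -> list[float]:
    """Système lent (hypothétique) : plan latent déterministe dérivé
    de l'instruction. Dans ThinkAct, ce rôle revient au grand modèle
    de langues multimodal."""
    rng = random.Random(instruction)
    return [rng.uniform(-1, 1) for _ in range(dim)]

def fast_policy(plan: list[float], state: tuple[float, float],
                goal: tuple[float, float]) -> tuple[float, float]:
    """Système rapide (hypothétique) : action 2D de bas niveau,
    conditionnée par le plan latent (ici une petite perturbation
    d'un pas vers le but)."""
    dx, dy = goal[0] - state[0], goal[1] - state[1]
    return (math.tanh(dx + 0.1 * plan[0]), math.tanh(dy + 0.1 * plan[1]))

def visual_reward(trajectory: list[tuple[float, float]],
                  goal: tuple[float, float]) -> float:
    """Récompense visuelle alignée sur l'action : complétion du but
    (proximité finale) plus cohérence de trajectoire (pas réguliers)."""
    fx, fy = trajectory[-1]
    goal_completion = -math.hypot(fx - goal[0], fy - goal[1])
    step_norms = [math.hypot(b[0] - a[0], b[1] - a[1])
                  for a, b in zip(trajectory, trajectory[1:])]
    mean = sum(step_norms) / len(step_norms)
    consistency = -sum((s - mean) ** 2 for s in step_norms) / len(step_norms)
    return goal_completion + 0.1 * consistency

# Boucle d'inférence : raisonner une fois, agir plusieurs fois.
goal = (1.0, 1.0)
state = (0.0, 0.0)
plan = slow_reasoner("pousser le cube vers la cible")
trajectory = [state]
for _ in range(8):
    ax, ay = fast_policy(plan, state, goal)
    state = (state[0] + 0.2 * ax, state[1] + 0.2 * ay)
    trajectory.append(state)

print(f"récompense : {visual_reward(trajectory, goal):.3f}")
```

Le point illustré est la séparation des fréquences : le plan n'est calculé qu'une fois, tandis que la politique d'action est appelée à chaque pas ; la récompense sert, pendant l'entraînement par renforcement, à ajuster le raisonneur et non la politique.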


== Sources ==
[https://jasper0314-huang.github.io/thinkact-vla/ Source : GitHub.io]

[https://huggingface.co/papers/2507.16815 Source : Hugging Face]


[[Catégorie:vocabulary]]

Version du 12 octobre 2025 à 11:51


Contributeurs: Arianne Arel, wiki