"WEB-SHEPHERD": difference between revisions
Revision as of 23 May 2025, 11:21
Under construction
Definition
XXXXXXXXX
French
XXXXXXXXX
English
WEB-SHEPHERD
WEB-SHEPHERD is the first process reward model (PRM) designed specifically for web navigation tasks. It evaluates web agent trajectories step by step rather than only by their final outcome, which is crucial for improving agent performance on long-horizon web tasks. The method works in two main stages: checklist generation, followed by reward modeling with the checklist.
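The two stages can be illustrated with a toy sketch. It is not the WEB-SHEPHERD implementation: every name here (`Step`, `generate_checklist`, `keyword_judge`, `step_rewards`) is hypothetical, and the checklist splitting and keyword matching are simple stand-ins, assumed only for illustration, for what would presumably be model-based components.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    observation: str  # what the agent sees on the page (hypothetical field)
    action: str       # what the agent did (hypothetical field)


def generate_checklist(instruction: str) -> List[str]:
    """Stage 1 stand-in: break the instruction into subgoals.

    Splitting on ' then ' is only a toy placeholder for checklist generation.
    """
    return [part.strip() for part in instruction.split(" then ") if part.strip()]


def keyword_judge(item: str, step: Step) -> bool:
    """Toy judge: an item counts as done if all of its words appear in the step."""
    text = f"{step.observation} {step.action}".lower()
    return all(word in text for word in item.lower().split())


def step_rewards(
    checklist: List[str],
    trajectory: List[Step],
    judge: Callable[[str, Step], bool] = keyword_judge,
) -> List[float]:
    """Stage 2 stand-in: after each step, reward = fraction of checklist items done so far."""
    done = [False] * len(checklist)
    rewards = []
    for step in trajectory:
        for i, item in enumerate(checklist):
            done[i] = done[i] or judge(item, step)
        rewards.append(sum(done) / len(checklist) if checklist else 0.0)
    return rewards


checklist = generate_checklist("search laptop then open cart")
trajectory = [
    Step(observation="results page for laptop", action="search laptop"),
    Step(observation="cart page", action="open cart"),
]
rewards = step_rewards(checklist, trajectory)
print(checklist, rewards)
```

Each step receives its own score instead of a single success or failure signal at the end of the trajectory, which is the property that makes a process reward model useful on long tasks. The `judge` argument is left pluggable so the keyword heuristic can be swapped for a stronger evaluator.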
Source
