Best-of-N Strategy

Revision as of 26 June 2025, 10:51, by Arianne (talk | contributions) (Page created with « == under construction == == Definition == xxxxxxx == French == ''' XXXXXX''' See also '''reward hacking problem''' == Additional notes == '' to do'' <!--The BoN strategy does not scale with the number of samples N due to the reward hacking problem. Particularly significant in scenarios where the AI model may not have a singularly deterministic output but can benefit from generating a spectrum of possibilities to increase the chance of achieving a higher qua... »)


Under construction

Definition

xxxxxxx

French

XXXXXX

See also reward hacking problem

Additional notes

To do

English

Best-of-N Strategy

Best-of-N

BoN

Sources

Source: Envisioning.io

Retrieved from "https://datafranca.org/wiki/index.php?title=Best-of-N_Strategy&oldid=113175"

Vocabulary

Contributors: Arianne Arel