OmniVideoBench



UNDER CONSTRUCTION

Definition

A comprehensive benchmark for evaluating audio-visual reasoning in multimodal large language models (MLLMs), with emphasis on modality complementarity and logical consistency.

French

OmniVideoBench

English

OmniVideoBench

A comprehensive benchmark designed to evaluate how well multimodal large language models (MLLMs) understand and reason over both the audio and visual information in videos, with emphasis on modality complementarity and logical consistency. It addresses a critical gap in current evaluation methods, which often focus on a single modality or fail to integrate audio-visual reasoning in a logically consistent manner.
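As a rough illustration only (not OmniVideoBench's official protocol), benchmarks of this kind typically score a model's answers to audio-visual questions against gold labels. The item fields (`id`, `answer`) and the plain-accuracy scoring rule below are assumptions for the sketch:

```python
# Hypothetical sketch of scoring answers on an audio-visual QA
# benchmark; field names and the accuracy metric are assumptions,
# not OmniVideoBench's documented evaluation procedure.

def score(items, predictions):
    """Return overall accuracy over benchmark items.

    items: list of dicts, each with an "id" and a gold "answer"
    predictions: dict mapping item "id" to the model's chosen option
    """
    if not items:
        return 0.0
    correct = sum(
        1 for item in items
        if predictions.get(item["id"]) == item["answer"]
    )
    return correct / len(items)

# Toy example: two audio-visual questions, one answered correctly.
items = [
    {"id": "q1", "modality": "audio+visual", "answer": "B"},
    {"id": "q2", "modality": "audio+visual", "answer": "D"},
]
predictions = {"q1": "B", "q2": "A"}
print(score(items, predictions))  # 0.5
```

A real evaluation would additionally group accuracy by question type or reasoning step to probe whether the model genuinely combines the audio and visual streams.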

Sources

Hugging Face

Contributors: wiki