"OmniVideoBench": difference between revisions


'''OmniVideoBench'''


OmniVideoBench is a benchmark designed to evaluate how well multimodal large language models (MLLMs) understand and reason across both the audio and the visual information in videos. It addresses a gap in current evaluation methods, which often focus on a single modality or fail to integrate audio-visual reasoning in a logically consistent manner. The benchmark emphasizes modality complementarity and logical consistency.
==Sources==
[https://github.com/NJU-LINK/OmniVideoBench Source: GitHub]


[https://huggingface.co/papers/2510.10689 Source: Hugging Face]
 
[https://omnivideobench.github.io/omnivideobench_home/ Source: OmniVideoBench]




[[Catégorie:vocabulary]]

Latest revision as of 14:07, 23 February 2026

UNDER CONSTRUCTION

Definition

xxxxx

French

OmniVideoBench

English

OmniVideoBench



Contributors: Arianne Arel, wiki