LongVILA


Revision as of 20 September 2025 at 10:35 by Pitpitt

under construction

Definition

XXXXXXXXX

French

LongVILA

English

LongVILA

A comprehensive framework that uses reinforcement learning to enable vision-language models to perform complex reasoning over long videos. It addresses the challenge of understanding hour-long videos, which requires temporal, spatial, goal-oriented, and narrative reasoning.

A framework for scaling vision-language models to long videos with reinforcement learning, built on a specialized training infrastructure and reported to perform strongly on a range of reasoning tasks.

Source

Source: Hugging Face


[[Catégorie:vocabulary]]

Contributors: wiki