"LongVILA": difference between revisions
Latest revision as of 20 September 2025, 10:35
Under construction
Definition
XXXXXXXXX
French
LongVILA
English
LongVILA
A framework for scaling vision-language models to long videos using reinforcement learning. It addresses the challenge of understanding hour-long videos, which requires temporal, spatial, goal-oriented, and narrative reasoning. The framework relies on a specialized training infrastructure and achieves strong performance on various reasoning tasks.
Source
[[Catégorie:vocabulary]]
Contributors: wiki