UniVideo


Revision dated 27 October 2025 at 19:58 by Pitpitt (page created)

UNDER CONSTRUCTION

Definition

xxxxx

French

UniVideo

English

UniVideo

A unified framework that combines video understanding, generation, and editing capabilities within a single model. Unlike existing approaches that handle these tasks separately, UniVideo can interpret complex multimodal instructions and perform diverse video operations through a dual-stream architecture. The system demonstrates strong performance across multiple video tasks while enabling novel capabilities like visual prompt understanding and task composition.

UniVideo is a dual-stream framework that pairs a Multimodal Large Language Model (MLLM) with a Multimodal DiT (MMDiT, a diffusion transformer). It extends unified modeling to video generation and editing, achieving state-of-the-art performance and supporting task composition and generalization.
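The dual-stream idea can be illustrated with a minimal toy sketch. It is not the UniVideo implementation: the class names (`UnderstandingStream`, `GenerationStream`), the `Request` type, and the string-based "frames" are all hypothetical stand-ins, assuming only that one stream interprets the instruction and the other produces or edits frames conditioned on it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    """Hypothetical request: a text instruction plus optional input frames."""
    instruction: str
    frames: Optional[List[str]] = None  # strings stand in for video frames


class UnderstandingStream:
    """Toy stand-in for the MLLM stream: turns the request into conditioning."""

    def encode(self, request: Request) -> Dict[str, object]:
        tokens = request.instruction.lower().split()
        return {"tokens": tokens, "has_video": bool(request.frames)}


class GenerationStream:
    """Toy stand-in for the MMDiT stream: produces frames from conditioning."""

    def generate(self, cond: Dict[str, object],
                 frames: Optional[List[str]],
                 num_frames: int = 4) -> List[str]:
        prompt = " ".join(cond["tokens"])
        if cond["has_video"]:
            # Editing: transform each input frame under the instruction.
            return [f"{frame}|edited:{prompt}" for frame in frames]
        # Generation: create new frames from the instruction alone.
        return [f"frame{i}:{prompt}" for i in range(num_frames)]


def run(request: Request, num_frames: int = 4) -> List[str]:
    """Route one request through both streams in sequence."""
    cond = UnderstandingStream().encode(request)
    return GenerationStream().generate(cond, request.frames, num_frames)


if __name__ == "__main__":
    print(run(Request("A cat on a beach"), num_frames=2))
    print(run(Request("Make it snow", frames=["f0", "f1"])))
```

In this sketch the same `run` entry point serves both generation (no input frames) and editing (input frames present), which mirrors, in a very simplified way, how a single model can cover several video tasks.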

Sources

Source: Hugging Face

Contributors: wiki