"Ovis": difference between revisions

Revision as of 12:47, 6 October 2025

Ovis (Open VISion) is a new multimodal large language model architecture designed to structurally align visual and textual distributional semantic representations (embeddings).

Ovis

Ovis

Ovis (Open VISion) is a novel Multimodal Large Language Model (MLLM) architecture designed to structurally align visual and textual embeddings.
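
One way to read "structurally align" is sketched below: text tokens fetch a row of a learned embedding table, so the visual side is given a table of its own, and each image patch is embedded as a probability-weighted mix of rows from that table. This is a toy illustration and not the Ovis implementation; the table contents, their sizes, and the function names `text_embedding` and `visual_embedding` are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def text_embedding(token_id, table):
    """Text side: a token id selects exactly one row of the table."""
    return list(table[token_id])

def visual_embedding(patch_scores, table):
    """Visual side (assumed mechanism): a patch yields scores over a
    visual vocabulary; its embedding is the probability-weighted sum
    of the rows of a visual embedding table."""
    probs = softmax(patch_scores)
    dim = len(table[0])
    return [sum(p * row[d] for p, row in zip(probs, table)) for d in range(dim)]

# Hypothetical toy tables: 3 entries each, embedding dimension 2.
TEXT_TABLE = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
VISUAL_TABLE = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

text_vec = text_embedding(2, TEXT_TABLE)
# Equal scores give equal weights, so the result is the mean of the rows.
visual_vec = visual_embedding([0.0, 0.0, 0.0], VISUAL_TABLE)
# A strongly peaked score behaves almost like a plain table lookup.
peaked_vec = visual_embedding([100.0, 0.0, 0.0], VISUAL_TABLE)
```

Both paths end in a table-based vector of the same dimension, which is the structural parallel the definition refers to; a real model would learn both tables and produce the patch scores with a visual encoder.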

@@ Line 2: / Line 2: @@
 == Definition ==
-XXXXXXXXX
+Ovis (Open VISion) is a new '''[[grand modèle de langues multimodal|multimodal large language model]]''' architecture designed to structurally align visual and textual '''[[Représentation sémantique distributionnelle compacte|distributional semantic representations]]''' (embeddings).
 == French ==
-'''Ovis 2,5'''
+'''Ovis'''
 == English ==
-'''Ovis 2,5'''
+'''Ovis'''
- An advanced multimodal large language model designed to process images at their native resolutions while incorporating reasoning capabilities. The model addresses two key limitations in current vision-language systems: the degradation caused by fixed-resolution image processing and the lack of reflective reasoning beyond simple chain-of-thought approaches.
+''Ovis (Open VISion) is a novel Multimodal Large Language Model (MLLM) architecture designed to structurally align visual and textual embeddings.''
- By eliminating the limitations of fixed-resolution image processing and incorporating self-corrective reasoning, Ovis2.5 achieves substantial improvements over previous models while maintaining efficiency through optimized training infrastructure.
-== Source ==
+== Sources ==
+[https://github.com/AIDC-AI/Ovis Source: GitHub]
 [https://huggingface.co/papers/2508.11737 Source: Hugging Face]