Meta CLIP 2


Revision dated 25 August 2025 at 08:24 by Pitpitt

Under construction

Definition

Meta CLIP 2 is a recipe for training CLIP (Contrastive Language-Image Pre-training) models from scratch on multilingual, web-scale image-text pairs, extending CLIP beyond English-only data.

French

 Meta CLIP 2

English

 Meta CLIP 2

Meta CLIP 2 is the first recipe for training CLIP models from scratch on worldwide, web-scale image-text pairs spanning more than 300 languages. The work addresses the challenge of scaling CLIP beyond English-only data while avoiding the "curse of multilinguality", whereby multilingual models perform worse on English tasks than their English-only counterparts.

Meta CLIP 2 demonstrates that the curse of multilinguality can be overcome by carefully scaling three components: metadata construction, the data-curation algorithm, and the training framework.
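The curation component mentioned above builds on the original MetaCLIP idea of balancing matched image-text pairs across metadata entries, so that frequent (head) concepts do not drown out rare (tail) ones. The sketch below is a simplified illustration of that balancing step; the function name, cap value, and grouping are assumptions for illustration, not the published algorithm:

```python
import random
from collections import defaultdict

def balance_by_metadata(pairs, cap):
    """Simplified sketch of MetaCLIP-style balancing: group image-text
    pairs by the metadata entry their text matched, then cap each group
    so head (frequent) entries cannot dominate tail (rare) entries."""
    groups = defaultdict(list)
    for text, entry in pairs:
        groups[entry].append(text)
    curated = []
    for entry, texts in groups.items():
        if len(texts) > cap:
            # subsample head entries down to the cap
            texts = random.sample(texts, cap)
        # tail entries (below the cap) are kept in full
        curated.extend((t, entry) for t in texts)
    return curated

# toy example: "cat" is a head entry, "axolotl" a tail entry
random.seed(0)
pairs = [(f"cat photo {i}", "cat") for i in range(100)] + \
        [("an axolotl", "axolotl")]
curated = balance_by_metadata(pairs, cap=10)
```

After balancing, the head entry contributes only `cap` pairs while the tail entry keeps its single pair, flattening the distribution over concepts.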

Meta CLIP 2, trained on worldwide, web-scale image-text pairs, improves zero-shot classification and multilingual benchmark performance without relying on system-level confounding factors.
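The training objective underlying CLIP models like Meta CLIP 2 is a symmetric contrastive loss over a batch of paired image and text embeddings; zero-shot classification then scores an image against the text embeddings of candidate class prompts. The following is a minimal NumPy sketch of that standard objective, not Meta CLIP 2's actual training code:

```python
import numpy as np

def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss: row i of img_emb is paired
    with row i of txt_emb; every other row in the batch is a negative."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # pairwise cosine similarities
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return (loss_i2t + loss_t2i) / 2

# toy check: matched pairs should score a lower loss than mismatched ones
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_aligned = clip_contrastive_loss(emb, emb)
loss_shuffled = clip_contrastive_loss(emb, emb[::-1])
```

For zero-shot classification, each image is assigned the class whose text-prompt embedding has the highest similarity, e.g. `pred = (img @ txt.T).argmax(axis=1)`.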

Source

Source: Hugging Face

Contributors: wiki