Encodage par paires d'octets (Byte Pair Encoding)
under construction
Definition
A subword tokenization algorithm derived from a simple data compression technique: the most frequent pair of adjacent symbols in the training corpus is repeatedly replaced by a new symbol, producing a vocabulary of subword units in which common words appear as single tokens.
See also segment, natural language processing and Vocabulary (NLP)
French
Encodage par paires d'octets
English
Byte Pair Encoding
BPE
Byte Pair Encoding is a simple form of data compression algorithm and one of the most widely used subword tokenization algorithms. It replaces the most frequent pair of bytes in the data with a new byte that was not contained in the initial dataset. In Natural Language Processing, BPE is used to represent a large vocabulary with a small set of subword units, and the most common words are represented in the vocabulary as single tokens.
It is used in all GPT versions, RoBERTa, XLM, FlauBERT and more.
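A minimal sketch of the BPE merge loop, in the spirit of the classic formulation by Sennrich et al.; the toy corpus, function names and merge count below are invented for illustration and do not come from the source cited on this page.

import re
from collections import Counter

def get_pair_counts(vocab):
    # Count adjacent symbol pairs, weighted by each word's frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_pair(pair, vocab):
    # Rewrite every word, fusing the chosen pair into a single symbol.
    # The lookarounds keep the match aligned on symbol boundaries.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Hypothetical corpus: each word is a space-separated symbol sequence
# (characters to start) mapped to its frequency.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}

for _ in range(10):  # the number of merges controls the subword vocabulary size
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print(best)

At tokenization time, the learned merges are applied in order to new text, so frequent words collapse into single tokens while rare words decompose into subword units, matching the description above.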
Source

Source: GeeksforGeeks, https://www.geeksforgeeks.org/byte-pair-encoding-bpe-in-nlp/