
Vocabulary of natural language processing


Concept information

Preferred term

attention distribution  

Definition

  • The distribution of attention weights across the input sequence in an attention-based model.
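The definition above can be illustrated with a minimal sketch of how such a distribution arises: the attention weights are a softmax over query-key similarity scores, so they are non-negative and sum to 1 across the input sequence. All names below are illustrative, not part of any particular library.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_distribution(query, keys):
    # Dot-product score between the query and each key vector,
    # normalized into a probability distribution over input positions.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# Toy example: a 2-dimensional query attending over three input positions.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = attention_distribution(query, keys)
# weights is the attention distribution: non-negative values summing to 1,
# with the most mass on the key most similar to the query (here, the first).
```

In the examples cited below, an "ideal" attention distribution is one that concentrates this probability mass on the linguistically relevant position (e.g., the antecedent noun phrase) and assigns zero to distractors.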

Broader concept

Examples

  • As a result the ideal attention distribution should put all of the probability mass on the antecedent noun phrase for reflexive anaphora or on the subject noun phrase for agreement and zero on the distractor noun phrases. (Lin, Tan & Frank, 2019)
  • Further analysis indicates that WID can also learn the attention patterns from the teacher model without any alignment loss on attention distributions. (Wu, Hou, Lao, Li, Wong, Zhao & Yang, 2024)
  • Recent research indicates that complementary attention distributions can lead to the same model prediction (Jain and Wallace 2019; Wiegreffe and Pinter 2019) and that the removal of input tokens with large attention weights often does not lead to a change in the model's prediction (Serrano and Smith 2019). (Hollenstein & Beinborn, 2021)
  • Recent research in language processing finds that attention weights are not a good proxy for relative importance because different attention distributions can lead to the same predictions (Jain and Wallace 2019). (Hollenstein & Beinborn, 2021)
  • The proposed method aims to unravel the attention distribution at each layer within a multi-layer model. (Jang, Byun & Shin, 2024)

Translations

URI

http://data.loterre.fr/ark:/67375/8LP-C01T3CNT-T

Download this concept:

RDF/XML TURTLE JSON-LD

Last modified: 27/05/2024