Concept information
Preferred term
attention distribution
Definition
- The distribution of attention weights across the input sequence in an attention-based model.
Broader concept
Example
- As a result the ideal attention distribution should put all of the probability mass on the antecedent noun phrase for reflexive anaphora or on the subject noun phrase for agreement and zero on the distractor noun phrases. (Lin, Tan & Frank, 2019)
- Further analysis indicates that WID can also learn the attention patterns from the teacher model without any alignment loss on attention distributions. (Wu, Hou, Lao, Li, Wong, Zhao & Yang, 2024)
- Recent research indicates that complementary attention distributions can lead to the same model prediction (Jain and Wallace 2019; Wiegreffe and Pinter 2019) and that the removal of input tokens with large attention weights often does not lead to a change in the model's prediction (Serrano and Smith 2019). (Hollenstein & Beinborn, 2021)
- Recent research in language processing finds that attention weights are not a good proxy for relative importance because different attention distributions can lead to the same predictions (Jain and Wallace 2019). (Hollenstein & Beinborn, 2021)
- The proposed method aims to unravel the attention distribution at each layer within a multi-layer model. (Jang, Byun & Shin, 2024)
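Per the definition above, an attention distribution is the result of normalizing query–key scores into a probability distribution over the input sequence, conventionally with a softmax. A minimal sketch (the function name and toy vectors are illustrative, not from any cited work):

```python
import math

def attention_distribution(query, keys):
    """Normalize query-key dot-product scores into an attention distribution."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]  # weights are non-negative and sum to 1

# One query attended over three key vectors of the input sequence
weights = attention_distribution([1.0, 0.0], [[2.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
```

Here the first key aligns best with the query, so it receives the largest share of the probability mass; the weights always sum to 1 regardless of the scores.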
In other languages
- French
URI
http://data.loterre.fr/ark:/67375/8LP-C01T3CNT-T