Vocabulary of natural language processing

Concept information

Preferred term

self-attention layer  

Definition

  • A layer in the architecture of transformer-based models that allows the model to focus on different parts of the input sequence when processing each token, thereby capturing contextual relationships and dependencies within the sequence.
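
As a concrete illustration of the definition, the following is a minimal, single-head scaled dot-product self-attention layer sketched in NumPy. The class and parameter names (SelfAttentionLayer, d_model, d_k) are illustrative assumptions and not part of this vocabulary entry.

```python
# Minimal sketch of a single-head self-attention layer (scaled dot-product
# attention). Names and dimensions are illustrative assumptions.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class SelfAttentionLayer:
    def __init__(self, d_model, d_k, seed=0):
        rng = np.random.default_rng(seed)
        # Learned projections that map each token to queries, keys and values.
        self.W_q = rng.normal(scale=d_model ** -0.5, size=(d_model, d_k))
        self.W_k = rng.normal(scale=d_model ** -0.5, size=(d_model, d_k))
        self.W_v = rng.normal(scale=d_model ** -0.5, size=(d_model, d_k))
        self.d_k = d_k

    def __call__(self, x):
        # x: (seq_len, d_model) token representations.
        q, k, v = x @ self.W_q, x @ self.W_k, x @ self.W_v
        # Every token attends to every token in the sequence; the attention
        # weights capture contextual relationships and dependencies.
        scores = q @ k.T / np.sqrt(self.d_k)   # (seq_len, seq_len)
        weights = softmax(scores, axis=-1)      # each row sums to 1
        return weights @ v                      # (seq_len, d_k)

# Usage: a 5-token "sentence" with 8-dimensional embeddings.
layer = SelfAttentionLayer(d_model=8, d_k=8)
tokens = np.random.default_rng(1).normal(size=(5, 8))
print(layer(tokens).shape)  # (5, 8)
```

In a full transformer block, such a layer is typically followed by a feed-forward layer together with residual connections and layer normalization, as reflected in the examples cited below.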

Example

  • Each encoder has its own self-attention layer and feed-forward layer to process each input separately. (Shin & Lee, 2018)
  • First, encoder self-attention layers benefit most from additive window attention, while decoder self-attention layers prefer multiplicative attention. (Nguyen, Nguyen, Joty & Li, 2020)
  • Specifically, we distill the knowledge from the hidden state of each transformer block and the attention score of each self-attention layer. (Li, Gao, Lei & Xu, 2023)
  • We are motivated to improve the self-attention layer appended to the top of the transformer encoder to enrich the contextualized word representation with information from its neighbors and the relations from the dependency parse trees. (Galitsky, Ilvovsky & Goncharova, 2021)
  • We then apply a self-attention layer to model the guiding effect of ontology knowledge on the extraction of entities and relations from the sentence. (Xiong, Chen, Yunfei & Shengyang, 2023)

URI

http://data.loterre.fr/ark:/67375/8LP-DMDVS16W-4

Last modified: 5/13/24