Concept information
Preferred term
self-attention layer
Definition
- A layer in the architecture of transformer-based models that allows the model to attend to different parts of the input sequence when processing each token, capturing contextual relationships and dependencies within that sequence.
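Note
- In the standard scaled dot-product formulation of the transformer (Vaswani et al., 2017), a self-attention layer computes, for query, key and value matrices Q, K and V obtained as linear projections of the same input sequence:
  \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
  where d_k is the dimensionality of the keys, so that each output position is a weighted combination of all input positions.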
Broader concept
Example
- Each encoder has its own self-attention layer and feed-forward layer to process each input separately. (Shin & Lee, 2018)
- First, encoder self-attention layers benefit most from additive window attention, while decoder self-attention layers prefer multiplicative attention. (Nguyen, Nguyen, Joty & Li, 2020)
- Specifically, we distill the knowledge from the hidden state of each transformer block and the attention score of each self-attention layer. (Li, Gao, Lei & Xu, 2023)
- We are motivated to improve the self-attention layer appended to the top of the transformer encoder to enrich the contextualized word representation with information from its neighbors and the relations from the dependency parse trees. (Galitsky, Ilvovsky & Goncharova, 2021)
- We then apply a self-attention layer to model the guiding effect of ontology knowledge on the extraction of entities and relations from the sentence. (Xiong, Chen, Yunfei & Shengyang, 2023)
In other languages
- French
URI
http://data.loterre.fr/ark:/67375/8LP-DMDVS16W-4