Concept information
Preferred term
tokenization
Definition
- The task/process of recognizing and tagging tokens (words, punctuation marks, digits, etc.) in a text. (Loterre)
Broader concept
Synonym(s)
- text segmentation
- tokenisation
Defining context(s)
- Text segmentation aims to uncover latent structure by dividing text from a document into coherent sections. (Barrow, Jain, Morariu, Manjunatha, Oard & Resnik, 2020)
Example
- Other steps during tokenization included proper handling of special text emoticons such as "o.O". (Chapman, Bernhard & Klakow, 2020)
- To more thoroughly evaluate our tokenization we train multilingual T5 models using SentencePiece and CompoundPiece. (Minixhofer, Pfeiffer & Vulic, 2023)
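The definition and the emoticon example above can be illustrated with a minimal sketch of a rule-based tokenizer. It is not the method used in any of the cited papers: the pattern `TOKEN_PATTERN`, the tag names (`emoticon`, `digit`, `word`, `punct`) and the function `tokenize` are hypothetical, chosen only to show how tokens can be recognized and tagged while keeping an emoticon such as "o.O" in one piece.

```python
import re

# Hypothetical token pattern (an assumption for illustration only).
# Alternatives are tried left to right, so emoticons are matched
# before the generic word and punctuation rules can split them.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<emoticon>(?<!\w)[oO][._][oO](?!\w)|[:;]-?[()])  # o.O, O_o, :-), ;)
  | (?P<digit>\d+(?:[.,]\d+)*)                          # 42, 3.50, 1,000
  | (?P<word>[^\W\d_]+(?:['-][^\W\d_]+)*)               # letters, with inner ' or -
  | (?P<punct>[^\w\s])                                  # one punctuation mark
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return (token, tag) pairs; whitespace is skipped."""
    return [(m.group(), m.lastgroup) for m in TOKEN_PATTERN.finditer(text)]


if __name__ == "__main__":
    for token, tag in tokenize("Wow o.O, that cost 3.50 dollars!"):
        print(f"{tag:9s}{token}")
```

Listing the emoticon rule first is the design choice that matters here: a tokenizer that applies only the word and punctuation rules would split "o.O" into three tokens.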
Translations
- French: découpage de texte
- French: segmentation de texte
URI
http://data.loterre.fr/ark:/67375/8LP-T7Q0JFBM-5