Concept information
Preferred term
tokenization
Definition
- The task/process of recognizing and tagging tokens (words, punctuation marks, digits etc.) in a text. (Loterre)
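As a minimal sketch of the definition above, the following Python snippet recognizes tokens with a regular expression and tags each one as a word, a digit run, or a punctuation mark. The tag names (`WORD`, `DIGITS`, `PUNCT`) and the function name `tokenize` are assumptions made for illustration; they are not part of the Loterre definition, and real tokenizers usually handle many more cases.

```python
import re

# Hypothetical token classes, chosen for illustration only.
# Alternatives are tried left to right at each position.
TOKEN_PATTERN = re.compile(
    r"(?P<DIGITS>\d+)"        # runs of digits, e.g. "42"
    r"|(?P<WORD>[^\W\d_]+)"   # runs of letters (Unicode-aware)
    r"|(?P<PUNCT>[^\w\s])"    # any single non-word, non-space character
)


def tokenize(text):
    """Return (token, tag) pairs; whitespace and underscores are skipped."""
    return [(m.group(), m.lastgroup) for m in TOKEN_PATTERN.finditer(text)]


print(tokenize("Hello, world 42!"))
```

Each match carries both the token string and the name of the alternative that matched, which is one simple way to combine recognition and tagging in a single pass.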
Broader concept
Alternative labels
- text segmentation
- tokenisation
Defining context(s)
- Text segmentation aims to uncover latent structure by dividing text from a document into coherent sections. (Barrow, Jain, Morariu, Manjunatha, Oard & Resnik, 2020)
Example
- Other steps during tokenization included proper handling of special text emoticons such as "o.O". (Chapman, Bernhard & Klakow, 2020)
- To more thoroughly evaluate our tokenization we train multilingual T5 models using SentencePiece and CompoundPiece. (Minixhofer, Pfeiffer & Vulic, 2023)
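The emoticon case in the first example can be sketched as follows. This is only one possible approach, not the method used by the cited authors: known emoticons are matched before the generic word and punctuation rules, so that "o.O" stays a single token instead of being split at the period. The emoticon list `EMOTICONS` and the function name `tokenize_with_emoticons` are hypothetical.

```python
import re

# Hypothetical emoticon inventory, for illustration only.
EMOTICONS = ["o.O", ":-)", ":)", ":("]

# Longer emoticons come first so that ":-)" is tried before ":)".
_emoticon_alt = "|".join(
    re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True)
)

# Emoticons are tried first, then word characters, then single punctuation marks.
PATTERN = re.compile(rf"(?:{_emoticon_alt})|\w+|[^\w\s]")


def tokenize_with_emoticons(text):
    """Split text into tokens, keeping listed emoticons intact."""
    return PATTERN.findall(text)


print(tokenize_with_emoticons("wait o.O really?"))
```

Without the emoticon alternative, the same pattern would split "o.O" into three tokens ("o", ".", "O").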
In other languages
French
- découpage de texte
- segmentation de texte
URI
http://data.loterre.fr/ark:/67375/8LP-T7Q0JFBM-5