
Vocabulary of natural language processing


Concept information

Preferred term

tokenization  

Definition

  • The task/process of recognizing and tagging tokens (words, punctuation marks, digits etc.) in a text. (Loterre)
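As a minimal sketch of this definition, the Python function below recognizes tokens with a regular expression and tags each one as a word, a number, or a punctuation mark. The tag names (`WORD`, `NUMBER`, `PUNCT`) and the regex rules are illustrative assumptions, not Loterre's scheme or that of any particular NLP tool.

```python
import re

# Hypothetical token categories for illustration. The alternatives are tried
# left to right, so NUMBER must come before WORD (\w also matches digits).
TOKEN_PATTERN = re.compile(
    r"(?P<NUMBER>\d+(?:[.,]\d+)*)"   # digits, optionally with decimal/thousands separators
    r"|(?P<WORD>\w+(?:[-']\w+)*)"    # words, allowing internal hyphens and apostrophes
    r"|(?P<PUNCT>[^\w\s])"           # any single non-word, non-space character
)

def tokenize(text):
    """Return a list of (token, tag) pairs; whitespace is skipped."""
    # Match.lastgroup is the name of the named group that matched.
    return [(m.group(), m.lastgroup) for m in TOKEN_PATTERN.finditer(text)]

print(tokenize("It's 3.5 km, e.g."))
```

The order of the alternatives decides ambiguous cases: with these rules, "3.5" is kept as one number, while the abbreviation "e.g." is split into letters and periods, a case a fuller tokenizer would typically handle with an abbreviation list.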

Broader concept

Alternative labels

  • text segmentation
  • tokenisation

Defining context(s)

  • Text segmentation aims to uncover latent structure by dividing text from a document into coherent sections. (Barrow, Jain, Morariu, Manjunatha, Oard & Resnik, 2020)

Example

  • Other steps during tokenization included proper handling of special text emoticons such as "o.O". (Chapman, Bernhard & Klakow, 2020)
  • To more thoroughly evaluate our tokenization we train multilingual T5 models using Sentence-Piece and CompoundPiece. (Minixhofer, Pfeiffer & Vulić, 2023)
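The first example mentions special handling of emoticons such as "o.O", which a naive word/punctuation splitter would break into `o`, `.`, `O`. A minimal sketch of one common approach, assuming a small hand-written emoticon list (hypothetical, not the list or method used by Chapman, Bernhard & Klakow, 2020), is to try emoticon patterns before the general word and punctuation rules:

```python
import re

# Hypothetical emoticon inventory for illustration; a real system would use a larger list.
EMOTICONS = ["o.O", "O.o", ":-)", ":)", ":(", ";)", ":D"]

# Longest emoticons first so a longer emoticon wins over a shorter one sharing a prefix.
EMOTICON_RE = "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))

# Emoticons are tried first; (?!\w) keeps "o.O" from matching the start of "o.Ok".
# The fallbacks are runs of word characters and single punctuation characters.
TOKEN_RE = re.compile(rf"(?:{EMOTICON_RE})(?!\w)|\w+|[^\w\s]")

def tokenize(text):
    """Split text into tokens, keeping listed emoticons intact."""
    return TOKEN_RE.findall(text)

print(tokenize("Really? o.O :-) nice"))
```

Because the pattern contains no capturing groups, `re.findall` returns the full matched tokens.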

In other languages

  • French

  • découpage de texte
  • segmentation de texte

URI

http://data.loterre.fr/ark:/67375/8LP-T7Q0JFBM-5

Download this concept:

RDF/XML, Turtle, JSON-LD (last modified 27/5/24)