Concept information
Preferred term
tokenization
Definition
- The task or process of recognizing and tagging tokens (words, punctuation marks, digits, etc.) in a text.
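As an illustration of the definition above, the following is a minimal sketch of a rule-based tokenizer that both recognizes tokens and tags each one with a coarse type. It is not the method used by any of the tools listed in this record; the tag names (`NUMBER`, `WORD`, `PUNCT`), the pattern, and the function name `tokenize` are assumptions chosen for the example, and real tokenizers handle many more cases (abbreviations, URLs, languages without whitespace, subword units).

```python
import re
from typing import List, Tuple

# Hypothetical tag set for illustration. Alternatives are tried left to right,
# so NUMBER is checked before WORD (digits would otherwise match \w+).
TOKEN_PATTERN = re.compile(
    r"(?P<NUMBER>\d+(?:[.,]\d+)*)"      # digits, optionally with . or , groups
    r"|(?P<WORD>\w+(?:['’-]\w+)*)"      # words, allowing inner ' ’ or -
    r"|(?P<PUNCT>[^\w\s])"              # any single non-word, non-space char
)


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Return (token, tag) pairs; whitespace is skipped, not emitted."""
    return [(m.group(), m.lastgroup) for m in TOKEN_PATTERN.finditer(text)]


if __name__ == "__main__":
    for token, tag in tokenize("Dr. Smith paid 3.50 euros, didn't he?"):
        print(f"{tag:6} {token}")
```

A known limitation of this sketch is that mixed strings such as "3rd" are split into a NUMBER followed by a WORD, because the number alternative is tried first.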
Broader concept
Scope note
- The task or process of recognizing and tagging tokens (words, punctuation marks, digits, etc.) in a text.
executes
- ABNER
- AFNER
- AllenNLP
- Apache cTAKES
- Apache OpenNLP
- Bluima
- CleanNLP
- ClearTK
- CogComp-NLP
- corpustools
- DISCO Builder
- DKPro Core
- fastText
- FreeLing
- frog
- GATE
- Gensim
- Heart of Gold
- ILSP NLP
- JTextPro
- koRpus
- LAPPS Grid
- LibShortText
- MALLET
- MorphAdorner
- NERSuite
- NLP4J
- NLPCube
- NLP.js
- NLTK
- OpeNER
- OpenNLP
- Penelope
- Polyglot
- PyNLPl
- quanteda
- Rasp
- RWeka
- scikit-learn
- ScispaCy
- sentencepiece
- spaCy
- spacyr
- Spark NLP
- Spark NLP Python
- Spark NLP Scala
- Stanbol
- Stanford CoreNLP
- Stanza
- Talismane
- tall
- Termsuite
- text2vec
- textacy
- TextAnalysis.jl
- TextBlob
- TextFlows
- Texthero
- TextRazor
- textrecipes
- TextTinyR
- tidytext
- tm
- tokenizers
- tokenizers.bpe
- Tweet NLP
- U-compare
- udpipe
- UDPipe
- Weblicht
- WordTokenizers.jl
- YouTokenToMe
In other languages
- French: segmentation en unités
URI
http://data.loterre.fr/ark:/67375/LTK-T5RXB3DL-2