Vocabulary of natural language processing

Concept information

Preferred term

text clustering  

Definition

  • A type of unsupervised learning that consists in grouping texts according to their level of similarity. (Loterre)
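The definition above can be illustrated with a minimal sketch. It assumes a bag-of-words representation, cosine similarity as the similarity measure, and a fixed similarity threshold with single-link grouping. None of these choices comes from the vocabulary entry itself, and the function names (`term_counts`, `cosine_similarity`, `cluster_texts`) are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Return lowercased bag-of-words counts for one text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_texts(texts, threshold=0.3):
    """Group texts whose pairwise similarity reaches `threshold`.

    Assumption: single-link grouping, i.e. two texts share a cluster
    if a chain of sufficiently similar texts connects them.
    Returns a sorted list of clusters, each a list of text indices.
    """
    vectors = [term_counts(t) for t in texts]
    parent = list(range(len(texts)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = [
    "the cat sat on the mat",
    "a cat lay on the mat",
    "stock prices fell sharply",
    "stock markets and prices fell",
]
print(cluster_texts(docs))  # [[0, 1], [2, 3]]
```

No labels are supplied at any point, which is what makes the grouping unsupervised. The threshold value of 0.3 is an arbitrary choice for this toy data; real collections typically call for weighted term vectors or embeddings and a tuned clustering method.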

Synonym(s)

  • document clustering

Definitional context(s)

  • Text document clustering is the grouping of text documents into semantically related groups or as Hayes puts it "they are grouped because they are likely to be wanted together" (Hayes 1963). (Sedding & Kazakov, 2004)

Example

  • Initially document clustering was developed to improve precision and recall of information retrieval systems. (Sedding & Kazakov, 2004)
  • Moreover the quality of text clustering is intricately related to user preference which is hard to describe using a textual prompt. (Zhang, Zou, Yi & Aw, 2024)
  • Text document clustering can greatly simplify browsing large collections of documents by reorganizing them into a smaller number of manageable clusters. (Sedding & Kazakov, 2004)
  • They conducted clinical document clustering in 17 different clinical domains and showed that note types in a broad clinical scope form the same cluster but note types in a narrow clinical extent form different clusters. (Sohn, Clark, Halgrim, Murphy, Jonnalagadda, Wagholikar, Wu, Chute & Liu, 2013)
  • Toda and Kataoka (2005) use document clustering based on Named Entities to tackle the problem of document retrieval for search results. (Tsekouras, Petasis & Kosmopoulos, 2019)

URI

http://data.loterre.fr/ark:/67375/8LP-DK7HGDD5-3

Last modified 5/21/24