Q240 : Automatic multi-level labeling of text clusters
Thesis > Central Library of Shahrood University > Computer Engineering > MSc > 2023
Authors:
Najme Gholami [Author], Hoda Mashayekhi [Supervisor], [Advisor]
Abstract: With the daily growth in the volume of text data, appropriate models and methods for improving knowledge extraction have become particularly important in text mining and natural language processing. Cluster labeling is used to uncover the hidden semantic structure of the words and documents in a cluster. Automatic labeling is an algorithmic process that generates or selects the words, phrases, or sentences that best describe a cluster. Its task is to produce a short label that summarizes the concept of the cluster, with the goal of giving end users a simple and fluent label. Automatic labeling benefits users who want to analyze and understand sets of documents, as well as search engines that need to find associations between groups of words and topics. However, research to date creates only single-level labels for clusters: labels that convey a general concept but do not cover details. The aim of this research is to obtain multi-level labels for text clusters that convey the complete meaning of a cluster to the user, with higher-level labels showing the general concepts and lower-level labels showing the details. To this end, a method is proposed that consists of four steps: text preprocessing, document clustering, first-level labeling, and second-level labeling. Evaluated with the WMRR 30 criterion, the proposed automatic labeling method improves on the HLDA method by 0.089 and on the LDA method by 0.029.
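The four steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how each step is implemented in the thesis, so everything below is an assumption made for illustration only: the stopword list, the greedy Jaccard clustering, the document-frequency ranking, and all function names (`preprocess`, `cluster`, `label_cluster`, `label_corpus`) are hypothetical stand-ins and not the author's method.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the thesis).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "with"}

def preprocess(text):
    """Step 1: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(docs, threshold=0.2):
    """Step 2 (stand-in): greedy single-pass clustering.

    Each document joins the first cluster whose seed (first) document
    is similar enough; otherwise it starts a new cluster.
    """
    clusters = []
    for tokens in docs:
        for members in clusters:
            if jaccard(tokens, members[0]) >= threshold:
                members.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters

def rank_terms(cluster_docs):
    """Rank terms by in-cluster document frequency, ties alphabetical."""
    df = Counter(t for tokens in cluster_docs for t in set(tokens))
    return sorted(df, key=lambda t: (-df[t], t))

def label_cluster(cluster_docs, n_details=2):
    """Steps 3 and 4 (stand-in): the top-ranked term is the general
    first-level label; the next n_details terms are second-level details."""
    ranked = rank_terms(cluster_docs)
    first = ranked[0] if ranked else ""
    return {"level1": first, "level2": ranked[1:1 + n_details]}

def label_corpus(texts):
    """Run all four steps and return one two-level label per cluster."""
    docs = [preprocess(t) for t in texts]
    return [label_cluster(c) for c in cluster(docs)]

if __name__ == "__main__":
    sample = [
        "Neural networks for image classification",
        "Training deep neural networks",
        "Stock market price prediction",
        "Predicting stock market trends",
    ]
    for labels in label_corpus(sample):
        print(labels)
```

Each cluster receives one general label and a short list of more specific labels, which mirrors the two-level structure described in the abstract. A real system would replace the clustering and ranking stand-ins with the techniques developed in the thesis.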
Keywords:
#Text analysis #cluster description #multi-level labeling #cluster labeling #text clustering
Keeping place: Central Library of Shahrood University