Q245 : Automatic cluster labeling for clusters of mixed data
Thesis > Central Library of Shahrood University > Computer Engineering > MSc > 2023
Authors:
Abstract:
Automatic labeling of clusters is an approach in data analysis that makes clustering results interpretable. Real-world data comes in different types, of which numerical and nominal data are the most common. Previous research has generally focused on labeling clusters of either numerical or nominal data, and labeling clusters of mixed data has been less investigated. Therefore, this thesis proposes a framework for automatic labeling of clusters containing both numerical and nominal data. The generated labels are unique for each cluster and take the form of attribute-value pairs. The label of each cluster is expected to have the highest accuracy and the shortest length. The framework consists of an unsupervised method for data clustering, an algorithm to discretize numerical features, and a supervised approach based on artificial neural networks to select the features associated with each cluster. In the proposed framework, a hybrid discretization algorithm reduces information loss in the discretization stage, so that each label describes its cluster more accurately. The framework is evaluated on real datasets, and the generated labels show improved performance. The average labeling accuracy on the Iris, Seeds, and Glass datasets is 98.29%. This accuracy is 1.09% and 5.4% higher than the average accuracy of the previous works of Moura et al. (2022) and Lopes et al. (2016), respectively. In addition, compared to the labels produced in previous works, the labels produced in this research can, in most cases, describe the clusters with the fewest attribute-value pairs.
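As a rough illustration of what an attribute-value label and its accuracy could look like, the sketch below scores a label against one cluster. It is not the thesis's implementation: the accuracy definition used here (the fraction of cluster members satisfying every condition in the label), the function names `matches` and `label_accuracy`, and the toy records are all assumptions made for this example.

```python
from typing import Dict, List, Tuple, Union

# Assumed representation: a record maps attribute names to numeric or nominal values.
Value = Union[float, str]
Record = Dict[str, Value]

# Assumed label format: a numerical attribute is paired with a (low, high) interval
# (e.g. one produced by discretization); a nominal attribute is paired with a category.
Condition = Union[Tuple[float, float], str]
Label = Dict[str, Condition]


def matches(record: Record, label: Label) -> bool:
    """Return True if the record satisfies every attribute-value condition in the label."""
    for attr, cond in label.items():
        value = record[attr]
        if isinstance(cond, tuple):
            low, high = cond
            if not (low <= value <= high):
                return False
        elif value != cond:
            return False
    return True


def label_accuracy(cluster: List[Record], label: Label) -> float:
    """Fraction of cluster members covered by the label (assumed accuracy definition)."""
    if not cluster:
        return 0.0
    hits = sum(matches(record, label) for record in cluster)
    return hits / len(cluster)


# Hypothetical mixed-data cluster: one numerical and one nominal attribute.
cluster = [
    {"petal_length": 1.4, "color": "white"},
    {"petal_length": 1.5, "color": "white"},
    {"petal_length": 1.7, "color": "white"},
    {"petal_length": 4.5, "color": "purple"},
]
label = {"petal_length": (1.0, 2.0), "color": "white"}

accuracy = label_accuracy(cluster, label)
print(f"{accuracy:.2%}")  # 3 of the 4 records satisfy the label
```

Under this reading, a shorter label (fewer attribute-value pairs) that keeps the same accuracy would be preferred, which matches the abstract's stated goal of the highest accuracy with the shortest length.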
Keywords:
#Automatic Labeling #Clustering #Mixed Datasets #Numerical Data #Nominal Data #Discretization
Keeping place: Central Library of Shahrood University