Q107 : Improving Clustering Algorithms for Big Data Using Cluster Computing
Thesis > Central Library of Shahrood University > Computer Engineering > MSc > 2017
Authors:
Zohreh Ferydoon Moghadam [Author], Hoda Mashayekhi [Supervisor], Vahid Abolghasemi [Advisor]
Abstract: Today, applications such as traffic control systems, weather sensors, medical systems, and social networks produce data at a rapid rate. This phenomenon is referred to as a data stream, and its analysis typically requires evolutionary and incremental methods. Clustering is among the common methods of data analysis and mining, and several algorithms have been presented in this area. Clustering data streams requires methods different from classic approaches, for reasons such as the lack of simultaneous access to all data and cluster changes. This thesis presents a model-based clustering algorithm that uses conditional probability theory and prior knowledge to perform probabilistic clustering. The proposed algorithm is evaluated on artificial and real data and compared with the FCM and Gustafson-Kessel methods. According to the results, the proposed algorithm improves clustering accuracy and is more resistant to noise. Given the incremental nature of data streams, the proposed algorithm is extended to this type of data, implemented on the Spark cluster computing framework, and evaluated. The parallel implementation reduces the runtime, and the experiments show that the proposed algorithm scales to large amounts of data.
Keywords:
#Clustering #Probabilistic Clustering #DataStream #DataStream Clustering
Keeping place: Central Library of Shahrood University