Q149: Semi-supervised classification of evolving data streams
MSc Thesis, Computer Engineering, 2019, Central Library of Shahrood University
Authors:
Hossein Hasan Nezhad Namaghi [Author], Hoda Mashayekhi[Supervisor], Morteza Zahedi[Advisor]
Abstract: A data stream is a sequence of data generated at high speed and in high volume by various information sources. One of the main challenges of data stream analysis is concept drift, that is, a change in the statistical properties of the data. To cope with the unbounded length of data streams and with concept drift, many existing studies use approaches that assume true labels are available for all data. Because labeling instances is costly, however, it is often assumed that only some of the instances are labeled. Another important challenge in data stream analysis is concept evolution: when changes in the data lead to the emergence of new concepts, the concept evolves and appears as a new, emerging class. In this thesis, a semi-supervised ensemble learning algorithm is proposed that uses entropy variation to detect concept drift in data stream classification. The proposed ensemble model is trained on a limited initial labeled set. Afterwards, when concept drift occurs, unlabeled data are used to update the ensemble model. The method also identifies a new class from detected outliers that are strongly correlated with one another. The proposed method detects changes in the data and improves its accuracy by updating the learning model. Experimental results show that the proposed method is more effective than other methods in several respects.
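The abstract does not say how entropy variation is measured. The following is a minimal sketch of the general idea under stated assumptions, not the thesis's method: it assumes entropy is computed over the ensemble's predicted labels in a sliding window and compared with the entropy of a reference window, with drift signalled when the difference exceeds a threshold. The class `EntropyDriftDetector`, the helper `shannon_entropy`, and the parameters `window_size` and `threshold` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter, deque


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


class EntropyDriftDetector:
    """Hypothetical detector: signals drift when the entropy of the predicted-label
    distribution in the current window departs from a reference entropy."""

    def __init__(self, window_size=100, threshold=0.3):
        self.window_size = window_size  # assumed window length (instances)
        self.threshold = threshold      # assumed entropy change that counts as drift
        self.window = deque(maxlen=window_size)
        self.reference = None           # entropy of the first full window

    def update(self, predicted_label):
        """Add one prediction; return True if drift is signalled."""
        self.window.append(predicted_label)
        if len(self.window) < self.window_size:
            return False  # not enough instances yet
        current = shannon_entropy(self.window)
        if self.reference is None:
            self.reference = current  # first full window sets the reference
            return False
        if abs(current - self.reference) > self.threshold:
            # Start over: the next full window becomes the new reference.
            self.window.clear()
            self.reference = None
            return True
        return False


if __name__ == "__main__":
    # Synthetic stream of predicted labels: two balanced classes, then four.
    stream = ["a", "b"] * 100 + ["a", "b", "c", "d"] * 100
    detector = EntropyDriftDetector(window_size=100, threshold=0.3)
    for i, label in enumerate(stream):
        if detector.update(label):
            print(f"drift signalled at instance {i}")
```

Because the detector only looks at predicted labels, it needs no true labels, which suits the semi-supervised setting described above. The ensemble update with unlabeled data and the outlier-based new-class detection are not shown.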
Keywords:
#Data stream #ensemble learning #Concept drift #Entropy #Semi-supervised classification #Concept Evolution
Holding location: Central Library of Shahrood University