Q24 : An Automatic Method for Clustering News on the Internet
Thesis > Central Library of Shahrood University > Computer Engineering > MSc > 2012
Authors:
Abstract: The need to stay informed about the news has an unavoidable and direct effect on decision making in personal, social, and political life. At the same time, the uncontrolled growth in the number of websites and in the information they hold has become one of the most important problems facing users of the World Wide Web.
Moreover, the speed at which unsupervised news is published and spread on the web, together with the sharp increase in the number of news websites across different fields, has reduced the ability to check them. This huge amount of information confuses users instead of helping them achieve their goals. For this reason, the design and implementation of an automatic news clustering system for managing news on the web is justified. The proposed method addresses the problem by applying data mining and web mining methods to cluster news text snippets.
First, a web client is defined to crawl valid news agencies and acquire news for the training and test phases. The result of the project is the news clustered into predefined groups. The user can select both the base news agency and the preferred groups for the final presentation.
Much research has been done on text clustering, but all of these efforts use either term frequency or document frequency to weight the extracted keywords and then compare different similarity measures. Recently, a small number of studies have focused on the features of clusters instead of documents and words. In this thesis we use group features by building a semi-text from the main text words in each group and calculating its similarity. The proposed architecture consists of the following components: news extraction, preprocessing, group database creation, keyword extraction, keyword weighting, clustering, and database reconstruction. Our main contribution is a mechanism that automatically captures and clusters news snippets on the web. Finally, the groups that pass the defined threshold are labeled as winner groups. Our evaluations show that the proposed method, together with the news snippet structure, is efficient and effective.
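The group-level idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not name the similarity measure, the weighting scheme, or the threshold value, so the cosine similarity over raw term frequencies, the tiny stop-word list, the 0.2 threshold, and every function name and sample snippet below are assumptions made only for illustration. It also assumes that the similarity is computed between an incoming snippet and each group's merged text.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real preprocessing would be richer.
STOP_WORDS = {"the", "a", "to", "and", "after"}

def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_group_texts(labeled_snippets):
    """Merge the words of each group's training snippets into one per-group text."""
    groups = {}
    for group, text in labeled_snippets:
        groups.setdefault(group, Counter()).update(tokenize(text))
    return groups

def winner_groups(snippet, group_texts, threshold=0.2):
    """Return (group, score) pairs that pass the threshold, best score first."""
    vec = Counter(tokenize(snippet))
    scores = [(group, cosine(vec, text)) for group, text in group_texts.items()]
    winners = [(group, score) for group, score in scores if score >= threshold]
    return sorted(winners, key=lambda pair: pair[1], reverse=True)

# Made-up training snippets for two predefined groups.
training = [
    ("sports", "The team won the football match after a late goal"),
    ("sports", "The coach praised the players after the match"),
    ("economy", "The central bank raised interest rates to curb inflation"),
    ("economy", "Oil prices and inflation worry the stock market"),
]
group_texts = build_group_texts(training)
result = winner_groups("Late goal decides the football match", group_texts)
print(result)
```

With these made-up snippets, only the sports group shares words with the test snippet, so it is the only group that passes the threshold. Comparing a snippet against one merged text per group, rather than against every stored document, keeps the number of similarity computations equal to the number of groups.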
Keywords:
#Clustering #News Documents #Data Mining #Web Mining
Keeping place: Central Library of Shahrood University
Visitor: