Q125: Summarization of Persian texts using artificial intelligence techniques
Thesis > Central Library of Shahrood University > Computer Engineering > MSc > 2015
Authors:
Abstract:
Objective: One of the approaches proposed for summarizing texts is the statistical method. In such methods, all of the information is drawn from the text itself, and no prior background knowledge is required. The purpose of this thesis is to propose a statistical method for summarizing Persian texts.
Methods: The proposed statistical method for summarizing Persian texts combines a statistical algorithm with a clustering model. It first extracts the keywords, then assigns a specific weight to each word, and finally weights the sentences according to those word weights. Next, using the clustering model, it groups similar sentences into clusters and extracts the best sentences according to their weight and their similarity. Combining sentence similarity with sentence weight leads to the extraction of sentences that are both more significant and more closely related to one another.
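The pipeline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the tiny stopword list, the normalized term-frequency word weight, the sum-of-keyword-weights sentence score, the keyword-count sentence vectors, and the rule of keeping the highest-weight sentence from each cluster are all hypothetical choices, as are the function names `split_sentences`, `tokenize`, `kmeans`, and `summarize`. K-means is written out directly so the example has no dependencies.

```python
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny Persian stopword list; a real system needs a full one.
STOPWORDS = {"و", "در", "به", "از", "که", "این", "را", "با", "است"}


def split_sentences(text):
    # Split on . ! ? and the Arabic-script question mark (U+061F).
    parts = re.split(r"[.!?\u061F]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    # \w is Unicode-aware in Python 3, so Persian letters are matched.
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]


def kmeans(vectors, k, iters=20, seed=0):
    # Plain K-means with squared Euclidean distance; returns a cluster label per vector.
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def summarize(text, n_keywords=10, n_clusters=3):
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    counts = Counter(w for toks in tokens for w in toks)
    if not counts:
        return []
    # Word weight (assumption): term frequency normalized by the most frequent word.
    top = counts.most_common(1)[0][1]
    weights = {w: c / top for w, c in counts.items()}
    # Keywords: the n_keywords most frequent (highest-weighted) words.
    keywords = [w for w, _ in counts.most_common(n_keywords)]
    keyword_set = set(keywords)
    # Sentence weight (assumption): sum of the weights of the keywords it contains.
    sent_weights = [sum(weights[w] for w in toks if w in keyword_set) for toks in tokens]
    # Sentence vector: keyword counts, used by K-means to group similar sentences.
    vectors = [[toks.count(kw) for kw in keywords] for toks in tokens]
    labels = kmeans(vectors, min(n_clusters, len(sentences)))
    # Keep the highest-weight sentence of each non-empty cluster, in original order.
    chosen = []
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        chosen.append(max(members, key=lambda i: sent_weights[i]))
    return [sentences[i] for i in sorted(chosen)]
```

For example, `summarize(text, n_keywords=10, n_clusters=3)` returns at most three sentences, one per non-empty cluster, in their original order. Taking one sentence per cluster is how the similarity grouping limits redundancy, while the weights decide which member of each group is most significant.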
Results: To evaluate the proposed model, a dataset of 200 sample texts was collected from Hamshahri newspaper and Nasim Aftab magazine. The proposed model was applied to this dataset, and the extracted abstracts were evaluated on three criteria: appropriateness, coherence, and comprehensiveness. In the evaluation process, the dataset was given to human evaluators, who were asked to assign each abstract a score between 0 and 10. According to Table 4-6 of the thesis, the abstracts achieved satisfactory performance.
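The abstract does not say how the 0 to 10 ratings were aggregated. One plausible reading, sketched here purely as an assumption, is a per-criterion mean over all evaluator ratings; the function name `average_scores` and the criterion keys are hypothetical.

```python
from statistics import mean

# Hypothetical keys for the three evaluation criteria named in the abstract.
CRITERIA = ("appropriateness", "coherence", "comprehensiveness")


def average_scores(ratings):
    """ratings: one dict per (evaluator, abstract) pair, mapping each criterion to a 0-10 score."""
    if not ratings:
        raise ValueError("no ratings given")
    for r in ratings:
        for c in CRITERIA:
            # Reject scores outside the 0-10 scale used by the evaluators.
            if not 0 <= r[c] <= 10:
                raise ValueError(f"score out of range for {c}: {r[c]}")
    # Mean score per criterion across all ratings.
    return {c: mean(r[c] for r in ratings) for c in CRITERIA}
```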
Conclusion: This thesis proposes a statistical model for extracting sentences from Persian texts. The model consists of several stages, notably weighting the words, weighting the sentences based on keywords, and selecting sentences based on greater continuity. In the last stage, which is the novelty of this work, the sentences with more continuity are selected; to do so, the K-means clustering algorithm was chosen. K-means places similar sentences in the same cluster, which greatly helps in selecting the sentences to extract. In addition, a dictionary was built and applied to obtain the infinitive form of certain verbs. Using the similarity between sentences led to the selection of better sentences for the summary, which demonstrates the advantage of our model over the RAKE method.
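RAKE (Rapid Automatic Keyword Extraction), the baseline named above, scores candidate keyword phrases using co-occurrence statistics from the text alone. The sketch below follows the standard formulation: candidate phrases are runs of words split at stopwords and punctuation, each word is scored as its degree divided by its frequency, and a phrase's score is the sum of its word scores. The stopword list, `top_n`, and the function name `rake` are assumptions; the thesis does not state how RAKE was configured or how its keywords were turned into summaries for the comparison.

```python
import re
from collections import defaultdict


def rake(text, stopwords, top_n=5):
    # 1. Candidate phrases: maximal runs of non-stopwords between punctuation marks
    #    (Latin punctuation plus the Arabic-script comma, semicolon, and question mark).
    phrases = []
    for fragment in re.split(r"[.,;:!?\u060C\u061B\u061F]+", text.lower()):
        current = []
        for word in re.findall(r"\w+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    # 2. Word score = degree / frequency; degree adds the phrase length for every
    #    phrase containing the word (its co-occurrences plus itself).
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # 3. Phrase score = sum of its word scores; unique phrases ranked best first.
    scored = {p: sum(word_score[w] for w in p) for p in phrases}
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return [(" ".join(p), s) for p, s in ranked[:top_n]]
```

Because RAKE ranks phrases rather than sentences, using it as a summarization baseline needs an extra step (for instance, scoring sentences by the RAKE phrases they contain); that step is not described in the abstract.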
Keywords:
#Encapsulation #Persian texts #statistical model #clustering #abstraction #summarization #similarity #K-means #RAKE
Keeping place: Central Library of Shahrood University