Shahrood University Of Technology Thesis

Q152 : Sentence and paragraph similarity using word embedding models

Thesis > Central Library of Shahrood University > Computer Engineering > MSc > 2019

Authors:

Morteza Allahpour [Author], Morteza Zahedi [Supervisor], Hoda Mashayekhi [Advisor]

Abstract: Nowadays, given the increasing amount of information and documentation in various fields, quick access to information is of particular importance to every individual. Hence, in addition to information retrieval techniques, summarization and categorization techniques can also help speed up users' access to their documents. Building a system that can effectively identify the similarity between two expressions has been the subject of many studies. The distance between two words can be determined through a word similarity measure or by machine learning methods. In this research, a method is proposed that identifies the similarity between sentences based on the meaning of the words in each sentence. To obtain the meaning of each word, we use word embedding models. One feature of these models is that they represent each word as a vector in a multi-dimensional space, so that vector operations such as pairwise addition can be used to capture how close two words are in meaning. Next, four feature extraction functions are used to extract features of the expressions, and these features are then passed to a classifier that decides whether the two sentences are similar or not. In building such a system, one of the most important components is the ability to recognize similarity between sentences and paragraphs of texts, which is itself the subject of much research. The method can recognize the semantic similarity between two sentences regardless of their lexical similarity. In addition to recognizing similarity, it is also effective at detecting the lack of similarity between two expressions: in the experiments, it correctly classified 83% of the test data, which is better than the other methods introduced.
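The pipeline outlined in the abstract (word vectors, then features of a sentence pair, then a similar / not similar decision) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the toy 3-dimensional vectors are invented, averaging is just one simple way to combine word vectors, the two features shown (cosine similarity and Euclidean distance) are not claimed to be among the thesis's four feature extraction functions, and the fixed threshold stands in for a trained classifier.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# A real system would load pretrained word embedding vectors instead.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.3, 0.0],
    "sleeps": [0.1, 0.9, 0.1],
    "naps": [0.2, 0.8, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "fell": [0.1, 0.0, 0.8],
}


def sentence_vector(sentence, embeddings=TOY_EMBEDDINGS):
    """Average the vectors of the known words; unknown words are skipped."""
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError("no known words in sentence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_features(s1, s2):
    """Hypothetical feature set for a sentence pair (not the thesis's four functions)."""
    v1, v2 = sentence_vector(s1), sentence_vector(s2)
    return {
        "cosine": cosine_similarity(v1, v2),
        "euclidean": euclidean_distance(v1, v2),
    }


def is_similar(s1, s2, threshold=0.8):
    """Stand-in for a trained classifier: threshold on the cosine feature."""
    return pair_features(s1, s2)["cosine"] >= threshold


print(is_similar("cat sleeps", "kitten naps"))  # sentences share no words
print(is_similar("cat sleeps", "stock fell"))
```

With these toy vectors, the first pair has no words in common yet is judged similar, while the second pair is judged dissimilar. In practice the features would be fed to a classifier trained on labelled sentence pairs rather than compared against a hand-picked threshold.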

Keywords:

#Word embedding #Text Similarity #Text Processing #Text Mining
Keeping place: Central Library of Shahrood University