TK200: Thesis Submitted for the MSc Degree in Electrical Engineering
Thesis > Central Library of Shahrood University > Electrical Engineering > MSc > 2011
Authors:
Elham Bayesteh Tashk [Author], Alireza Ahmadifard [Supervisor], [Advisor]
Abstract: This thesis studies an offline Farsi handwritten word recognition system with a limited lexicon. The proposed system is based on a two-stage method. In the first stage, the words in the lexicon are clustered by similarity using the ISOCLUS and hierarchical clustering algorithms. The similarity features used in this stage are the upper profile, lower profile, vertical projection, and background/ink transition profile vectors, extracted from each column of the word image. A one-dimensional Discrete Wavelet Transform (DWT) is applied to reduce the dimension of the feature vectors and to smooth out fluctuations in the profiles. The Dynamic Time Warping (DTW) algorithm is used to measure the similarity between the feature vectors extracted from words. The mean of each cluster serves as the cluster's representative and as its entry in the pictorial dictionary. The studied dataset, called IRANSHAR, contains 16,000 handwritten word images covering 502 city names of Iran. In this stage, the handwritten word images are grouped into 62 clusters. To recognize an input word, the five clusters closest to it are selected first. This yields a lexicon reduction of 77% with a 94% recognition rate. The second stage recognizes the unknown input word image from the list of candidate words produced by the first stage, using a histogram of local gradients as the feature. For feature extraction, the gradient image is divided into a number of blocks. In this thesis, the blocks are modified in two ways to improve recognition performance. In the first method, the size of the blocks is set based on the distribution of black (pen) pixels. In the second method, each main component of the handwritten word is divided into blocks separately, and the results are then combined. Finally, the gradient feature vectors of the input word are compared with the gradient feature vectors of the candidate words using KNN and multi-class SVM classifiers.
The recognition results on the handwritten words of the IRANSHAR dataset show that the lexicon reduction stage improves both accuracy and speed to some extent, because dissimilar words are eliminated before recognition, and that the first modification of the local gradient histogram improves the recognition rate of the proposed system by 13%.
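As an illustration only, the first-stage similarity measure described in the abstract (per-column profile features compared with DTW) could be sketched as below. This is not the thesis code: the function names `column_profiles` and `dtw_distance`, the 0/1 list-of-rows image format, the handling of blank columns, and the Euclidean local cost are all assumptions made for this sketch, and the wavelet smoothing step is omitted.

```python
def column_profiles(img):
    """Per-column features of a binary word image.

    img is a list of rows, each a list of 0 (background) / 1 (ink).
    Returns one (upper, lower, projection, transitions) tuple per column.
    """
    height, width = len(img), len(img[0])
    feats = []
    for x in range(width):
        col = [img[y][x] for y in range(height)]
        ink_rows = [y for y, v in enumerate(col) if v]
        # Upper/lower profile: distance from the top/bottom edge to the
        # nearest ink pixel. A blank column gets the full height (assumption).
        upper = ink_rows[0] if ink_rows else height
        lower = height - 1 - ink_rows[-1] if ink_rows else height
        # Vertical projection: number of ink pixels in the column.
        projection = len(ink_rows)
        # Background/ink transitions along the column.
        transitions = sum(1 for a, b in zip(col, col[1:]) if a != b)
        feats.append((upper, lower, projection, transitions))
    return feats


def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two column feature tuples.
            d = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b[j-1] reused
                                 cost[i][j - 1],      # b advances, a[i-1] reused
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# A tiny 3x3 "word" and a horizontally stretched copy of it.
word = [[0, 1, 0],
        [1, 1, 0],
        [0, 1, 0]]
stretched = [[v for v in row for _ in range(2)] for row in word]

print(column_profiles(word))
print(dtw_distance(column_profiles(word), column_profiles(stretched)))
```

Because DTW may align one column with several columns of the other sequence, the stretched copy is at distance zero from the original in this toy case, which is the property that makes DTW a natural fit for handwriting of varying width.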
Keywords:
#handwritten word recognition #Lexicon reduction #ISOCLUS clustering algorithm #Hierarchical clustering algorithm #DTW algorithm #local gradient histogram #k nearest neighbor classification #multi class SVM classification
Holding location: Central Library of Shahrood University