TK318 : The Recognition of Printed Text with Iranian sans Font
Thesis > Central Library of Shahrood University > Electrical Engineering > MSc > 2013
Authors:
Zeynab Bagheri [Author], Hossein Khosravi [Supervisor]
Abstract: In recent decades, extensive research has been carried out on recognizing written patterns, including the letters, numerals, and other symbols commonly used in written documents in different languages. This progress gave rise to the automatic text recognition technology known as optical character recognition (OCR). Text recognition is considered an important part of e-government in Iran, and demand for a Persian text recognition system has grown considerably in recent years. Because large volumes of paper documents are converted to digital images by scanner or camera, the storage, efficient management, and retrieval of these document files are important in many applications, including office automation and digital libraries. In general, a text recognition system comprises several stages, such as image acquisition, preprocessing, layout analysis, language identification, font recognition, and finally text recognition. Research on some of these topics, such as preprocessing, is independent of the text language and can be applied to any language. Other topics, such as font recognition, depend on the script, and results obtained for other languages cannot be applied directly to Persian. Most research on Persian text recognition deals with high-resolution images of clean text and with documents set in a few known fonts. Research on Persian text recognition follows three approaches: separation-based methods, which split words into letters; holistic methods, which recognize a word from its overall shape; and hybrids of the two. This thesis aims to recognize printed text set in the Iranian Sans font at a minimum font size of 9 and a resolution of 300 dpi. Because of its style and readability, this font is receiving growing attention, and its use in computing and Internet environments increases every day.
The font is a replacement for default Windows fonts such as Tahoma. Despite its readability, standard line spacing, attractive appearance, and consistency with Latin text, it has a structural complexity that complicates recognition. In this thesis, after a suitable database was produced, classifiers were trained for isolated and connected characters. Then, once the problem of overlapping words had been resolved, a separation-based approach was used to segment the letters. Finally, results are reported for the system on several images of printed text: it achieved a separation accuracy of 96% and a recognition accuracy of 85%.
Keywords:
#text recognition #Iranian Sans font #separation-based approach #neural network classification
Keeping place: Central Library of Shahrood University