TK552 : Text Extraction from Raster Maps Using Color Space Quantization
Thesis > Central Library of Shahrood University > Electrical Engineering > MSc > 2016
Authors:
Abstract: Text layers in maps provide valuable information by relating names to their positions on the map. Most previous research in this field has focused on Latin text, and the reported methods perform poorly on Persian text. In this thesis we present a new method for automatic extraction of text from raster maps, using self-organizing maps (SOM) for quantization.
In the proposed approach, a Mean-Shift algorithm is first used to reduce variation in the color space. We then quantize the map using a SOM, which reduces the dimensionality of the color space. Based on the results obtained by applying the program to different maps, the best textual layer is either the first, the fifth or the seventh layer produced by the SOM. In the next step, the connected components that fall within a predefined threshold are counted in each of these three layers. The text layer is the layer with the most connected components. In this layer each component has a gray-level intensity that depends on its main color in the original map, while the black background carries no textual information. Therefore, we treat black pixels as background and all other pixels as foreground. In this way, the text layer is extracted.
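The layer-selection step can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the use of 4-connectivity, and the reading of the "predefined threshold" as a range of component sizes in pixels (`min_size`, `max_size`) are all assumptions made for illustration. Each layer is a 2D list in which 0 is black background and any other value is foreground.

```python
from collections import deque


def count_components(layer, min_size=2, max_size=50):
    """Count 4-connected foreground components whose pixel count
    lies in [min_size, max_size] (assumed meaning of the threshold)."""
    rows, cols = len(layer), len(layer[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Black (0) pixels are background; skip visited pixels too.
            if layer[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill to measure this component's size.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and layer[ny][nx] != 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if min_size <= size <= max_size:
                count += 1
    return count


def pick_text_layer(layers, **size_limits):
    """Return (index of the layer with the most in-range components, counts)."""
    counts = [count_components(layer, **size_limits) for layer in layers]
    best = max(range(len(layers)), key=counts.__getitem__)
    return best, counts
```

In the setting described above, `layers` would hold the three candidate SOM layers (first, fifth and seventh), and the returned index identifies the one taken as the text layer.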
To prepare words for OCR, we expand the binarized image using a structuring element and the morphological "dilation" operator. Next, we find the connected components in the dilated image and "AND" each dilated component with its corresponding component in the original binarized image. Using the bounding box property, we compute the profile of each bounding box at every angle between -90 and +90 degrees. The angle at which the number of non-zero values in the word's profile is smallest is the desired rotation angle. We therefore rotate each word by this angle to bring it into a horizontal position. This method is independent of the font, size, direction and color of the text and can find both Persian and Latin text in maps. Experimental results show 90.4% precision and 86.6% recall for text extraction from raster maps with the proposed approach.
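The rotation-angle search can be sketched as follows, under stated assumptions. The profile is taken here to be the projection of a word's foreground pixels onto the vertical axis after rotation, so its number of non-zero values is the number of occupied rows; the function names, the 1-degree step, and the sign convention of the rotation are illustrative choices, not details given in the abstract. The sketch only estimates the angle and does not resample the image.

```python
import math


def occupied_rows(points, angle_deg):
    """Number of non-zero bins in the row profile after rotating the
    foreground points (x, y) by angle_deg about the origin."""
    t = math.radians(angle_deg)
    s, c = math.sin(t), math.cos(t)
    # Only the rotated vertical coordinate matters for a row profile.
    return len({round(x * s + y * c) for x, y in points})


def estimate_rotation(points, step=1):
    """Angle in [-90, +90] degrees that minimises the number of occupied rows.
    Ties are resolved in favour of the smallest angle tried."""
    return min(range(-90, 91, step), key=lambda a: occupied_rows(points, a))
```

For a word that is already horizontal, the fewest rows are occupied at 0 degrees, so no rotation is suggested; for a slanted word, the minimum moves to the angle that cancels the slant under the convention used in `occupied_rows`.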
Keywords:
#Color space quantization #Mean-Shift algorithm #Preparation for OCR #Text extraction #Self-organizing maps
Keeping place: Central Library of Shahrood University
Visitor: