TK1025 : Identification of handwritten and typed digits in natural scene images using deep networks
Thesis > Central Library of Shahrood University > Electrical Engineering > MSc > 2024
Authors:
Mohammad Reza Gholami Barm [Author], Hossein Khosravi [Supervisor]
Abstract: Identifying digits in various scripts and contexts has become one of the most sought-after research topics in recent years. Such digits appear in handwritten notes, typed texts, house numbers, graffiti, advertising boards, and vehicle license plates. Several approaches, including classical image processing and machine learning, are currently used to recognize and classify digits. This study explores both traditional and modern object recognition methods, focusing on the YOLO algorithm. After training the best-performing versions of this family of algorithms, we make slight modifications to the model architecture to propose a network for recognizing Persian digits in natural scene images that preserves accuracy and speed while being lighter and requiring less computation. In other words, the goal of this research is to identify handwritten and typed digits in natural scene images using deep networks based on the YOLO family. To this end, we first collected and labeled a dataset of approximately 4,000 images of natural scenes containing typed and handwritten Persian digits. We then used several selected models from the YOLO family to detect the digits in these images. The main advantage of this approach is that it eliminates classical processing steps such as image segmentation or binarization, allowing the deep network to recognize digits directly and rapidly. Next, we evaluated the performance of the different YOLO versions on this dataset using key criteria. Among the versions tested in this research, the eighth version of YOLO performed best, with a recognition accuracy of 97% and a recall of 98%, making it the preferred model.
Finally, by reducing the number of filters in the architecture of the YOLO 8 model, we obtained a lighter recognition network, decreasing the computational load from 6.8 GFLOPs in YOLO 8-nano to 6.1 GFLOPs while achieving a recognition accuracy of 98.8% and a recall of 99%, thereby outperforming the unmodified eighth version of the YOLO algorithm.
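The link between filter count and computational load can be illustrated with a rough sketch. The layer dimensions and the reduced filter count below are hypothetical and are not taken from the thesis; only the 6.8 and 6.1 GFLOPs figures come from the abstract. The per-layer estimate uses the common approximation of two floating-point operations per multiply-accumulate and ignores bias, normalization, and activation costs.

```python
def conv_flops(h_out, w_out, c_in, c_out, k):
    """Approximate FLOPs of one k x k convolution layer.

    Counts 2 FLOPs per multiply-accumulate; bias and activation are ignored.
    """
    return 2 * h_out * w_out * c_in * c_out * k * k


def relative_reduction(before, after):
    """Fractional decrease when going from `before` to `after`."""
    return (before - after) / before


# Hypothetical layer (not from the thesis): 80x80 output, 3x3 kernel, 64 -> 64 channels.
full = conv_flops(80, 80, 64, 64, 3)
# Same layer with input and output filters cut to 56 (illustrative value only).
slim = conv_flops(80, 80, 56, 56, 3)

print(f"single-layer reduction: {relative_reduction(full, slim):.1%}")
# Whole-network figures reported in the abstract: 6.8 GFLOPs -> 6.1 GFLOPs.
print(f"reported network reduction: {relative_reduction(6.8, 6.1):.1%}")
```

Because the cost of a convolution scales with the product of its input and output channel counts, a modest cut in filters can remove a noticeable share of a layer's operations. The overall saving depends on which layers are narrowed, which this sketch does not model.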
Keywords:
#Digit recognition #Object detection #Natural scene images #Deep learning #Deep neural networks #YOLO algorithm
Keeping place: Central Library of Shahrood University