Two-Stage Rejection Algorithm to Reduce Search Space for Character Recognition in OCR

Topics: Optical character recognition, Machine learning, Image scanner Pages: 11 (2858 words) Published: December 26, 2012
Srivardhini Mandipati, Gottumukkala Asisha, Preethi Raj S, and Chitrakala S

Department of Computer Science and Engineering, Easwari Engineering College, Chennai, India

Abstract. Optical Character Recognition (OCR) converts text in images into a form that a computer can manipulate. The need for faster OCR systems stems from the abundance of such text. This paper presents a Two-Stage Rejection Algorithm for reducing the search space of an OCR system. It is understood that a smaller search space speeds up recognition. Preprocessing operations are applied to the input image, and features are extracted from it. The resulting feature vectors are clustered, and the Two-Stage Rejection Algorithm is applied for character recognition. With about the same character recognition rate as other OCR systems, an OCR system reinforced with the Two-Stage Rejection Algorithm is considerably faster.

Keywords: Optical Character Recognition, Feature Extraction, K-means.

1 Introduction

Optical character recognition has been an active area of research for many decades. The potential of OCR systems to simplify data entry adds value to research in this area. OCR systems use various pattern matching techniques for character recognition, and most rely on classifiers such as SVMs or neural networks. The training process for these classifiers is time consuming. Moreover, as the number of classes grows, the number of comparisons grows with it, and consequently so does the time taken for character recognition. Hence, such systems cannot be easily extended to recognize characters from additional languages. The proposed system uses a structural approach, as opposed to a statistical approach, for feature extraction. The strength of the structural method over the statistical one is that it represents a pattern in a way similar to how humans perceive it. The structural features help retain the local shape description of the characters. As in other OCR systems, every input image undergoes preprocessing. Additionally, the dataset is clustered and a Two-Stage Rejection Algorithm is applied to it to reduce the search space for character recognition. A considerable increase in performance was observed during experimentation.

2 Related Works

Numerous works have been carried out in the field of OCR. When an OCR system is extended to recognize characters from multiple languages, the dataset grows, which considerably increases the number of comparisons required to recognize a character. This is all the more true when a single document contains characters from different languages. In our paper, we focus on reducing the search space for character recognition. This is done by clustering the training dataset and reordering the clusters.

Weijie Su and Xin Jin [1] propose a hidden Markov model (HMM) with parameter-optimized K-means clustering for handwritten character recognition. They improve K-means clustering by considering the influence of neighboring pixels and by weighting pixels differently according to their position. This model aims at improving the average accuracy of an HMM with K-means clustering for handwritten character recognition. Karthik Sheshadri et al. [2] address the problem of Kannada character recognition and propose a recognition mechanism based on K-means clustering. They propose a segmentation technique that decomposes each character into components from 3 base classes, thus reducing the magnitude of the problem. They also use probabilistic and geometric seeding as heuristics to keep the centroids of the extracted character consistent with the centroids in the training database. Mu-King Tsay, Keh-Hwashyu, and Pao-Chung Chang [3] designed a feature transformation module that extracts discriminative features from the scanned input document to enhance recognition performance. The initial feature transformation matrix is...
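
The idea of reducing the search space by clustering the training dataset can be illustrated with a small sketch. The code below is not the Two-Stage Rejection Algorithm itself, whose details are not given in this excerpt; it is a generic coarse-then-fine lookup under stated assumptions: feature vectors are plain numeric tuples, K-means is seeded deterministically from evenly spaced samples, and the names `build_index`, `recognize`, and `SAMPLES` are hypothetical. A query is first compared with the cluster centroids only, every cluster but the nearest is set aside, and the query is then compared with the samples of the surviving cluster.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(vector, centroids):
    """Index of the centroid closest to the given vector."""
    return min(range(len(centroids)), key=lambda c: distance(vector, centroids[c]))

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's K-means; returns (centroids, cluster index per vector)."""
    # Assumption: deterministic seeding from evenly spaced samples.
    step = len(vectors) // k
    centroids = [list(vectors[i * step]) for i in range(k)]
    for _ in range(iterations):
        assignment = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so that it matches the returned centroids.
    assignment = [nearest(v, centroids) for v in vectors]
    return centroids, assignment

def build_index(samples, k):
    """Cluster (label, vector) samples; returns (centroids, clusters)."""
    centroids, assignment = kmeans([vec for _, vec in samples], k)
    clusters = [[] for _ in range(k)]
    for sample, c in zip(samples, assignment):
        clusters[c].append(sample)
    return centroids, clusters

def recognize(query, centroids, clusters):
    """Return (label, number of distance computations) for a query vector."""
    # Coarse step: compare with centroids only, ignoring empty clusters.
    candidates = [c for c in range(len(centroids)) if clusters[c]]
    best = min(candidates, key=lambda c: distance(query, centroids[c]))
    # Fine step: compare only with the samples of the surviving cluster.
    label, _ = min(clusters[best], key=lambda s: distance(query, s[1]))
    return label, len(candidates) + len(clusters[best])

# Toy 2-D "feature vectors" for three hypothetical character classes.
SAMPLES = [
    ("A", (0, 0)), ("A", (1, 0)), ("A", (0, 1)), ("A", (1, 1)),
    ("B", (10, 10)), ("B", (11, 10)), ("B", (10, 11)), ("B", (11, 11)),
    ("C", (0, 10)), ("C", (1, 10)), ("C", (0, 11)), ("C", (1, 11)),
]

centroids, clusters = build_index(SAMPLES, 3)
label, comparisons = recognize((0.8, 0.2), centroids, clusters)
print(label, comparisons, "of", len(SAMPLES))  # prints: A 7 of 12
```

On this toy data the query is matched after 7 distance computations (3 centroids plus 4 cluster members) instead of the 12 an exhaustive search would need. The saving grows with the size of the dataset, at the cost of a possible miss when the true nearest sample lies in a cluster that was set aside.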

References:
[1] Weijie Su, Xin Jin, “Hidden Markov Model with Parameter-Optimized K-means Clustering for Handwriting Recognition”, International Conference on Internet Computing and Information Services, pp. 435-438, 2011
[2] Karthik Sheshadri, Pavan Kumar T Ambekar, Deeksha Padma Prasad and Dr. Ramakanth P Kumar, “An OCR system for Printed Kannada using K-means clustering”, International Conference on Industrial Technology, pp. 183-187, 2010
[3] Mu-King Tsay, Keh-Hwashyu, Pao-Chung Chang, “Feature Transformation with Generalized Learning Vector Quantization for Hand-Written Chinese Character Recognition”, IEICE Transactions on Information & System, Vol. E82-D, 1992
[4] B. Vijay Kumar, A. G. Ramakrishnan, “Radial Basis Function And Subspace Approach For Printed Kannada Text Recognition”, IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 5, pp. V-321-4, 2004
[5] Premnath Dubey, Wasin Sinthupinyo, “New Approach on Structural Feature Extraction for Character Recognition”, International Symposium on Communications and Information Technologies, pp. 946-949, 2010
[6] Igor Kleiner, Daniel Keren, Ilan Newman, Oren Ben-Zwi, “Applying property testing to an image partitioning problem”, IEEE Transactions On Pattern Analysis And Machine Intelligence, Vol. 33, No. 2, 2011
[7] Sanghamitra Mohanty, Himadri Nandini Dasbebartta, Tarun Kumar Behera, “An Efficient Bilingual Optical Character Recognition (English-Oriya) System for Printed Documents”, Seventh International Conference on Advances in Pattern Recognition, pp. 398-401, 2009
[8] Oivind Due Trier, Anil K Jain, and Torfinn Taxt, “Feature Extraction Methods For Character Recognition: A Survey”, Pattern Recognition, Vol. 29, pp. 641-662, 1995
[9] Vuokko Vuori, Jorma Laaksonen, “A Comparison of Techniques for Automatic Clustering of Handwritten Characters”, 16th International Conference on Pattern Recognition, Vol. 3, pp. 168-171, 2002