What is representation in optical character recognition?
I am learning OCR and reading this book https://www.amazon.com/Character-Recognition-Different-Languages-Computing/dp/3319502514
The authors define 8 processes for implementing OCR, which follow one after another (2 after 1, 3 after 2, etc.):
- Optical scanning
- Location segmentation
- Pre-processing
- Segmentation
- Representation
- Feature extraction
- Recognition
- Post-processing
This is what they write about representation (#5):
The fifth component of OCR is representation. Image representation plays one of the most important roles in any recognition system. In the simplest case, grayscale or binary images are fed to the recognizer. However, in most recognition systems, to avoid additional complexity and improve the accuracy of the algorithms, a more compact and characteristic representation is required. For this purpose, a set of features is extracted for each class, which helps distinguish it from other classes while remaining invariant to inherent differences within the class. Character representation methods are usually classified into three main groups: (a) global transformation and series expansion, (b) statistical representation, and (c) geometric and topological representation.
This is what they write about feature extraction (#6):
The sixth component of OCR is feature extraction. The purpose of feature extraction is to capture the essential characteristics of the symbols. Feature extraction is accepted as one of the most difficult pattern recognition problems. The easiest way to describe a symbol is with the actual bitmap. Another approach is to extract certain features that characterize the symbols but leave out irrelevant attributes. Methods for extracting such features are divided into three groups, namely: (a) distribution of points, (b) transformations and series expansions, and (c) structural analysis.
I am completely confused. I don't understand what representation is. As I understand it, after segmentation we must extract some features from the image, for example a topological structure such as the Freeman chain code, and match them against those stored in the model from the training stage, i.e. perform recognition. In other words: segmentation, then feature extraction, then recognition. I don't understand what needs to be done at the representation stage. Please explain.
The representation component takes the bitmap produced by segmentation and converts it to a simpler format (the "representation") that retains the characteristic properties of the classes. This is done to reduce the complexity of the recognition process later on. The Freeman chain code you mention is one such representation.
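To make the idea concrete, here is a minimal sketch of turning a character boundary into a Freeman chain code. It assumes the boundary has already been traced into an ordered list of 8-connected (x, y) pixels, with y growing downward as in image coordinates. The function name `freeman_chain_code` and the example square are hypothetical, and this is not the book's implementation.

```python
# Freeman 8-direction codes, numbered counter-clockwise starting from east.
# Keys are (dx, dy) steps in image coordinates, where y grows downward,
# so "north" (code 2) is a step of dy = -1.
FREEMAN_DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def freeman_chain_code(contour, closed=True):
    """Encode an ordered list of (x, y) boundary pixels as Freeman codes.

    If closed is True, the step from the last point back to the first
    is included as well.
    """
    points = list(contour)
    if closed and len(points) > 1:
        points.append(points[0])

    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in FREEMAN_DIRECTIONS:
            raise ValueError(
                f"points {(x0, y0)} and {(x1, y1)} are not 8-neighbours"
            )
        codes.append(FREEMAN_DIRECTIONS[step])
    return codes


# Hypothetical example: the four corner pixels of a tiny square boundary.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(freeman_chain_code(square))  # [0, 6, 4, 2]
```

The bitmap has been replaced by a short sequence of direction codes, which is much more compact than the pixel grid and does not depend on where the character sits in the image.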
Some (most?) authors combine representation and feature extraction into one step, but the authors of your book chose to treat them separately. Changing the representation is optional, but it reduces complexity and can thereby improve the accuracy of the training and recognition steps.
From this simpler representation, the features are extracted during the feature extraction stage. Which features are extracted depends on the representation chosen. This paper - Feature Extraction Methods for Character Recognition - A Survey - describes 11 different feature extraction methods that can be applied to 4 different representations.
The extracted features are what gets passed to the trainer or recognizer.
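As a rough illustration of the step from representation to features, the sketch below takes a Freeman chain code (a list of direction codes 0 to 7) and computes a normalized direction histogram as a fixed-length feature vector. This is only one possible statistical feature, chosen here for simplicity; the function name `direction_histogram` is hypothetical, and neither the book nor the survey is claimed to prescribe this exact feature.

```python
from collections import Counter


def direction_histogram(chain_code, n_directions=8):
    """Return the fraction of steps taken in each Freeman direction.

    The result always has n_directions entries, so characters whose
    boundaries have different lengths still yield vectors of the same size.
    """
    if not chain_code:
        return [0.0] * n_directions

    counts = Counter(chain_code)
    total = len(chain_code)
    return [counts.get(d, 0) / total for d in range(n_directions)]


# Hypothetical chain code of a small square boundary.
features = direction_histogram([0, 6, 4, 2])
print(features)  # [0.25, 0.0, 0.25, 0.0, 0.25, 0.0, 0.25, 0.0]
```

A vector like this would then be handed to the trainer or recognizer. In this sketch the chain code is the representation and the histogram is the feature set extracted from it.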