Optical Character Recognition API: API description

The OCR API is a library API that provides recognition functions for processing images and converting them to text. Images, whether captured by the phone camera or loaded from existing files, are passed to these functions by their handles from the font and bitmap server. Currently, only 24-bit color and 8-bit grayscale images are supported, and they must be in bitmap format.

Two sets of recognition interfaces are implemented in the OcrSrv library. The first is for applications that need to recognize a document from a whole image. The procedure starts with layout analysis: a layout engine analyzes the image and, where possible, divides it into blocks containing the title, subtitles, paragraphs and other text blocks of the document. A recognition engine then processes each block and converts it to a Unicode string. The second set of functions recognizes only part of an image. Here the API user has to tell the recognition engine the exact position and extent of the region to convert. In addition, the API user can designate special content types (such as phone numbers, web addresses and e-mail addresses) to make the region recognition result more accurate.
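The difference between the two modes can be pictured with a small sketch in standard C++. It is not the OcrSrv interface: Rect, ContentType, RegionRequest, analyzeLayout, recognizeBlock, recognizeDocument and recognizeRegion are hypothetical names chosen for illustration, and the layout and recognition engines are replaced by stubs that return placeholder strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these names come from the OcrSrv headers.
struct Rect { int x, y, width, height; };

// Content hints a caller may attach to a region (taken from the text above).
enum class ContentType { Any, PhoneNumber, WebAddress, Email };

// Region mode: the caller supplies position, extent and a content hint.
struct RegionRequest {
    Rect area;
    ContentType hint;
};

// Stub layout engine: splits the image into a title strip and a body block.
std::vector<Rect> analyzeLayout(int imageWidth, int imageHeight) {
    assert(imageWidth > 0 && imageHeight > 0);
    const int titleHeight = imageHeight / 5;
    return { Rect{0, 0, imageWidth, titleHeight},
             Rect{0, titleHeight, imageWidth, imageHeight - titleHeight} };
}

// Stub recognition engine: returns a placeholder instead of real text.
std::u16string recognizeBlock(const Rect& block, ContentType hint) {
    (void)block;  // a real engine would read the pixels inside this rectangle
    return hint == ContentType::Any ? u"<text>" : u"<typed text>";
}

// Document mode: layout analysis first, then one Unicode string per block.
std::vector<std::u16string> recognizeDocument(int imageWidth, int imageHeight) {
    std::vector<std::u16string> result;
    for (const Rect& block : analyzeLayout(imageWidth, imageHeight)) {
        result.push_back(recognizeBlock(block, ContentType::Any));
    }
    return result;
}

// Region mode: no layout analysis, the caller's rectangle is used directly.
std::u16string recognizeRegion(const RegionRequest& request) {
    return recognizeBlock(request.area, request.hint);
}
```

In the sketch, document mode never asks the caller for coordinates, while region mode never runs layout analysis. That is the trade-off described above: less work for the caller in the first case, more control and an optional content hint in the second.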

To carry out recognition, the OCR API needs a database for each supported language. The supported languages are English, Japanese, Simplified Chinese and Traditional Chinese. Note that these databases are not always shipped on the phone itself; some are delivered in other ways, for example on external memory cards. The OCR API provides methods that report exactly which language databases are available on your device.
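Because a language database may be missing, a client would normally check availability before choosing a recognition language. The following is a minimal sketch of that check in standard C++. The Language enum, isLanguageReady and chooseLanguage are hypothetical names, and the installed list is passed in as a parameter, whereas a real client would obtain it from the OCR API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// The four languages named above; the enum itself is hypothetical.
enum class Language { English, Japanese, SimplifiedChinese, TraditionalChinese };

// Hypothetical stand-in for asking the engine which databases are ready.
bool isLanguageReady(const std::vector<Language>& installed, Language wanted) {
    return std::find(installed.begin(), installed.end(), wanted) != installed.end();
}

// Pick the first preferred language whose database is present.
// Returns false and leaves `chosen` untouched if none is installed.
bool chooseLanguage(const std::vector<Language>& installed,
                    const std::vector<Language>& preferred,
                    Language& chosen) {
    assert(!preferred.empty());  // the caller must name at least one language
    for (Language candidate : preferred) {
        if (isLanguageReady(installed, candidate)) {
            chosen = candidate;
            return true;
        }
    }
    return false;
}
```

Returning a flag instead of silently falling back to a default keeps the missing-database case visible to the application, which can then tell the user that a database has to be installed.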

In short, the API provides automatic layout analysis and text recognition on images. Typical uses include entering text through the phone camera and recognizing and saving personal information from business card images.

Use cases

Use cases of the OCR API are illustrated in the following figure.

Figure 1: The use cases of the OCR API

There are five use cases here:

OCR API initialization

Recognition with the layout analysis

Region recognition

Cancel operation

Release the OCR API

API class structure

The API class structure consists of six interfaces. A static class, OCREngineFactory, creates the OCR engine instance, and the client application must derive from MOCREngineObserver to receive the layout analysis and recognition results asynchronously.
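The factory-plus-observer arrangement can be sketched in standard C++ as follows. Only the roles are taken from the text: EngineFactory stands in for OCREngineFactory and EngineObserver for MOCREngineObserver. All method names (create, release, recognize, cancel, poll, recognitionDone) are assumptions made for illustration and are not the OcrSrv signatures. Asynchronous completion is simulated by a queue that is drained when poll() is called.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Plays the role the text gives to MOCREngineObserver; method name is assumed.
class EngineObserver {
public:
    virtual ~EngineObserver() = default;
    virtual void recognitionDone(int error, const std::u16string& text) = 0;
};

// Hypothetical engine: requests return at once, results arrive via poll().
class Engine {
public:
    explicit Engine(EngineObserver& observer) : observer_(observer) {}

    // Queue a request; the observer is not called until poll() runs.
    void recognize(int bitmapHandle) {
        assert(bitmapHandle != 0);  // this sketch treats 0 as "no bitmap"
        pending_.push([this] { observer_.recognitionDone(0, u"<text>"); });
    }

    // Cancel operation: drop every request that has not completed yet.
    void cancel() {
        while (!pending_.empty()) pending_.pop();
    }

    // Stand-in for whatever scheduler drives completion on a device.
    void poll() {
        while (!pending_.empty()) {
            std::function<void()> job = std::move(pending_.front());
            pending_.pop();
            job();
        }
    }

private:
    EngineObserver& observer_;
    std::queue<std::function<void()>> pending_;
};

// Plays the role of OCREngineFactory: static creation and release.
struct EngineFactory {
    static std::unique_ptr<Engine> create(EngineObserver& observer) {
        return std::unique_ptr<Engine>(new Engine(observer));
    }
    static void release(std::unique_ptr<Engine>& engine) { engine.reset(); }
};

// Client application: derives from the observer to receive results.
class Client : public EngineObserver {
public:
    void recognitionDone(int error, const std::u16string& text) override {
        lastError = error;
        lastText = text;
        ++callbacks;
    }
    int lastError = -1;
    std::u16string lastText;
    int callbacks = 0;
};
```

A client following this pattern would create the engine through the factory, issue a request, receive the result in its callback, and finally release the engine, which mirrors the initialization, recognition, cancel and release use cases listed earlier.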

Figure 2: API Class Structure and Interaction

The interfaces MOCREngineRecognizeBlock and MOCREngineLayoutRecognize provide the two sets of recognition functions, and MOCREngineBase offers the features that are common to both recognition types.

The functions of these interfaces are shown in detail in the following figure.

Figure 3: Interface Class Functions


Copyright © Nokia Corporation 2001-2008