Optical Character Recognition API: Using the OCR API

To use the OCR API, an application must first link against the OcrSrv library and then create an OCR engine instance that matches the required recognition type.

In general, using the OCR API involves the following steps:

  1. Link the OcrSrv library (ocrsrv.lib) to the project file of the application.
  2. Provide an observer class that implements the MOCREngineObserver interface.
  3. Call the OCREngineFactory::CreateOCREngineL method, passing an OCREngineFactory::TEngineType value to specify the recognition type.
  4. Cast the pointer returned by OCREngineFactory::CreateOCREngineL to either MOCREngineLayoutRecognize or MOCREngineRecognizeBlock, according to the recognition type.
  5. Use the recognition interface to process images.
  6. Receive the progress information during the recognition through the observer class.
  7. Receive the recognition result through the observer class.
  8. Release the OCR engine through the OCREngineFactory::ReleaseOCREngine method.

See the following section for detailed instructions on how to use the interfaces.

OCR API initialization

To create the recognition instance, the client application must provide three parameters to OCREngineFactory::CreateOCREngineL: a reference to an observer object whose class inherits from MOCREngineObserver, a TOcrEngineEnv object that sets the recognition thread priority and maximum heap size, and an OCREngineFactory::TEngineType enumeration value that specifies the recognition type. After creating the recognition instance, the client application receives a MOCREngineInterface pointer and must convert it to either MOCREngineLayoutRecognize or MOCREngineRecognizeBlock, according to the selected type.

Figure ‘Initialize OCR service’ shows the OCR API initialization process.

Figure 4: Initialize OCR service

The following code snippet demonstrates how to create a recognition instance that supports document layout analysis. Creating an instance that supports region recognition is very similar, so it is not shown here.

The OCR service performs the recognition in a worker thread, whose priority is set through TOcrEngineEnv.

TOcrEngineEnv env;                      // Not const: the fields are assigned below
env.iPriority = EPriorityLess;          // Set the recognition thread's priority
env.iMaxHeapSize = 1200*KMinHeapGrowBy; // Set the thread's maximum heap size

// Create the OCR engine instance. Note that "observer" is an object of a class
// that implements MOCREngineObserver.
MOCREngineInterface* myEngine = OCREngineFactory::CreateOCREngineL(observer,
                                                                   env,
                                                                   OCREngineFactory::EEngineLayoutRecognize);

// Convert the instance from a MOCREngineInterface pointer to MOCREngineLayoutRecognize
MOCREngineLayoutRecognize* layoutEngine = static_cast<MOCREngineLayoutRecognize*>(myEngine);
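The static_cast in the snippet above is only valid when the target interface matches the engine type that was requested from the factory. The following portable C++ sketch illustrates one way a client could keep the two in step. All type and function names in it (EngineType, EngineInterface, LayoutRecognizeEngine, RecognizeBlockEngine, EngineHandle) are hypothetical stand-ins written for this illustration; they are not the declarations from the OCR SDK headers.

```cpp
#include <memory>

// Hypothetical stand-ins that mirror the shape of the OCR interfaces.
// They are NOT the real SDK declarations.
enum class EngineType { LayoutRecognize, RecognizeBlock };

struct EngineInterface {
    virtual ~EngineInterface() = default;
};

struct LayoutRecognizeEngine : EngineInterface {
    bool LayoutAnalysis() { return true; }
};

struct RecognizeBlockEngine : EngineInterface {
    bool RecognizeBlock() { return true; }
};

// Keeps the requested type next to the base pointer so that a later
// conversion can be checked without relying on RTTI.
class EngineHandle {
public:
    explicit EngineHandle(EngineType type) : type_(type) {
        if (type == EngineType::LayoutRecognize) {
            engine_.reset(new LayoutRecognizeEngine());
        } else {
            engine_.reset(new RecognizeBlockEngine());
        }
    }

    // Returns nullptr when the engine was created for the other type.
    LayoutRecognizeEngine* AsLayout() const {
        return type_ == EngineType::LayoutRecognize
            ? static_cast<LayoutRecognizeEngine*>(engine_.get())
            : nullptr;
    }

    // Returns nullptr when the engine was created for the other type.
    RecognizeBlockEngine* AsBlock() const {
        return type_ == EngineType::RecognizeBlock
            ? static_cast<RecognizeBlockEngine*>(engine_.get())
            : nullptr;
    }

private:
    EngineType type_;
    std::unique_ptr<EngineInterface> engine_;
};
```

With such a wrapper, a mismatched conversion yields a null pointer that can be tested, instead of a pointer to the wrong interface.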

Recognition with layout analysis

Recognition with layout analysis consists of two steps. The first step analyzes the entire image and returns information about the areas that contain text. The second step recognizes some or all of those areas, which the user selects by passing an array of valid area indices. The layout analysis and recognition results are delivered asynchronously through the callback functions MOCREngineObserver::LayoutComplete and MOCREngineObserver::RecognizeComplete in the observer class.

Figure ‘Recognize with layout’ shows the process of recognition with layout analysis.

Figure 5: Recognize with layout
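The two-step flow can be sketched in portable C++ as follows. This is a minimal illustration under stated assumptions: FlowObserver, FakeLayoutEngine, RecordingObserver and BlockInfo are hypothetical stand-ins, the callback parameter lists are simplified, and the fake engine invokes the callbacks synchronously, whereas the real OCR engine reports back asynchronously from its worker thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the text block information.
struct BlockInfo { int x, y, width, height; };

// Simplified stand-in for the observer interface.
struct FlowObserver {
    virtual ~FlowObserver() = default;
    virtual void LayoutComplete(int error, const std::vector<BlockInfo>& blocks) = 0;
    virtual void RecognizeProcess(int percent) = 0;
    virtual void RecognizeComplete(int error, const std::vector<std::string>& texts) = 0;
};

// Fake engine: step 1 reports the blocks, step 2 "recognizes" the selected ones.
class FakeLayoutEngine {
public:
    explicit FakeLayoutEngine(FlowObserver& observer) : observer_(observer) {}

    void LayoutAnalysis() {
        blocks_ = {{0, 0, 100, 20}, {0, 30, 100, 20}, {0, 60, 100, 20}};
        observer_.LayoutComplete(0, blocks_);
    }

    void Recognize(const std::vector<int>& blockIndices) {
        std::vector<std::string> texts;
        for (std::size_t n = 0; n < blockIndices.size(); ++n) {
            texts.push_back("text of block " + std::to_string(blockIndices[n]));
            // Report progress as the share of selected blocks already handled.
            observer_.RecognizeProcess(static_cast<int>(100 * (n + 1) / blockIndices.size()));
        }
        observer_.RecognizeComplete(0, texts);
    }

private:
    FlowObserver& observer_;
    std::vector<BlockInfo> blocks_;
};

// Observer that simply records what it was told.
struct RecordingObserver : FlowObserver {
    std::size_t blockCount = 0;
    int lastPercent = 0;
    std::vector<std::string> results;

    void LayoutComplete(int /*error*/, const std::vector<BlockInfo>& blocks) override {
        blockCount = blocks.size();
    }
    void RecognizeProcess(int percent) override { lastPercent = percent; }
    void RecognizeComplete(int /*error*/, const std::vector<std::string>& texts) override {
        results = texts;
    }
};
```

A client would call LayoutAnalysis first, wait for LayoutComplete, choose the block indices, and only then call Recognize.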

The following code snippet demonstrates how to use the layout recognition interface. After the layout analysis, the MOCREngineObserver::LayoutComplete function is called to inform the client application about the text blocks found. During the recognition process, the client application continuously receives progress information through the MOCREngineObserver::RecognizeProcess function. After the recognition, the result is provided through the MOCREngineObserver::RecognizeComplete function.

One or two supported languages must be set active through the MOCREngineBase::SetActiveLanguageL method of the base interface. Setting two languages allows recognition of an image that contains text in both. For example, a Chinese document may contain English words, in which case the user sets English and Chinese as the active languages.

Note that no more than two languages can be mixed, and only a western language can be combined with an eastern language. It is not possible, for example, to set both Chinese and Japanese as active languages.
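The language rule above can be expressed as a small check that a client might run before calling SetActiveLanguageL. The sketch below is portable C++ with a hypothetical Language enumeration that covers only the three languages named in this section; the functions IsEastern and IsValidActiveSet are illustrative helpers, not part of the OCR API.

```cpp
#include <vector>

// Hypothetical stand-in for the language identifiers; only the languages
// mentioned in the text are listed.
enum class Language { English, Chinese, Japanese };

// Classifies a language as eastern (true) or western (false).
bool IsEastern(Language lang) {
    return lang == Language::Chinese || lang == Language::Japanese;
}

// Accepts one language, or exactly two when one is western and one is eastern.
bool IsValidActiveSet(const std::vector<Language>& langs) {
    if (langs.size() == 1) {
        return true;
    }
    if (langs.size() != 2) {
        return false;
    }
    return IsEastern(langs[0]) != IsEastern(langs[1]);
}
```

For example, English with Chinese passes the check, while Chinese with Japanese does not.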

/**
* Set active languages
*/
RArray<TLanguage> languages;
languages.Append(ELangEnglish);    // A western language
languages.Append(ELangPrcChinese); // An eastern language

TRAPD(err, myEngine->Base()->SetActiveLanguageL(languages));

/**
* Layout analysis
*/
TOCRLayoutSetting layoutSettings;
layoutSettings.iBrightness = TOCRLayoutSetting::ENormal;
layoutSettings.iSkew = ETrue;      // Setting this to ETrue triggers the geometrical adjustment

_LIT(KFileName, "C:\\image.mbm");
CFbsBitmap image;
image.Load(KFileName);
const TInt handle = image.Handle();      // Get the handle from the font and bitmap server

// myEngine is assumed to be of type MOCREngineLayoutRecognize here
TRAP(err, myEngine->LayoutAnalysisL(handle, layoutSettings)); // err was declared by TRAPD above

The function MOCREngineObserver::LayoutComplete is called after the layout analysis has completed. Its aError parameter indicates whether the analysis succeeded, and aBlockCount gives the number of text areas identified. aBlocks is a TOCRBlockInfo array that stores the position and extent of every identified text area. The user can then select which areas to recognize.

RArray<TInt> blockIndex; // Indices of the blocks to recognize

for (TInt i = 0; i < blockCount; i++) // blockCount comes from the callback parameter aBlockCount
    {
    // For example, if four blocks were found and blocks No.0 and No.1 are not wanted, skip them.
    if (i == 0 || i == 1)
        {
        continue;
        }
    blockIndex.Append(i);
    }

// Recognize blocks No.2 and No.3.
// iRecogSettings holds the recognition settings and is assumed to be prepared elsewhere.
TRAPD(err, myEngine->RecognizeL(iRecogSettings, blockIndex));
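The snippet above selects blocks by fixed index. As an alternative illustration, a client could choose blocks by their extent, for instance to skip areas that are too small to contain useful text. The following portable C++ sketch assumes a hypothetical BlockRect structure with width and height fields; the actual members of TOCRBlockInfo are not described in this section, so the structure and the SelectBlocksByMinArea helper are stand-ins only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the extent of a text block.
struct BlockRect {
    int width;
    int height;
};

// Returns the indices of all blocks whose area is at least minArea.
std::vector<int> SelectBlocksByMinArea(const std::vector<BlockRect>& blocks, int minArea) {
    std::vector<int> indices;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        if (blocks[i].width * blocks[i].height >= minArea) {
            indices.push_back(static_cast<int>(i));
        }
    }
    return indices;
}
```

The resulting index list plays the same role as blockIndex in the snippet above.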

Region recognition

Region recognition functions are declared in MOCREngineRecognizeBlock. To use this type of recognition, pass EEngineRecognizeBlock to OCREngineFactory::CreateOCREngineL as the OCR engine type. There are two types of region recognition: text area recognition and special content recognition.

Text area recognition

Figure ‘Recognize block’ shows the process of recognizing a specified text area.

Figure 6: Recognize block

The following code snippet demonstrates how to start a typical region recognition.

TOCRLayoutBlockInfo layoutInfo;

layoutInfo.iLayout = EOcrLayoutTypeH; // Set when the text lines are horizontal
layoutInfo.iText = EOcrTextMultiLine; // Set when the area contains more than one line
layoutInfo.iBackgroundColor = EOcrBackgroundLight; // Set when the text is darker than the background
layoutInfo.iRect.SetRect(0, 0, 100, 100); // Set the recognition area

_LIT(KFileName, "C:\\image.mbm");
CFbsBitmap image;
image.Load(KFileName);
const TInt handle = image.Handle(); // Get the handle of the image

// myEngine is assumed to be of type MOCREngineRecognizeBlock here
TRAPD(err, myEngine->RecognizeBlockL(handle, layoutInfo));

Special content recognition

Figure ‘Recognize special region’ shows the process of special content recognition.

Figure 7: Recognize special region

The following code snippet demonstrates how to start a typical special region recognition. The user can specify the text content to be e-mail addresses, phone numbers, or web addresses.

TRegionInfo regionInfo;

regionInfo.iBackgroundColor = EOcrBackgroundLight; // Set when the text is darker than the background
regionInfo.iType = TRegionInfo::EEmailAddress;     // Expect an e-mail address in the region
regionInfo.iRect.SetRect(0, 0, 100, 100);          // Set the recognition area

_LIT(KFileName, "C:\\image.mbm");
CFbsBitmap image;
image.Load(KFileName);
const TInt handle = image.Handle(); // Get the handle of the image

// myEngine is assumed to be of type MOCREngineRecognizeBlock here
TRAPD(err, myEngine->RecognizeSpecialRegionL(handle, regionInfo));

Cancel recognition

During the recognition, the client application can cancel the recognition process. A cancel request is also handled asynchronously: the observer functions in MOCREngineObserver report KErrCancel through their aError parameter. Both recognition with layout analysis and region recognition can be canceled.

Figure ‘Cancel recognition’ shows the process of canceling the recognition.

Figure 8: Cancel recognition

The following code snippet demonstrates how to issue a typical Cancel request.

myEngine->Base()->CancelOperation(); 
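Because the cancel request is asynchronous, the call itself does not deliver the outcome; the client learns about it later in an observer callback. The portable C++ sketch below models that ordering. CancelObserver and FakeCancelEngine are hypothetical stand-ins, RunPending stands in for the worker thread reporting back, and the value of KErrCancel is assumed to match the standard Symbian definition in e32err.h.

```cpp
// Assumed to match Symbian's e32err.h; verify against the SDK headers.
const int KErrNone = 0;
const int KErrCancel = -3;

// Hypothetical observer that records the completion code it receives.
struct CancelObserver {
    bool completed = false;
    int lastError = KErrNone;

    void RecognizeComplete(int error) {
        completed = true;
        lastError = error;
    }
};

// Fake engine: a cancel request is only noted, and is reported to the
// observer when the pending operation is next serviced.
class FakeCancelEngine {
public:
    explicit FakeCancelEngine(CancelObserver& observer) : observer_(observer) {}

    void StartRecognize() { busy_ = true; }

    // A cancel request has an effect only while an operation is pending.
    void CancelOperation() { cancelRequested_ = busy_; }

    // Stand-in for the worker thread reaching a point where it reports back.
    void RunPending() {
        if (!busy_) {
            return;
        }
        busy_ = false;
        observer_.RecognizeComplete(cancelRequested_ ? KErrCancel : KErrNone);
        cancelRequested_ = false;
    }

private:
    CancelObserver& observer_;
    bool busy_ = false;
    bool cancelRequested_ = false;
};
```

In this model the observer is untouched immediately after CancelOperation returns and receives KErrCancel only once the pending operation is serviced.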

Releasing the OCR API

To release the OCR engine instance, call the OCREngineFactory::ReleaseOCREngine function.

Figure ‘Release the OCR API’ shows how to release the OCR API.

Figure 9: Release the OCR API

The following code snippet demonstrates how to release the layout recognition engine with OCREngineFactory::ReleaseOCREngine.

Note that only one type of recognition engine can exist at a time. The existing instance must be released before another instance is created with OCREngineFactory::CreateOCREngineL.

OCREngineFactory::ReleaseOCREngine(myEngine);
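The one-instance rule can be modeled with a small owner object, sketched below in portable C++. EngineKind, Engine and EngineSlot are hypothetical names for this illustration. How the real factory reacts to a second creation request is not stated in this section; returning a null pointer here is an assumption of the sketch, not documented behavior.

```cpp
#include <memory>

// Hypothetical stand-ins for the engine type and the engine object.
enum class EngineKind { LayoutRecognize, RecognizeBlock };

struct Engine {
    EngineKind kind;
};

// Owns at most one engine at a time.
class EngineSlot {
public:
    // Creates an engine, or returns nullptr while another one still exists
    // (assumed behavior for this sketch only).
    Engine* Create(EngineKind kind) {
        if (current_) {
            return nullptr;
        }
        current_.reset(new Engine{kind});
        return current_.get();
    }

    // Destroys the engine if it is the one held by this slot and clears
    // the caller's pointer so that it cannot be used afterwards.
    void Release(Engine*& engine) {
        if (engine != nullptr && engine == current_.get()) {
            current_.reset();
        }
        engine = nullptr;
    }

private:
    std::unique_ptr<Engine> current_;
};
```

Clearing the caller's pointer in Release is a design choice of the sketch that guards against accidental use of a released engine.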

Error handling

All exceptions are reported through the Symbian OS leave mechanism.

Table 1: General error messages
Exception         Description
KErrNoMemory      Reported when there is not enough memory for the layout analysis or the recognition.
KErrServerBusy    Reported when a new recognition request arrives while the OCR engine is busy.
KErrAbort         The child thread does not exist or the operation was aborted.
KErrArgument      Bad parameters.
KErrNotSupported  Some functionality is not supported.
KErrGeneral       General system-level error.
KErrNotFound      No engine or database found.

Table 2: Layout analysis specific error messages
Exception         Description
KErrOcrBadImage   Bad image or unsupported image format (only 24-bit color or 8-bit grayscale images in bitmap format are supported).

Table 3: Recognition specific error messages
Exception              Description
KErrOcrBadRegion       Bad layout region.
KErrOcrNotSetLanguage  Before layout analysis or recognition, one or two active languages must be set.

Table 4: Language specific error messages
Exception           Description
KErrOcrBadLanguage  Unsupported language.
KErrOcrBadDictFile  Bad database file.
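A client that logs or displays errors may want to map the general codes from Table 1 to readable text. The portable C++ sketch below shows one possible mapping. The numeric values are assumed to match the standard definitions in Symbian's e32err.h and should be verified against the SDK headers. The OCR-specific codes (KErrOcr...) are left out because their values are not given in this document. DescribeGeneralOcrError is a hypothetical helper, not part of the OCR API.

```cpp
#include <string>

// Assumed to match Symbian's e32err.h; verify against the SDK headers.
const int KErrNotFound = -1;
const int KErrGeneral = -2;
const int KErrNoMemory = -4;
const int KErrNotSupported = -5;
const int KErrArgument = -6;
const int KErrServerBusy = -16;
const int KErrAbort = -39;

// Maps a general error code from Table 1 to a short description.
std::string DescribeGeneralOcrError(int error) {
    switch (error) {
        case KErrNoMemory:
            return "Not enough memory for layout analysis or recognition";
        case KErrServerBusy:
            return "OCR engine is busy with another request";
        case KErrAbort:
            return "Child thread does not exist or operation was aborted";
        case KErrArgument:
            return "Bad parameters";
        case KErrNotSupported:
            return "Functionality is not supported";
        case KErrGeneral:
            return "General system-level error";
        case KErrNotFound:
            return "No engine or database found";
        default:
            return "Unknown or OCR-specific error";
    }
}
```

Codes that are not listed fall through to a generic message, so the helper stays safe to call with any leave code.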

Memory overhead

The dynamic memory consumption comes mostly from the OCR engine itself. Heap consumption is currently around 900 KB to 1000 KB, depending on the image size and language variants.

Extensions to the API

The OCR API does not explicitly support any kind of extension.


Copyright © Nokia Corporation 2001-2008