Showing posts with label zone ocr. Show all posts

Tuesday, February 16, 2010

Zone OCR and Accuracy within Recognition Zones

Zone OCR Accuracy

When doing zone OCR, or Optical Character Recognition on a portion of a page, what features do I need to ensure the best possible accuracy?  They are listed below:

  • Use a document capture application that provides some type of page registration.  The problem with zone OCR is that most engines apply a fixed template of page coordinates and simply repeat this "zone" on each page.  If the scanner feed is off or the page is skewed, the zone lands in the wrong place and you get erroneous readings.  Page registration lets the recognition engine anchor to a feature on the page and always locate the zone relative to that feature's coordinates.
  • Use a scanning application that can perform image processing on the zone before running Optical Character Recognition.  Removing lines, deshading, and despeckling produce a cleaner zone and thus improve overall accuracy.
  • Some advanced capture applications can filter zones by character set.  This lets you tell the engine to interpret the characters within a zone as, say, all numbers, or perhaps a date, which gives it a narrower character set for the whole recognition process.  iCapture, for example, not only allows character set mapping in zone OCR templates, but also provides auto-correction for the most commonly misinterpreted characters.
  • Finally, and highly recommended for the highest level of accuracy, is the ability to set a character matching filter for a zone.  This technology, sometimes called ADE, uses regular expressions to ensure a match, and lets you overdraw the recognition area (zone) and filter the result down to the text you want.
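
To make the registration and filtering ideas above a little more concrete, here is a minimal Python sketch.  It is not how iCapture or any other product implements these features; the names Zone, locate_zone, DIGIT_FIXES, as_digits and match_in_zone are all hypothetical, and the correction table is just one plausible choice for a digits-only zone.  It assumes the capture software has already found the anchor feature on the page and has already returned raw text for the zone.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A zone stored as an offset from a registration anchor, not from the page corner."""
    dx: int
    dy: int
    width: int
    height: int


def locate_zone(anchor_xy, zone):
    """Return (left, top, right, bottom) of the zone on this particular page."""
    ax, ay = anchor_xy
    left, top = ax + zone.dx, ay + zone.dy
    return (left, top, left + zone.width, top + zone.height)


# Hypothetical auto-correction table for a zone that should hold only digits.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def as_digits(raw):
    """Apply the correction table, then keep only the digit characters."""
    return "".join(ch for ch in raw.translate(DIGIT_FIXES) if ch.isdigit())


def match_in_zone(raw, pattern):
    """Return the first regex match in an overdrawn zone's text, or None if nothing matches."""
    found = re.search(pattern, raw)
    return found.group(0) if found else None


# The same zone lands in a different place when the anchor shifts between scans.
invoice_zone = Zone(dx=120, dy=40, width=200, height=30)
print(locate_zone((50, 60), invoice_zone))
print(locate_zone((58, 71), invoice_zone))

# An overdrawn zone picks up extra text; the pattern keeps only the part we want.
print(match_in_zone("Invoice No: INV-20417 Date 02/16/2010", r"INV-\d{5}"))
```

Storing the zone as an offset from the anchor is what keeps it attached to the right spot when a page is fed a few pixels off.  This sketch only handles a shift; a skewed page would also need a rotation correction.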

Sunday, December 13, 2009

What is Zone OCR?

Zone OCR software lets you focus on a single section, or multiple sections (zones), of a scanned document or image.  Converting specific zones to text is an important optical character recognition feature, and one that can be applied in just about any type of business.  Its main use is to harvest values from images and use them as index values, so the documents can be searched later.  Not all zone OCR engines are equal, and you typically need a very accurate engine to produce the required results.  Some accurate engines include Glyphreader, Recostar, Docustar and many others.
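
As a rough illustration of harvesting index values from zones, here is a small Python sketch.  It does not call any real OCR engine: the recognize argument is a placeholder for whatever engine you use, the page is modelled as a grid of characters rather than pixels, and the function and field names (crop, harvest_index_values, invoice_number, invoice_date) are made up for the example.

```python
def crop(page, box):
    """Cut a rectangular zone out of a page stored as a list of equal-length rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]


def harvest_index_values(page, zones, recognize):
    """Run the recognizer on each named zone and return {field name: text}."""
    return {name: recognize(crop(page, box)).strip() for name, box in zones.items()}


# A toy "page": each row is a string, standing in for a row of pixels.
page = [
    "INVOICE".ljust(10) + "10042".ljust(10),
    "DATE".ljust(10) + "2010-02-16".ljust(10),
]

# Each zone is (left, top, right, bottom) in the same units as the page grid.
zones = {
    "invoice_number": (10, 0, 20, 1),
    "invoice_date": (10, 1, 20, 2),
}


def fake_recognize(rows):
    """Stand-in for a real OCR engine: the toy page is already text."""
    return " ".join(rows)


print(harvest_index_values(page, zones, fake_recognize))
```

The resulting dictionary is the kind of record you would store alongside the image so the document can be found by invoice number or date later.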

It is often imperative to "clean up" the zone before attempting the conversion to text.  Clean up can include line removal, despeckle, deskew, etc., which are found in almost any product that provides OCR and image processing features.
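
To show what one of these clean up steps does, here is a very simple despeckle sketch in Python.  It assumes the zone has already been converted to a black and white grid where 1 is a dark pixel and 0 is background, and it clears any dark pixel that has no dark neighbour.  The function name despeckle and the single-pixel rule are choices made for this example; commercial filters are usually more careful, for instance by removing specks up to a configurable size.

```python
def despeckle(zone):
    """Clear every dark pixel (1) that has no dark pixel among its 8 neighbours."""
    height = len(zone)
    width = len(zone[0]) if zone else 0
    cleaned = [row[:] for row in zone]  # work on a copy, leave the input untouched
    for y in range(height):
        for x in range(width):
            if zone[y][x] != 1:
                continue
            has_neighbour = any(
                zone[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned


noisy = [
    [1, 0, 0, 0, 0],  # lone speck in the top-left corner
    [0, 0, 1, 1, 0],  # part of a real stroke
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],  # lone speck in the bottom-right corner
]

for row in despeckle(noisy):
    print(row)
```

Note that a rule this blunt would also erase a genuinely isolated mark that happens to be one pixel in size, so in practice the threshold needs to be tuned to the scan resolution.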