So when doing zone OCR, or Optical Character Recognition on a portion of a page, what features do I need to ensure the best possible accuracy? They are listed below:
- Use a document capture application that provides some type of page registration. The problem with zone OCR is that most engines apply a fixed template of coordinates to the page and simply repeat this "zone" on each page. If the scanner is off, or the page is skewed, you can get erroneous readings. Page registration lets the recognition engine anchor to a page feature and always locate the zone relative to that feature's coordinates.
- Use a scanning application that can perform image processing on the zone before running Optical Character Recognition. Removing lines, deshading, and despeckling can produce a cleaner zone and thus improve overall accuracy.
- Some advanced capture applications can filter zones based on character sets. This lets you interpret the characters within a zone as, say, all numbers, or perhaps a date, which gives the engine a narrower character set for the whole recognition process. iCapture, for example, not only allows character set mapping to zone OCR templates, but also provides auto-correction for the most commonly misinterpreted characters.
- Finally, and highly recommended for the highest level of accuracy, is the ability to set a character matching filter for a zone. This technology, sometimes called ADE, uses regular expressions to ensure a match, and lets you overdraw the recognition area / zone and then filter the result down to just the text you want.
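To make a few of these ideas concrete, here is a minimal Python sketch of anchor-based registration, character-set correction, and regular-expression filtering. It is only an illustration, not how iCapture or any particular OCR engine works: the names `Zone`, `register_zone`, `correct_numeric`, and `extract_field` are hypothetical, the registration step assumes the page is only shifted (a skewed page would also need a rotation), and the list of confusable characters is an assumed example.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Zone(NamedTuple):
    """A rectangular OCR zone in pixel coordinates (hypothetical type)."""
    left: int
    top: int
    width: int
    height: int


def register_zone(zone: Zone,
                  template_anchor: Tuple[int, int],
                  found_anchor: Tuple[int, int]) -> Zone:
    """Shift a template zone by however far the anchor feature moved.

    Assumes a pure translation; rotation from skew is not handled here.
    """
    dx = found_anchor[0] - template_anchor[0]
    dy = found_anchor[1] - template_anchor[1]
    return Zone(zone.left + dx, zone.top + dy, zone.width, zone.height)


# Assumed examples of letters that OCR often confuses with digits.
DIGIT_FIXES = str.maketrans({
    "O": "0", "o": "0",
    "I": "1", "l": "1",
    "Z": "2", "S": "5", "B": "8",
})


def correct_numeric(text: str) -> str:
    """Auto-correct common misreads in a zone known to hold only numbers."""
    return text.translate(DIGIT_FIXES)


def extract_field(raw_text: str, pattern: str) -> Optional[str]:
    """Return the first regex match from an overdrawn zone, or None."""
    match = re.search(pattern, raw_text)
    return match.group(0) if match else None
```

With these helpers, a zone drawn at `Zone(100, 200, 300, 40)` against an anchor at `(50, 50)` moves to `Zone(108, 194, 300, 40)` when the anchor is found at `(58, 44)` on the scanned page. Likewise, `extract_field("Invoice No: INV-20417 Date", r"INV-\d{5}")` pulls just the invoice number out of a zone that was deliberately drawn too large.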