Tuesday, March 30, 2010

What is Advanced Data Extraction (ADE)?

OCR and Data Extraction

A big part of any OCR solution is the process of data capture and extraction. Most document capture applications can process the converted text and extract expressions (patterns) from it. So how can this help? Well, the ability to parse the OCR text enables automation and lets you populate fields based on what you find.

For example, a form might contain the line "DOB: 1/2/1968".

You want to extract everything to the right of "DOB:" from the document. You can do this with an ADE engine.
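The idea behind that kind of rule can be sketched with an ordinary regular expression. This is a minimal illustration using Python's standard `re` module, not the API of any particular ADE engine; the helper name `extract_after_label` and the sample OCR text are hypothetical, and it assumes the label and its value land on the same line of OCR output.

```python
import re
from typing import Optional


def extract_after_label(ocr_text: str, label: str) -> Optional[str]:
    """Return the text to the right of `label` on the same line, or None.

    Hypothetical helper for illustration only; real ADE engines define
    their own rule syntax.
    """
    # re.escape makes the label match literally (e.g. "DOB:"),
    # [ \t]* skips spaces or tabs after it, and (.*) captures the rest
    # of the line ("." does not match newlines by default).
    pattern = re.compile(re.escape(label) + r"[ \t]*(.*)")
    match = pattern.search(ocr_text)
    if match is None:
        return None
    # Trim trailing whitespace that OCR output often carries.
    return match.group(1).strip()


# Hypothetical OCR output from a form.
ocr_text = "Name: Jane Doe\nDOB: 1/2/1968\nPolicy: 12345"
print(extract_after_label(ocr_text, "DOB:"))  # prints 1/2/1968
```

Capturing "everything to the right" is the simplest rule; if OCR tends to pick up stray characters after the date, a stricter pattern such as `\d{1,2}/\d{1,2}/\d{4}` would capture only the date itself.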

Saturday, March 6, 2010

OCR and the Right Settings

What DPI should be set for optimal OCR Accuracy?

So, I get this question all the time and decided it might be good to post about it. What is the best DPI setting for Optical Character Recognition (OCR)?

I have been at clients who erroneously believe that the higher the DPI, the better the results, and I feel pain whenever I see an OCR scanner set beyond 300 DPI; some are even set at 600 DPI! Holy cow, how do you handle those file sizes?

The fact remains that almost all OCR engines on the market are tuned and optimized for 300 DPI conversion and recognition. Going beyond this will provide no better results and will significantly increase your file size: the pixel count grows with the square of the DPI, so scanning at 600 DPI instead of 300 DPI quadruples the pixels per page. Most document capture companies provide image processing prior to OCR that will allow you to scan at 200 DPI with fairly consistent results.
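To put numbers on the file-size point, here is a small back-of-the-envelope sketch. It assumes a US Letter page (8.5 x 11 inches) and reports uncompressed sizes; real scanner output is usually compressed (for example CCITT Group 4 for black-and-white or JPEG for color), so actual files are smaller, but they still grow with the pixel count. The function names are hypothetical.

```python
def pixels_per_page(width_in: float, height_in: float, dpi: int) -> int:
    """Pixel count for a page scanned at the given DPI.

    Width and height in pixels each scale linearly with DPI, so the
    total pixel count scales with DPI squared.
    """
    return round(width_in * dpi) * round(height_in * dpi)


def uncompressed_bytes(pixels: int, bits_per_pixel: int) -> int:
    """Raw image size in bytes before any compression."""
    return pixels * bits_per_pixel // 8


# Assumed page: US Letter, scanned black-and-white (1 bit per pixel).
for dpi in (200, 300, 600):
    px = pixels_per_page(8.5, 11, dpi)
    print(f"{dpi} DPI: {px:,} pixels, {uncompressed_bytes(px, 1):,} bytes at 1 bit/pixel")
```

Running it prints 3,740,000 pixels at 200 DPI, 8,415,000 at 300 DPI and 33,660,000 at 600 DPI, so going from 300 to 600 DPI multiplies the raw data by exactly four, and a 24-bit color scan multiplies it by another 24.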