Wednesday, March 9, 2011

What is Intelligent Character Recognition (ICR)?

Optical Character Recognition (OCR) is the process of recognizing computer-generated text from an image, typically one that is scanned using document capture software. If you don't know the difference between OCR and Capture, see my other post here: OCR vs. Capture.

Intelligent Character Recognition (ICR) is the process of recognizing hand-printed or handwritten information from a scanned document. It matches patterns of pixels to specific written characters. This form of recognition is typically not as accurate as OCR, but there are several ways to make the accuracy acceptable. The main one is to provide combed fields or spaced boxes on the form, which enforce character spacing (whitespace between symbols).

Tuesday, March 30, 2010

What is Advanced Data Extraction (ADE)?

OCR and Data Extraction

A big part of any OCR solution is the process of data capture and extraction. Most document capture applications can process the converted text and extract expressions from it. So how can this help? Well, the ability to parse the OCR text provides automation, and allows you to populate fields based on what you find.

An example might be a form that has DOB: 1/2/1968

You want to extract everything to the right of DOB: from a document.  You can do this with an ADE engine.
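As a rough illustration of the idea (not how any particular ADE engine does it), here is a minimal Python sketch that pulls everything to the right of DOB: on the same line. The sample text, the pattern, and the function name extract_dob are all assumptions made up for this example.

```python
import re

# Hypothetical OCR output for one page; only the "DOB:" label comes from the post.
ocr_text = "Name: J. Smith\nDOB: 1/2/1968\nCity: Springfield"

# Capture everything to the right of "DOB:" up to the end of that line.
# "." does not match a newline by default, so the capture stops at the line end.
DOB_PATTERN = re.compile(r"DOB:[ \t]*(.*)")

def extract_dob(text):
    """Return the text following 'DOB:' on its line, or None if the label is absent."""
    match = DOB_PATTERN.search(text)
    return match.group(1).strip() if match else None

print(extract_dob(ocr_text))
```

A real engine would usually add validation on top of this, for example checking that the captured value looks like a date before populating the field.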

Saturday, March 6, 2010

OCR and the Right Settings

What DPI should be set for optimal OCR Accuracy?

So, I get this question all the time and decided it might be good to post about it. What is the best DPI setting for Optical Character Recognition (OCR)?

I have been at clients that erroneously believe the higher the DPI, the better the results, and I feel pain whenever I see an OCR scanner set beyond 300 DPI, and some even at 600 DPI!! Holy cow, how do you handle those file sizes?

The fact remains that almost all OCR engines on the market are tuned and optimized for 300 DPI for optimal conversion and recognition. Going beyond this will provide no better results, and will significantly increase your file size, since the number of pixels grows with the square of the DPI. Most Document Capture companies provide image processing prior to OCR that will allow you to scan at 200 DPI, with fairly consistent results.
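To put numbers on the file size point, here is a quick back-of-the-envelope sketch in Python. It assumes an uncompressed, 1-bit (black and white) scan of a US Letter page; the function name and those page assumptions are mine, and real files are usually compressed, so actual sizes will be smaller and will not scale quite this cleanly.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Raw image size in bytes for a page scanned at the given DPI (no compression)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# Assumed page: US Letter, 8.5 x 11 inches, bitonal.
for dpi in (200, 300, 600):
    print(dpi, "DPI:", uncompressed_bytes(8.5, 11, dpi), "bytes")
```

Doubling the DPI from 300 to 600 quadruples the raw pixel count, which is why those 600 DPI scans get so heavy for no gain in recognition.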

Wednesday, February 24, 2010

OCR Software Post

OCR Software for Business


8 Things About OCR

Tuesday, February 23, 2010

Google Goggles and OCR

Cool stuff:

OCR Software and Character Correction

Optical Character Recognition and Character Correction

So what is character correction when associated with OCR? The OCR process recognizes images and converts them to text, and along the way many characters can be misidentified. Typically, document capture applications let you handle commonly misinterpreted characters through a table of correction mappings. So let's say a particular zone OCR field was designated as numbers only, and the engine returned an "l" where the document had a "1" (that is, a lowercase L for a one). The correction piece of the recognition engine can apply logic to the OCR output, and make sure the text is properly interpreted.

This is just one of many ways to improve accuracy, but note you will need the right kind of OCR application that allows this feature to be enabled.
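Here is a minimal sketch of what a correction mapping table for a numbers-only field could look like in Python. The specific look-alike pairs and the function name are assumptions for illustration; a real capture product would expose its own table and settings.

```python
# Hypothetical correction table for a field designated as numbers only:
# each letter that OCR commonly confuses with a digit maps to that digit.
NUMERIC_CORRECTIONS = str.maketrans({
    "l": "1",  # lowercase L read in place of one
    "I": "1",  # uppercase i read in place of one
    "O": "0",  # uppercase o read in place of zero
    "o": "0",
    "S": "5",
    "B": "8",
})

def correct_numeric_field(raw):
    """Apply the look-alike corrections to the raw OCR text of a numeric field."""
    return raw.translate(NUMERIC_CORRECTIONS)

print(correct_numeric_field("l23O5"))
```

Note that this only makes sense because the field is known to be numeric. Applying the same table to free text would corrupt ordinary words.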

Friday, February 19, 2010

Are index fields really necessary when you have the full text OCR?

Full text OCR

Ah, the old debate: do I just perform optical character recognition on all my scanned documents, make them searchable OCR PDFs, and rely on the OCR to retrieve documents? Why use index fields when I already have all the converted text?

Indexing provides structured data about the documents. This data can be utilized, especially when using document capture software, to link into columns and index fields in your document management system. Index fields provide faster retrieval, especially if you want to retrieve by specifying several criteria. Relying on OCR alone, that is, on the recognized text, can get you in trouble. First of all, you are assuming that the document will always have recognized text, and that all the items that you are searching for are in the text. Secondly, depending on the type of OCR format you have, you may first have to find the document, and then open it and parse out what you are looking for. This can also lead to false positives in retrieval if many documents have the same terms in their OCR text.
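A toy Python sketch of the difference, using a made-up list of three documents (the field names doc_type and customer, the sample text, and both function names are assumptions for illustration, not any document management system's API):

```python
# Hypothetical documents: each has index fields plus whatever text OCR produced.
documents = [
    {"id": 1, "doc_type": "invoice", "customer": "Acme",
     "ocr_text": "Invoice 1001 for Acme"},
    {"id": 2, "doc_type": "letter", "customer": "Acme",
     "ocr_text": "Regarding your invoice dispute, Acme"},
    {"id": 3, "doc_type": "invoice", "customer": "Globex",
     "ocr_text": ""},  # OCR produced no text for this scan
]

def full_text_search(docs, term):
    """Return ids of documents whose OCR text contains the term (case-insensitive)."""
    return [d["id"] for d in docs if term.lower() in d["ocr_text"].lower()]

def index_search(docs, **criteria):
    """Return ids of documents whose index fields match every given criterion."""
    return [d["id"] for d in docs
            if all(d.get(field) == value for field, value in criteria.items())]

print(full_text_search(documents, "invoice"))
print(index_search(documents, doc_type="invoice"))
print(index_search(documents, doc_type="invoice", customer="Acme"))
```

The full-text search picks up the letter that merely mentions an invoice (a false positive) and misses the invoice with no recognized text, while the index search finds both invoices and can narrow by several criteria at once.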