Tuesday, February 23, 2010

OCR Software and Character Correction

Optical Character Recognition and Character Correction

So what is character correction when it comes to OCR? The OCR process recognizes the characters in an image and converts them to text, and along the way some of those characters can be misidentified. Typically, document capture applications handle commonly misread characters through a table of correction mappings. Let's say a particular zone OCR field is designated as numbers only, and the engine reads a "1" as an "l" (that is, a lowercase L instead of the digit one). Because the field can only contain digits, the correction step of the recognition engine can apply that mapping and replace the "l" with a "1", so the text is interpreted properly. This can be really important, especially in SharePoint OCR environments where you need searchable PDFs in SharePoint.
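As a rough illustration of how such a correction table might work, here is a minimal Python sketch that applies a mapping only to zone fields designated as numbers only. The table entries and the names `DIGIT_CORRECTIONS`, `correct_field` and `is_valid_numeric` are hypothetical and not taken from any particular capture product; real applications define their own tables and field types.

```python
from typing import Dict

# Hypothetical correction table: letters an OCR engine commonly confuses
# with digits, mapped to the digit they most likely stand for.
DIGIT_CORRECTIONS: Dict[str, str] = {
    "l": "1",  # lowercase L -> one
    "I": "1",  # uppercase i -> one
    "O": "0",  # uppercase o -> zero
    "o": "0",  # lowercase o -> zero
    "S": "5",
    "B": "8",
}


def correct_field(raw: str, field_type: str,
                  corrections: Dict[str, str] = DIGIT_CORRECTIONS) -> str:
    """Apply the correction table only to numbers-only zone fields.

    Other field types are returned unchanged, since an "l" in a text
    field is probably a real letter. Characters with no mapping are
    left as-is so that validation can still catch them.
    """
    if field_type != "numeric":
        return raw
    return "".join(corrections.get(ch, ch) for ch in raw)


def is_valid_numeric(value: str) -> bool:
    """True if every character in the corrected value is a digit."""
    return value.isdigit()


if __name__ == "__main__":
    fixed = correct_field("4l0O7", "numeric")
    print(fixed, is_valid_numeric(fixed))
```

For example, `correct_field("4l0O7", "numeric")` returns `"41007"`, while the same call with a text field type leaves the value untouched. Unmapped characters are deliberately not guessed at, so a value like `"12X4"` still fails `is_valid_numeric` and can be routed for manual review.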

Character correction is just one of many ways to improve accuracy, but note that you will need an OCR application that lets you enable this feature.

