Sunday, December 27, 2009

How do I pick the right OCR Software?

Choosing among OCR (Optical Character Recognition) software options can be confusing, to say the least.  It really comes down to the use case, or how you will use the software.  Below are some good questions to ask yourself:

What do I need to convert with my OCR Software? 
This question is very important, and it really comes down to what you want your software to output.  Do you want a Word file that you can edit, or are you just looking to create a searchable PDF?  Some engines are tuned for accuracy and will give you the best-formatted output; others are built for speed.  OmniPage is an excellent engine for creating nicely formatted output, but it can be rather slow due to its focus on accuracy.  A production engine like PSI:Capture, which offers multiple OCR choices, can give you great flexibility no matter your output choice.

Am I working with pre-existing images, or ones that I will scan?  PDFs or TIFFs?
When you are choosing Optical Character Recognition software, it is really important to make sure that it has all the functionality you require, whether you are scanning or just processing non-searchable PDFs from a directory.  Most OCR software will let you choose the file to perform recognition on, and some will also let you scan in paper for conversion.  If you are using MFPs or scanning copiers and want to perform OCR on the scanned documents, you may want to choose a product that performs auto-import, or one that is focused on MFP scanning.  You also want flexibility in the types of file you can process, so look for the ability to OCR any image type:  PDF, TIFF, JPG, GIF, BMP, etc.
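
To make the "process files from a directory" case concrete, here is a minimal Python sketch of the first step an auto-import routine might take: collecting the files worth sending to OCR.  The function name find_ocr_inputs and the extension list are assumptions for illustration only; they are not taken from any of the products mentioned in this post.

```python
from pathlib import Path

# Hypothetical list of file types we want to hand to an OCR engine.
OCR_EXTENSIONS = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".gif", ".bmp"}


def find_ocr_inputs(folder, extensions=OCR_EXTENSIONS):
    """Return the files in `folder` whose extension is in `extensions`.

    The match ignores case, so "SCAN.PDF" and "scan.pdf" are both picked up.
    Subfolders are not searched.
    """
    folder = Path(folder)
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )
```

Calling find_ocr_inputs on a scan folder would give you a sorted list of candidate files, which you could then pass one at a time to whichever OCR engine you have chosen.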

How fast can I do conversions?
Some engines are built for OCR accuracy, and others are built for speed in the OCR process.  Most of the desktop engines, like eCopy Desktop, provide a good mix of both.  Other engines, like Glyphreader or Docustar, let you choose whether you want speed or accuracy in your OCR results.  It is always good to choose a document capture option that offers multiple OCR engines, so you can match the engine to different recognition tasks.
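
If you want to put a number on "how fast", one rough approach is to time an engine over a sample batch and convert that to pages per minute.  The Python sketch below assumes your engine can be wrapped in a function that takes one page; fast_draft_engine and slow_accurate_engine are made-up stand-ins that only sleep, not real OCR engines, and their delays are arbitrary.

```python
import time


def pages_per_minute(ocr_page, pages):
    """Time `ocr_page` over every item in `pages` and return pages per minute."""
    start = time.perf_counter()
    for page in pages:
        ocr_page(page)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return len(pages) / elapsed * 60 if elapsed > 0 else float("inf")


# Hypothetical stand-ins for real engines: they just wait, to simulate work.
def fast_draft_engine(page):
    time.sleep(0.001)  # pretend this engine favors speed
    return page


def slow_accurate_engine(page):
    time.sleep(0.02)  # pretend this engine favors accuracy
    return page
```

Running both stand-ins over the same sample pages gives two figures you can compare side by side.  With a real engine, you would weigh that figure against the accuracy you see in the output.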

How do I get the best accuracy in the OCR output?
All of the OCR software mentioned in this post requires a high-quality image for the best recognition accuracy.  With that said, high-quality scanning software with image processing options will lead to the best OCR accuracy when converting from image to text.  So what does image processing have to do with OCR software?  The cleaner the image, the better the accuracy: if you can deskew, despeckle, deshade, and sharpen text, you will get better OCR results.
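
As a small illustration of what one of these cleanup steps does, here is a simplified despeckle sketch in Python.  It assumes a black-and-white image stored as a list of rows, where 1 is a black pixel and 0 is white, and it removes any black pixel that has no black neighbor.  This is a toy version for explanation only; it is not how any product mentioned here implements despeckling, and real scanning software uses more sophisticated filters.

```python
def despeckle(image):
    """Return a copy of `image` with isolated black pixels turned white.

    `image` is a list of rows; 1 means black, 0 means white.  A black pixel
    counts as a speck when none of its (up to 8) surrounding pixels is black.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # copy so the input is left untouched
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_black_neighbor = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbor:
                cleaned[y][x] = 0
    return cleaned
```

Stray dots from dust or a dirty scanner glass get erased, while connected strokes that make up letters are left alone, which is the kind of cleanup that gives the OCR engine less noise to misread as punctuation.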

