OCR Software: Optical Character Recognition Software

Saturday, January 30, 2010

OCR Software versus Document Capture Software

All OCR Software companies provide the ability to convert scanned files into text or searchable PDFs via the Optical Character Recognition process, but how do I capture/scan the images so the applications can do their conversion?

This is an interesting question. Let's talk about Document Capture first. This type of application is built from the ground up to scan/capture documents at a high rate of speed, collect information about the documents in a number of ways, and then export the documents and data to a back end repository. Document capture companies provide all types of OCR options, and usually OEM their OCR, ICR, and OMR components from the major OCR application companies, like ABBYY, OpenText, Nuance, ReadSoft, etc. Most of these OCR companies have diversified their offerings to include document capture, but in my opinion those offerings fall way short on the capture side...they are OCR companies.

The real goal here is to get the best OCR results possible through a powerful OCR engine, and also to minimize the time required to scan and process by using the best document capture software. So, if you are looking to do high volume OCR processing, I highly recommend choosing a capture application that utilizes your OCR engine of choice to get the best of both worlds. I will write more on this topic in upcoming posts. If you want some guidance, see the post How to pick the right OCR Software.

Sunday, December 13, 2009

What is Zone OCR?

Zone OCR Software provides the ability to focus in on one or more specific sections (zones) of a scanned document or image. Converting specific zones to text is an important optical character recognition feature, and one that can be applied in just about any type of business. Its main use is to harvest values from images and use them as index values, so the documents can be searched later. Not all zone OCR engines are equal, and you typically need a very accurate engine to produce the required results. Some accurate engines include Glyphreader, Recostar, Docustar and many others.
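
To make the idea concrete, here is a minimal sketch of zone extraction in Python. Everything in it is an assumption made for illustration: the image is a plain list of pixel rows, the Zone fields and function names are hypothetical, and the recognize argument stands in for whatever OCR engine you actually use. It is not the API of any of the products named above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: an image is a list of rows of grayscale pixel values.
Image = List[List[int]]


@dataclass(frozen=True)
class Zone:
    """A named rectangle on the page, in pixel coordinates (hypothetical layout)."""
    name: str
    left: int
    top: int
    width: int
    height: int


def crop(image: Image, zone: Zone) -> Image:
    """Return only the pixels that fall inside the zone."""
    rows = image[zone.top:zone.top + zone.height]
    return [row[zone.left:zone.left + zone.width] for row in rows]


def extract_index_values(
    image: Image,
    zones: List[Zone],
    recognize: Callable[[Image], str],
) -> Dict[str, str]:
    """Run the supplied OCR function on each zone and collect index values by zone name."""
    return {zone.name: recognize(crop(image, zone)).strip() for zone in zones}
```

In use, you would define one Zone per field you want to index (an invoice number box, a date box, and so on) and pass in a function that wraps your OCR engine. Keeping the engine behind a plain callable makes it easy to swap engines and compare their accuracy on the same zones.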

It is often imperative to "clean up" the zone prior to attempting the conversion to text. Clean up can include line removal, despeckle, deskew, etc., which are found in almost any product that provides OCR and Image Processing features.
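
As a rough illustration of one clean up step, the sketch below shows a very simple despeckle in Python. It assumes the zone has already been converted to a black and white bitmap (1 for ink, 0 for background), and it uses one basic rule: drop any ink pixel that has too few ink neighbors. The function name and the min_neighbors parameter are made up for this example; commercial image processing tools use more sophisticated filters.

```python
from typing import List

# Assumption: a bitmap is a list of rows, where 1 = ink and 0 = background.
Bitmap = List[List[int]]


def despeckle(bitmap: Bitmap, min_neighbors: int = 1) -> Bitmap:
    """Remove ink pixels that have fewer than min_neighbors ink pixels around them."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    cleaned = [row[:] for row in bitmap]  # work on a copy, read from the original

    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1:
                continue
            # Count ink pixels among the (up to) eight surrounding pixels.
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbors += bitmap[ny][nx]
            if neighbors < min_neighbors:
                cleaned[y][x] = 0  # isolated speck: erase it

    return cleaned
```

With the default setting, a lone dot of scanner noise is erased while pixels that belong to a character stroke are kept, because they touch at least one other ink pixel. Raising min_neighbors cleans more aggressively, at the risk of eating thin strokes and punctuation.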

Monday, December 7, 2009

Open Source OCR Software

The open source movement has created some great OCR Software / Optical Character Recognition Software. Below are links and info:

OCRopus OCR Software
This is a project sponsored by Google, and is a state of the art OCR application. It is focused on high volume OCR needs, and includes a conversion engine, layout analysis, modeling and multi-lingual capabilities.

OCRopus OCR Software Download

GOCR OCR Application
Developed under the GNU General Public License, it can be used with various front ends to convert images to text, and it accepts a range of image formats.

GOCR OCR Application Download

Tesseract OCR Engine
Engine originally developed by HP between 1985 and 1994, when OCR Software was in its infancy. Google uses the engine in its OCRopus project. Document Capture companies like PSIGEN have made the Tesseract Engine an option for advanced capture.

Tesseract OCR Engine Download