OCR Software: scanning

Sunday, June 10, 2012

OCR and SharePoint: What features do I need?

As more organizations start placing scanned documents into SharePoint, a few areas deserve attention. A little planning will help you get the most out of OCR technology, including running OCR on documents before they are placed in a SharePoint library as PDFs. So what is the true value of OCR in any SharePoint deployment? It all depends on what you are trying to achieve. The Scanning with SharePoint BLOG has a great post on what to evaluate before you start the scanning process: How do you want to find your documents in SharePoint? Below are some ways to use OCR, along with definitions of the key types:

Full Text OCR - Optical Character Recognition (OCR) for SharePoint is typically associated with converting an image to full text. When you scan a document, the result is a pure image: the text within it is not searchable, and you cannot copy and paste it. The OCR process can give you PDFs that can be indexed by SharePoint Search. Is Full Text OCR Necessary? Read the link for some thoughts.
Zone OCR - Zone OCR extracts information from a specific location on a repeatable form. The information collected can be entered automatically into a SharePoint column. This is a huge time saver if you need to collect information from a large volume of forms, because recognizing only the zone you care about really speeds up the process.
Advanced Data Extraction (ADE) - This is the ultimate in efficiency and automation, and only a few apps give you this OCR functionality without an exorbitant cost. In a nutshell, ADE provides pattern matching for information extraction. So if you are looking for a 6-digit number, ADE finds and extracts it automatically. During the OCR process, ADE adds to accuracy and speed by finding only what you need. inForm has a great product for SharePoint Capture and OCR that can provide a robust ADE engine.
Point and Click OCR - Point and Click OCR lets you use the mouse to choose the text you want to place into a SharePoint field. The images are either pre-OCR'd or the OCR is performed in real time to give you the desired information.
Rubberband OCR - This method of OCR processing lets you drag your mouse over an area of text and auto-enter the data into a SharePoint column. It is great for information that spans multiple lines, and it converts the text in the image quite easily.
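
To make Zone OCR and ADE a bit more concrete, here is a minimal Python sketch. It is only an illustration and not how any particular product works. It assumes your OCR engine has already returned a list of words with pixel coordinates; the Word type, the zone coordinates, and both function names are made up for this example.

```python
import re
from typing import NamedTuple


class Word(NamedTuple):
    """One recognized word plus its bounding box (hypothetical OCR output)."""
    text: str
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels
    w: int  # box width
    h: int  # box height


def words_in_zone(words, zone):
    """Zone OCR idea: keep only the words whose box centre falls inside
    the zone, given as (left, top, right, bottom)."""
    left, top, right, bottom = zone
    hits = [
        wd for wd in words
        if left <= wd.x + wd.w / 2 <= right and top <= wd.y + wd.h / 2 <= bottom
    ]
    # Simple reading order: top to bottom, then left to right.
    hits.sort(key=lambda wd: (wd.y, wd.x))
    return " ".join(wd.text for wd in hits)


# ADE idea: a pattern for a standalone 6-digit number. The lookarounds
# stop it from matching six digits inside a longer number.
SIX_DIGITS = re.compile(r"(?<!\d)\d{6}(?!\d)")


def find_six_digit_numbers(text):
    """Return every standalone 6-digit number found in the OCR text."""
    return SIX_DIGITS.findall(text)


if __name__ == "__main__":
    page = [
        Word("Invoice", 40, 50, 80, 20),
        Word("No:", 130, 50, 40, 20),
        Word("204518", 400, 50, 70, 20),
        Word("Total", 40, 600, 60, 20),
    ]
    print(words_in_zone(page, (380, 40, 500, 80)))
    print(find_six_digit_numbers("PO 204518, phone 5551234567"))
```

The value returned by either function is what you would then write into a SharePoint column. The zone approach needs a form whose layout never moves, while the pattern approach works wherever the number lands on the page.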

Saturday, January 30, 2010

OCR Software versus Document Capture Software

All OCR software companies provide the ability to convert scanned files into text or searchable PDFs via the Optical Character Recognition process, but how do I capture/scan the images so that those applications can do their conversion?

This is an interesting question. Let's talk about Document Capture first. This type of application is built from the ground up to scan/capture documents at a high rate of speed, collect information about the documents in a number of ways, and then export the documents and data to a back end repository. All document capture companies provide all types of OCR options, and they usually OEM their OCR, ICR, and OMR components from the major OCR application companies, like ABBYY, OpenText, Nuance, ReadSoft, etc. Most of these OCR companies have diversified their offerings to include document capture, but in my opinion they fall well short on the capture side... they are OCR companies.

The real goal here is to get the best OCR results possible from a powerful OCR engine, and also to minimize the time required to scan and process by using the best document capture software. So, if you are looking to do high volume OCR processing, I highly recommend choosing a capture application that uses your OCR engine of choice, to get the best of both worlds. I will write more on this topic in upcoming posts. If you want some guidance on How to pick the right OCR Software, click on the link text.

Wednesday, December 30, 2009

Optical Character Recognition (OCR) and Capture

So what is document capture software, and what does it have to do with OCR applications? First, I think we need to differentiate between scanning software and capture software. Here is a good blog post that goes over the differences with regard to SharePoint Scanning. Scanning Software just gives you the ability to convert paper to a digital form, and then OCR it. Capture Software takes this a step further, and is really a catalyst for some enhanced processing with your recognition engine. Typical capture software will allow you to perform zone OCR, scan multiple documents in a single stack through separation, perform OCR based separation, or even analyze the OCR text for expressions and then automatically extract the data. Document Capture software provides enhanced data extraction, as an example, as do other vendors like Kofax, AnyDoc, Captiva, etc.
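
As a rough illustration of OCR based separation, here is a small Python sketch. It assumes you already have the OCR text of every page in the scanned stack, in order. The function name and the "Invoice" separator keyword are made up for this example; real capture products use their own, more robust rules.

```python
def split_on_separator(page_texts, separator="Invoice"):
    """Group page numbers (0-based) into documents.

    A page whose OCR text contains the separator keyword starts a new
    document; every other page is appended to the current document.
    """
    documents = []
    for index, text in enumerate(page_texts):
        starts_new = separator.lower() in text.lower()
        if starts_new or not documents:
            # First page of the stack always opens a document.
            documents.append([index])
        else:
            documents[-1].append(index)
    return documents


if __name__ == "__main__":
    stack = [
        "Invoice 1001",
        "line items",
        "Invoice 1002",
        "line items",
        "terms and conditions",
    ]
    for pages in split_on_separator(stack):
        print(pages)
```

Each group of pages would then be saved as its own file, so you can feed a whole stack through the scanner without inserting separator sheets by hand.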

So, I guess the whole point here is that OCR software in most cases just provides a basic framework for the conversion process. You really need a capture application to harness the true power of any OCR or recognition engine.