OCR and SharePoint: What features do I need?
As more organizations start placing scanned documents into SharePoint, a few areas deserve careful attention. A little planning will help you make the most of OCR technology and pre-OCR your documents, that is, run OCR on them before they are placed in a SharePoint library as PDFs. So what is the true value of OCR in a SharePoint deployment? It all depends on what you are trying to achieve. The Scanning with SharePoint blog has a great post on what to evaluate before you start the scanning process: "How do you want to find your documents in SharePoint?" Below are some ways to use OCR, along with definitions of the key types:
- Full Text OCR - Optical Character Recognition (OCR) is typically associated with converting an image to full text. A scanned document is a pure image: the text within it is not searchable, and you cannot copy and paste it. Running OCR gives you searchable PDFs that SharePoint Search can index. For some thoughts on whether you need it, see the post "Is Full Text OCR Necessary?"
- Zone OCR - Zone OCR extracts information from a specific location on a repeatable form, and the extracted value can be entered automatically into a SharePoint column. If you need to collect information from a large volume of forms, OCR by zone is a huge time saver and can really speed up the process.
- Advanced Data Extraction (ADE) - This is the ultimate in efficiency and automation, and only a few apps offer this OCR functionality without an exorbitant cost. In a nutshell, ADE uses pattern matching to extract information: if you are looking for a 6-digit number, for example, it extracts that number automatically. Because ADE finds only what you need during the OCR process, it improves both accuracy and speed. PSIGEN has a great product for SharePoint Capture and OCR that provides a robust ADE engine.
- Point and Click OCR - Point and Click OCR lets you use the mouse to choose the text you want to send to a SharePoint field. The images are either pre-OCR'd or processed in real time to give you the desired information.
- Rubberband OCR - This method lets you drag your mouse over an area of text to enter the data automatically into a SharePoint column. It works well for information that spans multiple lines and converts the text in the image quite easily.
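Zone OCR and Rubberband OCR both come down to the same basic operation: keep only the recognized text that falls inside a rectangle on the page. The Python sketch below assumes the OCR engine has already returned each word with a pixel bounding box. The `Word` and `Zone` classes and the `read_zone` function are hypothetical names for illustration, not any product's API, and the reading-order sort is deliberately simple.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One recognized word and its bounding box in page pixels (hypothetical format)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


@dataclass
class Zone:
    """A page rectangle: fixed per form for zone OCR, or drawn by the user for rubberband OCR."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, w: Word) -> bool:
        # A word belongs to the zone only if its whole box lies inside the rectangle.
        return (self.left <= w.left and w.right <= self.right
                and self.top <= w.top and w.bottom <= self.bottom)


def read_zone(words: List[Word], zone: Zone) -> str:
    """Join the words inside the zone in a simple reading order.

    Assumption: words on the same line share the same `top` value; a real
    engine would group words into lines with some tolerance.
    """
    inside = [w for w in words if zone.contains(w)]
    inside.sort(key=lambda w: (w.top, w.left))  # top-to-bottom, then left-to-right
    return " ".join(w.text for w in inside)


if __name__ == "__main__":
    page = [
        Word("Invoice", 50, 40, 130, 60),
        Word("No:", 140, 40, 180, 60),
        Word("482913", 400, 40, 480, 60),
        Word("Acme", 400, 70, 450, 90),
        Word("Corp", 460, 70, 510, 90),
    ]
    # A zone covering the right-hand block picks up text across two lines.
    print(read_zone(page, Zone(390, 30, 600, 100)))  # prints "482913 Acme Corp"
```

For zone OCR, the rectangle would be saved once per form template; for rubberband OCR, it would come from the user's mouse drag. Either way, the returned string is the value that would be entered into the SharePoint column.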
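To illustrate the pattern-matching idea behind ADE, here is a minimal Python sketch that pulls standalone 6-digit numbers out of OCR'd text with a regular expression. It assumes the page has already been through full-text OCR and its text is available as a string; the pattern and the `extract_six_digit_numbers` name are hypothetical, and this is not how any particular ADE product is implemented or configured.

```python
import re
from typing import List

# Hypothetical pattern: exactly six digits that are not part of a longer number.
# The lookbehind/lookahead reject digits immediately before or after the match.
SIX_DIGIT = re.compile(r"(?<!\d)\d{6}(?!\d)")


def extract_six_digit_numbers(ocr_text: str) -> List[str]:
    """Return every standalone 6-digit number found in OCR'd text, in order."""
    return SIX_DIGIT.findall(ocr_text)


if __name__ == "__main__":
    sample = "Invoice 482913 dated 2023-04-01, PO 12345678, account 007421."
    print(extract_six_digit_numbers(sample))  # prints ['482913', '007421']
```

The lookarounds matter: without them, `\d{6}` would also match the first six digits of the 8-digit PO number, which is exactly the kind of false hit that pattern-based extraction is meant to avoid.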