Tuesday, January 26, 2010

Microsoft SharePoint and OCR

Scanning into Microsoft SharePoint is an interesting endeavor, and the main reason for the undertaking is typically to have a searchable body of information. So what type of Optical Character Recognition (OCR) software can be used with SharePoint? First of all, the same rules apply for picking the right recognition software to convert images to text, as outlined in "How do I pick the right OCR Software?". Evaluate what you are trying to accomplish, and look at your business processes and workflows to get a good idea of how to initiate the conversion process:

Are your paper images scanned en masse, through a centralized capture process?

If this is the case, you would typically do all of your OCR processing and recognition in front-end document capture software. These applications provide the fastest OCR engines, and their recognition throughput can be anywhere from 100 to 600 pages per minute, depending on the types of pages you are scanning.
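To put that throughput range in perspective, here is a rough back-of-the-envelope sketch. The function name and the 60,000-page backlog are hypothetical, chosen only for illustration; the 100 and 600 pages-per-minute figures are the ends of the range quoted above.

```python
def ocr_minutes(page_count: int, pages_per_minute: float) -> float:
    """Return the minutes needed to OCR page_count pages at a given engine speed."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return page_count / pages_per_minute


# Hypothetical backlog size, not a figure from any real project.
backlog_pages = 60_000

slow = ocr_minutes(backlog_pages, 100)  # low end of the quoted range
fast = ocr_minutes(backlog_pages, 600)  # high end of the quoted range
print(f"{slow:.0f} min at 100 ppm, {fast:.0f} min at 600 ppm")
# prints "600 min at 100 ppm, 100 min at 600 ppm"
```

In other words, the same backlog could take about ten hours or under two, which is why the types of pages you scan matter so much when sizing a centralized capture process.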

Do you utilize MFPs / copiers to scan documents to SharePoint?

Most companies are trying to leverage their investment in their copier hardware to provide end users a great scanning and capture onramp to SharePoint. In this case, you typically want an OCR application that can provide recognition on the fly and do the conversion process behind the scenes. There are many MFP-integrated applications on the market that can provide the OCR engine: PSIGEN PSI:Capture, NSI AutoStore, and eCopy, to name a few.

Do the end users compile, combine and work with documents at their desktops?

In environments where end users are constantly working in their documents and need desktop scanning access, an OCR desktop application can typically be the best solution. These applications put control of the conversion process in the end user's hands and give them OCR capability at the click of a mouse. Some apps in this class are eCopy Paperworks, PaperPort and OmniPage.

All of the OCR solutions on this page focus on doing the conversion before the documents hit SharePoint. I will write a later article on solutions that can OCR documents within SharePoint libraries.
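The common thread in all three scenarios, recognition first and SharePoint second, can be sketched in a few lines. This is only an illustration of the ordering, not any vendor's implementation: ScannedDocument, ocr_then_upload, and the recognize and upload callables are hypothetical stand-ins for whatever OCR engine and SharePoint upload mechanism you actually use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScannedDocument:
    """Hypothetical container for one scanned file and its recognized text."""
    name: str
    image_bytes: bytes
    text: str = ""


def ocr_then_upload(
    doc: ScannedDocument,
    recognize: Callable[[bytes], str],
    upload: Callable[[ScannedDocument], None],
) -> ScannedDocument:
    """Run recognition first, then hand the already-searchable document to the uploader."""
    doc.text = recognize(doc.image_bytes)  # conversion happens before SharePoint sees the file
    upload(doc)                            # stand-in for the real SharePoint upload step
    return doc


# Usage with dummy stand-ins (assumptions, not real engines or APIs).
sent = []
result = ocr_then_upload(
    ScannedDocument("invoice.tif", b"raw image data"),
    recognize=lambda image: "recognized text",
    upload=sent.append,
)
print(result.text, len(sent))
```

Passing the engine and the uploader in as parameters keeps the sketch neutral about which product does the recognition, which matches the point of this article: the choice of tool depends on where the scanning happens.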

5 comments:

  1. Is there a way to have authentication done on the machine carry into SharePoint? The goal is for the "modified by" column to be populated by the authentication information entered at the MFP.

  2. In regards to OCR applications, I recommend checking out the free beta version of Ricoh Innovation's software. It's available online at: http://beta.rii.ricoh.com/betalabs/content/document-conversion

  3. This comment has been removed by the author.

  4. Hi,
    Nice information regarding OCR. How do I implement OCR in a document library in SharePoint? Thanks in advance.

    Regards
    Imran Paracha

  5. Are you ready to write that next article on how to OCR within SharePoint?
