When we scan documents, the result is just an image: a picture of the paper. For many organizations, that scanned image is exactly what they need, and a few index fields describing the document are enough to retrieve it.
So why spend the time and money to run OCR software and convert the scanned document to a searchable format? Below are some reasons to always perform full-text OCR on scanned documents:
- Always provide every possible means of retrieval. Searching scanned documents by index fields alone may seem like a fantastic idea, but what if the document is misidentified, or the indexer enters incorrect information? Full-text OCR acts as an insurance policy: the document can still be found through full-text search.
- Document capture software today provides fast, reliable OCR. Most capture software on the market can automatically convert documents to a searchable format at a small cost. Some engines on the market can convert 100+ pages per minute, so the OCR conversion and recognition step adds little time.
- OCR to PDF for a format that holds both image and text in one container. Adobe provides the PDF "image with hidden text" option, which gives you a searchable file format that still contains a pristine image.
- Plan for the worst case. During audits or legal issues, you sometimes need to search beyond the index fields, and full text gives you the ability to find the needle in the haystack.
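
To make the first reason concrete, here is a minimal Python sketch of how full-text search can rescue a misindexed document. Everything in it is hypothetical and for illustration only: the two sample records, the field names (`index`, `ocr_text`), and the helper functions are assumptions, not the data model or API of any particular capture product. The OCR text is typed in by hand here; in practice it would come from your OCR engine.

```python
import re
from collections import defaultdict

# Hypothetical records: index fields typed by an indexer, plus the OCR text.
DOCUMENTS = [
    {"id": 1, "index": {"type": "invoice", "customer": "Acme"},
     "ocr_text": "Invoice 1001 billed to Acme Corp for scanner maintenance"},
    # Misindexed on purpose: the indexer entered the wrong customer name.
    {"id": 2, "index": {"type": "invoice", "customer": "Acme"},
     "ocr_text": "Invoice 1002 billed to Globex Inc for toner cartridges"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_full_text_index(documents):
    """Build a simple inverted index: word -> set of document ids."""
    index = defaultdict(set)
    for doc in documents:
        for word in tokenize(doc["ocr_text"]):
            index[word].add(doc["id"])
    return index


def search_index_fields(documents, field, value):
    """Return ids of documents whose index field matches the value exactly."""
    return sorted(
        doc["id"] for doc in documents
        if doc["index"].get(field, "").lower() == value.lower()
    )


def search_full_text(index, query):
    """Return ids of documents whose OCR text contains every query word."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


full_text = build_full_text_index(DOCUMENTS)
print(search_index_fields(DOCUMENTS, "customer", "Globex"))  # []
print(search_full_text(full_text, "Globex"))                 # [2]
```

The index-field search finds nothing for "Globex" because the indexer's entry was wrong, while the full-text search still locates document 2 from the words on the page. A real system would also have to cope with OCR recognition errors, which this sketch ignores.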