Showing posts with label open source OCR. Show all posts

Saturday, February 13, 2010

Why use OCR Software to perform full text conversion of images?

OCR Software

When we scan documents, they are just images, pictures of our paper.  For many organizations, this scanned image is exactly what they need, and a little index information about the document is sufficient to provide them with retrieval capability.

So why take the time and spend the money to use OCR Software to convert the scanned document to a searchable format?  Below are some reasons to always perform full text OCR of scanned documents:

  1. Always provide every means possible for retrieval.  Searching for scanned documents by index fields alone may seem like a fantastic idea, but what if the document is misidentified, or the indexer enters incorrect information?  Performing a full text OCR of the document acts as an insurance policy: the document can still be found through full text search.
  2. Document Capture software today provides fast, reliable OCR.  Most capture software on the market can automatically convert documents to a searchable format for a small expense.  Some of the engines on the market can convert 100+ pages per minute, so very little time is lost in the OCR conversion / recognition process.
  3. OCR to PDF for a format that contains both image and text in one container.  Adobe provides the PDF "image with hidden text" option, which gives you a searchable file format that still contains a pristine image.
  4. Plan for the worst case.  Audits, legal issues... sometimes you need to search beyond the index fields, and full text can give you the ability to find the needle in the haystack.

OCR applications give you the means to convert images to searchable formats, and there are many reasons to do the full text conversion.
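
To make the "insurance policy" point concrete, here is a minimal sketch in Python of a document store that keeps both index fields and the OCR text.  The class name, field names, document ID and sample invoice are all hypothetical, made up for illustration; a real capture system would use its own repository and search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DocumentStore:
    """Hypothetical store holding index fields plus a full text word index."""

    def __init__(self):
        self.index_fields = {}             # doc_id -> {field name: value}
        self.full_text = defaultdict(set)  # word -> set of doc_ids

    def add(self, doc_id, fields, ocr_text):
        """Store the keyed index fields and index every word of the OCR text."""
        self.index_fields[doc_id] = fields
        for word in tokenize(ocr_text):
            self.full_text[word].add(doc_id)

    def search_fields(self, field, value):
        """Return documents whose index field matches the value exactly."""
        return sorted(
            doc_id
            for doc_id, fields in self.index_fields.items()
            if fields.get(field, "").lower() == value.lower()
        )

    def search_text(self, query):
        """Return documents whose OCR text contains every word in the query."""
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self.full_text.get(w, set()) for w in words))
        return sorted(hits)


store = DocumentStore()
# The indexer mis-keyed the invoice number (10425 instead of 10452).
store.add(
    "scan-001",
    {"vendor": "Acme Corp", "invoice": "10425"},
    "Invoice 10452 Acme Corp total due 1200",
)

by_field = store.search_fields("invoice", "10452")  # index lookup misses
by_text = store.search_text("invoice 10452")        # full text still finds it
```

The index field search comes back empty because of the keying error, while the full text search still locates the scan.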

Monday, December 7, 2009

Open Source OCR Software


The open source movement has created some great OCR (Optical Character Recognition) software.  Below are links and info:

OCRopus OCR Software
This project is sponsored by Google and is a state-of-the-art OCR application.  It is focused on high volume OCR needs, and includes a conversion engine, layout analysis, modeling and multi-lingual capabilities.

OCRopus OCR Software Download

GOCR OCR Application
Developed under the GNU General Public License, it can be used with various front ends to convert images to text, and it accepts a variety of image formats.

GOCR OCR Application Download

Tesseract OCR Engine
An engine developed by HP in the late 80s, when OCR Software was in its infancy.  Google uses the engine in its OCRopus.  Document Capture companies like PSIGEN have made the Tesseract Engine an option for advanced capture.

Tesseract OCR Engine Download
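
As a rough illustration of producing a PDF that holds both the image and hidden text, the Python sketch below calls the Tesseract command line tool.  It assumes a reasonably recent Tesseract release in which the `pdf` output config is available and the `tesseract` executable is on the PATH; the function names and file names are hypothetical.

```python
import shutil
import subprocess


def build_searchable_pdf_command(image_path, output_base, lang="eng"):
    """Build the tesseract command line: image in, <output_base>.pdf out."""
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def make_searchable_pdf(image_path, output_base, lang="eng"):
    """Run Tesseract and return the path of the PDF it is expected to write."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_searchable_pdf_command(image_path, output_base, lang)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return output_base + ".pdf"


# Hypothetical file names; nothing is executed until make_searchable_pdf runs.
command = build_searchable_pdf_command("scan.png", "scan")
```

Calling `make_searchable_pdf("scan.png", "scan")` on a machine with Tesseract installed should leave a `scan.pdf` next to the original image.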