Document Conversion Utility

The Document Conversion Utility (DCU) converts documents between different formats.

To install the utility, refer to the documentation tab.

The tool specifically addresses the need to create full-text searchable PDF documents (via OCR) from image formats.

Only a limited range of source formats is currently available: TIF and PDF.  This, however, addresses the vast majority of requirements for full-text PDF creation.

PDF and TIFF are not simple image formats; each may internally use a number of different image compression schemes.  DCU is unlikely to work with the more complex or esoteric variants.  If you encounter a problematic file, you may wish to share a link to it for community support.

For further details on installation and usage please refer to the documentation tab.

If you experience problems using DCU please start a discussion on the discussions tab.

Free OCR Tool: Yes.  DCU will OCR documents for free.
Free TIF to PDF Conversion: Yes.  DCU will convert TIF to PDF.  The PDFs can be full-text searchable.
Tesseract: Yes.  DCU leverages the Tesseract OCR engine.
HOCR to PDF: No, but as DCU can convert directly to a full-text searchable PDF, it is unlikely that this will be required.
Bulk Conversion: Yes.  The application allows you to specify source and destination directories and a conversion type.  The files will be converted and placed in the destination directory.  The source files will be moved to a subdirectory of the source folder indicating whether they succeeded or failed conversion.
API / SDK: Yes.  Programmers may leverage the included DCH.dll (Document Conversion Helper) to implement their own applications more easily.
SharePoint: Yes.  The full-text PDF documents produced can be indexed and searched by SharePoint.  Depending upon the version of SharePoint, you will need an Adobe PDF iFilter.
Accurate OCR: Yes.  Tesseract is an accurate OCR engine.
Multiple Language Support: Yes.  See http://code.google.com/p/tesseract-ocr/
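
The bulk conversion behaviour described above can be illustrated with a short sketch.  This is not DCU's own code and does not use DCH.dll; it is a minimal Python approximation that assumes a `tesseract` executable is available on the PATH.  The function names (`tesseract_to_pdf`, `bulk_convert`), the `*.tif` pattern, the `eng` language code and the `succeeded` / `failed` subdirectory names are all hypothetical choices made for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Dict


def tesseract_to_pdf(source: Path, destination: Path) -> None:
    """Convert one image to a searchable PDF via the Tesseract command line.

    Assumption: `tesseract` is installed and on PATH. Tesseract appends
    ".pdf" to the output base name itself, so the suffix is stripped here.
    """
    output_base = destination.with_suffix("")
    subprocess.run(
        ["tesseract", str(source), str(output_base), "-l", "eng", "pdf"],
        check=True,  # raise CalledProcessError if Tesseract reports failure
    )


def bulk_convert(
    source_dir: Path,
    dest_dir: Path,
    convert: Callable[[Path, Path], None] = tesseract_to_pdf,
    pattern: str = "*.tif",
) -> Dict[str, str]:
    """Convert every matching file in source_dir and sort the originals.

    Converted files are written to dest_dir. Each source file is then moved
    into a "succeeded" or "failed" subdirectory of source_dir (hypothetical
    names), depending on whether the converter raised an exception.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, str] = {}

    # Build the file list up front, because files are moved while looping.
    for source in sorted(source_dir.glob(pattern)):
        target = dest_dir / (source.stem + ".pdf")
        try:
            convert(source, target)
            outcome = "succeeded"
        except Exception:
            outcome = "failed"

        bucket = source_dir / outcome
        bucket.mkdir(exist_ok=True)
        shutil.move(str(source), str(bucket / source.name))
        results[source.name] = outcome

    return results
```

A call such as `bulk_convert(Path("scans"), Path("searchable"))` would process every `.tif` file in a hypothetical `scans` folder.  The converter is passed in as a parameter so that a different conversion type could be substituted without changing the file-handling logic.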
   

Last edited Jul 21, 2014 at 4:15 PM by MadAboutImport, version 9