
HP Smart Document Scan | Appligent Labs

Written by Mark Gavin | Feb 25, 2008 5:00:00 AM


We recently received a couple of malformed PDF files produced by the HP Smart Document Scan software. It appears that this software is included only with the HP Scanjet 7800 scanner and the HP Scanjet 8350 and 8380 scanners.

The PDF files produced declare themselves as version PDF 1.0. The first problem we found is in name objects that contain ‘#xx’ hex escapes. The # hex escape was not part of PDF 1.0; # hex values in name objects were introduced in PDF 1.2, so a file that uses them should declare at least version 1.2.
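A quick way to spot this version mismatch is to compare the header version with the names the file actually uses. The following is a minimal Python sketch, not a PDF parser: it reads the `%PDF-x.y` header and, if the file claims a version below 1.2, lists name tokens containing `#`. The function names are hypothetical, and because it scans raw bytes it may also report false matches inside binary stream data.

```python
import re
from typing import List, Optional, Tuple

# A name runs from '/' up to the next PDF whitespace or delimiter byte.
NAME_TOKEN = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]*")
HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the %PDF-x.y header, or None if not found."""
    m = HEADER.search(data[:1024])  # the header normally sits at the very start
    return (int(m.group(1)), int(m.group(2))) if m else None


def escaped_names_before_1_2(data: bytes) -> List[bytes]:
    """List '#'-escaped names in a file whose header claims a version below 1.2."""
    version = header_version(data)
    if version is None or version >= (1, 2):
        return []
    # Naive scan of the raw bytes; binary stream data may yield false matches.
    return [tok for tok in NAME_TOKEN.findall(data) if b"#" in tok]
```

Calling `escaped_names_before_1_2` on the bytes of an HP-produced file (read with `open(path, "rb")`) would be expected to list names such as the path-derived XObject names described below.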

The second problem is fairly nasty. Apparently, when the HP Smart Document Scan software creates the PDF file, it uses the path to the image file as the name of the image XObject. On some systems a file path can be quite long, and for long paths the HP software simply truncates it at 127 characters. Any space character in the path is written into the name object as ‘#20’.
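HP's source code is not available, so the Python below is only a hypothetical reconstruction of the behavior described above: spaces are escaped as `#20` and the escaped name is then cut at 127 characters. Truncating after escaping is an inference from the ‘#2’ example in this article, and only the space escape is modeled. The path is made up so that an escape straddles the cut.

```python
MAX_NAME_CHARS = 127  # truncation length observed in the HP files


def hp_style_name(image_path: str) -> str:
    """Hypothetical reconstruction: escape spaces as #20, then truncate."""
    escaped = image_path.replace(" ", "#20")
    return escaped[:MAX_NAME_CHARS]


# Made-up path: 125 characters precede the space, so its '#20' straddles the cut.
path = "T:\\" + "x" * 122 + " scan.tif"
name = hp_style_name(path)
print(len(name), name[-4:])  # 127 xx#2
```

The name keeps only the ‘#2’ of the escape, which is exactly the kind of damage shown in the example below.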

The PDF 1.2 Reference states “Any character except null (<00>) may be included in a name by writing its two-character hex code, preceded by #.” A hexadecimal string enclosed in angle brackets (<>) may contain an odd number of hex digits (the missing final digit is taken to be 0), but a # escape within a name object must always be followed by exactly two hex digits.
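The rule quoted above can be expressed as a small decoder. This Python sketch (the function name is hypothetical) decodes a name token byte by byte and rejects any `#` that is not followed by exactly two hex digits, as well as the forbidden `#00`. It is a simplified illustration, not how any particular PDF library implements the check.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def decode_name(token: bytes) -> bytes:
    """Decode a PDF name token, enforcing '#' + exactly two hex digits."""
    body = token[1:] if token.startswith(b"/") else token
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i:i + 1] == b"#":
            pair = body[i + 1:i + 3]
            # A valid escape needs two more bytes, both hex digits.
            if len(pair) != 2 or any(c not in HEX_DIGITS for c in pair):
                raise ValueError(f"malformed # escape at offset {i}: {body[i:i + 3]!r}")
            code = int(pair, 16)
            if code == 0:
                raise ValueError("null (#00) is not allowed in a name")
            out.append(code)
            i += 3
        else:
            out.append(body[i])  # regular character, copied as-is
            i += 1
    return bytes(out)
```

With this sketch, `decode_name(b"/OXB#20TEST")` returns `b"OXB TEST"`, while a name ending in a lone ‘#2’ raises `ValueError`.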

Because the XObject name is derived from a potentially very long file path, and that path is truncated at 127 characters, the resulting PDF file is sometimes malformed. The following is an example of what is sometimes produced:

/T:\Everyone\OXB#20TEST\FilePath\ … … 8\Feb#2020,#2

I have removed the middle of the name object for clarity. The name above was truncated between the two hex digits of an escape: at the end of the name you can see ‘#2’, which I assume should be ‘#20’. Because the name now ends with an incomplete # escape, it produces the error “non-hex character in a hex string”.

This problem is easy to locate in the HP files by simply opening the file in a text editor.
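For more than a handful of files, the same search can be automated. The Python sketch below is a heuristic, not a PDF parser: it blanks out `stream ... endstream` bodies so binary image data is not scanned, then lists every name token containing a `#` that is not followed by two hex digits. The script name `find_bad_names.py` and its command-line usage (`python find_bad_names.py scan.pdf`) are assumptions for illustration.

```python
import re
import sys
from typing import List

# A name runs from '/' up to the next PDF whitespace or delimiter byte.
NAME_TOKEN = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]*")
# A '#' not followed by two hex digits is a malformed escape.
BAD_ESCAPE = re.compile(rb"#(?![0-9A-Fa-f]{2})")
# Heuristic: stream bodies hold binary data, so skip them.
STREAM_BODY = re.compile(rb"stream\r?\n.*?endstream", re.DOTALL)


def malformed_names(data: bytes) -> List[bytes]:
    """Return name tokens that contain an incomplete or non-hex # escape."""
    text = STREAM_BODY.sub(b" ", data)
    return [tok for tok in NAME_TOKEN.findall(text) if BAD_ESCAPE.search(tok)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for token in malformed_names(fh.read()):
            print(token.decode("latin-1"))
```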

When creating names for XObjects, it is common practice to use a short string followed by an incrementing counter, so that each XObject in the file gets a unique name.
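A minimal Python sketch of that convention follows; the prefix `Im` and both function names are arbitrary, hypothetical choices. Each image gets the next name from a counter, and the file path is kept only alongside the name for bookkeeping, never embedded in it, so path length no longer matters.

```python
import itertools
from typing import Iterator, List, Tuple


def xobject_names(prefix: str = "Im") -> Iterator[str]:
    """Yield short, unique XObject names: Im1, Im2, Im3, ..."""
    for n in itertools.count(1):
        yield f"{prefix}{n}"


def name_images(image_paths: List[str]) -> List[Tuple[str, str]]:
    """Pair each image with its own name; the path never becomes part of the name."""
    return list(zip(xobject_names(), image_paths))


print(name_images(["T:\\scans\\page 1.tif", "T:\\scans\\page 2.tif"]))
```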

Using the file path as the XObject name is bad practice, especially when long path names can be truncated: two different paths whose names agree in their first 127 characters end up with the same name. This practice does not guarantee that every XObject in the file will have a unique name.
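To see why uniqueness is not guaranteed, the Python sketch below applies a hypothetical reconstruction of the HP naming (escape spaces, then cut at 127 characters) to two made-up paths that differ only after the 127th character. Both produce the identical name.

```python
MAX_NAME_CHARS = 127  # truncation length observed in the HP files


def hp_style_name(image_path: str) -> str:
    """Hypothetical reconstruction of the HP naming: escape spaces, then truncate."""
    return image_path.replace(" ", "#20")[:MAX_NAME_CHARS]


# Two made-up paths sharing a 138-character prefix, differing only in the file name.
shared = "T:\\Everyone\\Scans\\" + "x" * 120
page1 = shared + "\\page1.tif"
page2 = shared + "\\page2.tif"

print(page1 == page2)                                # False
print(hp_style_name(page1) == hp_style_name(page2))  # True: the names collide
```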

To avoid problems with PDF files created by this software, users of HP Smart Document Scan should keep the paths to the scanned image files well under 127 characters, remembering that each space counts as three characters once it is written as ‘#20’.