PDF Viruses Part 1; This is not a glitch

May 18, 2011 | Content Stream

A discussion of embedded viruses and malware inside PDF files with references to PDF Standards relevant to the subject. Continued in PDF Viruses Part 2.

by Mark Gavin

There have been a few posts recently about embedding viruses and malware inside PDF files. Jiri Sejtko and Jindřich Kubec on the Avast Blog did a good job of describing the mechanics of a recent exploit that compressed a malicious XFA array object using a /FlateDecode /JBIG2Decode filter set. They then take the position that PDF and Adobe are bad because this type of compression is allowed on any data and not just image data. Jiri goes on to say “I’m not happy to see another trick based on a glitch in the PDF specification.”
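
For readers who have not looked inside a PDF stream: the filters applied to a stream are named by the /Filter entry of its stream dictionary, either as a single name or as an array of names that are decoded in the order listed. The Python sketch below pulls that chain out of a raw dictionary. It is an illustration only, assuming a simplified regex scan (no #xx name escapes, indirect references, or nested dictionaries); `filter_chain` and `is_image_xobject` are hypothetical helper names, and the sample dictionary is made up, not taken from the exploit Avast analyzed.

```python
import re
from typing import List

# A PDF name token such as /FlateDecode (simplified: #xx escapes are not handled).
_NAME = re.compile(rb"/([A-Za-z0-9]+)")
# /Filter followed by either an array of names or a single name.
_FILTER = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
# /Subtype /Image marks an image XObject.
_IMAGE = re.compile(rb"/Subtype\s*/Image\b")


def filter_chain(stream_dict: bytes) -> List[str]:
    """Return the /Filter entry as an ordered list of filter names (decode order)."""
    match = _FILTER.search(stream_dict)
    if match is None:
        return []
    return [name.decode("ascii") for name in _NAME.findall(match.group(1))]


def is_image_xobject(stream_dict: bytes) -> bool:
    """True if the dictionary declares /Subtype /Image."""
    return _IMAGE.search(stream_dict) is not None


# Hypothetical dictionary for a non-image stream using the same filter pair.
sample = b"<< /Length 1024 /Filter [/FlateDecode /JBIG2Decode] >>"
print(filter_chain(sample))      # ['FlateDecode', 'JBIG2Decode']
print(is_image_xobject(sample))  # False
```

Nothing in the dictionary syntax ties a filter to an image; the chain is simply a list of names attached to a stream.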

Jiri references Section 4.8.6, page 353 of the old PDF Reference 1.7 and uses the following quote from the text to justify his position:

“Also note that JBIG2Decode and JPXDecode are not listed in Table 4.44 because those filters can be applied only to image XObjects.”

Unfortunately, the referenced quote is not related to the topic under discussion. Table 4.44 is simply a list of color space and filter name abbreviations that are valid only for “Inline Images”. Jiri’s blog post is about filters used for object-level compression; this passage, however, explains why JBIG2Decode and JPXDecode were excluded from the list of abbreviations used by Inline Images. Following is the full quote:

“Table 4.44 shows additional abbreviations that can be used for the names of color spaces and filters. Note, however, that these abbreviations are valid only in inline images; they may not be used in image XObjects. Also note that JBIG2Decode and JPXDecode are not listed in Table 4.44 because those filters can be applied only to image XObjects.”

The text quoted above is not part of ISO 32000-1, but the following text can be found in ISO 32000-1 Section 8.9.7 Inline Images. Table 94 in ISO 32000-1 corresponds to Table 4.44 in the PDF Reference 1.7:

“These abbreviations are valid only in inline images; they shall not be used in image XObjects. JBIG2Decode and JPXDecode are not listed in Table 94 because those filters shall not be used with inline images.”
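
The rule is narrow: it governs which filter names may appear in an inline image dictionary, not which filters may compress a stream in general. As a hedged Python sketch (the helper names are hypothetical), the two constraints can be written down directly, using the filter abbreviations listed in Table 94:

```python
# Filter-name abbreviations valid only in inline images (ISO 32000-1, Table 94).
INLINE_FILTER_ABBREVIATIONS = {
    "AHx": "ASCIIHexDecode",
    "A85": "ASCII85Decode",
    "LZW": "LZWDecode",
    "Fl": "FlateDecode",
    "RL": "RunLengthDecode",
    "CCF": "CCITTFaxDecode",
    "DCT": "DCTDecode",
}

# Filters that shall not be used with inline images.
NOT_FOR_INLINE_IMAGES = {"JBIG2Decode", "JPXDecode"}


def resolve_inline_filter(name: str) -> str:
    """Expand an inline-image filter name and reject JBIG2Decode/JPXDecode."""
    full = INLINE_FILTER_ABBREVIATIONS.get(name, name)
    if full in NOT_FOR_INLINE_IMAGES:
        raise ValueError(f"/{full} shall not be used with inline images")
    return full


def check_xobject_filter(name: str) -> str:
    """Image XObjects must use full filter names, never the abbreviations."""
    if name in INLINE_FILTER_ABBREVIATIONS:
        raise ValueError(f"/{name} is an inline-image abbreviation")
    return name
```

Note that `check_xobject_filter("JBIG2Decode")` passes: the table says nothing about which streams JBIG2Decode may be applied to outside of inline images.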

Inline Images are typically small images where the image data is embedded directly into a page content stream just like other drawing operators. Image XObjects are document level objects which can be referenced by one or more page content streams.
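
The difference shows up directly in the page content stream. Below is a minimal Python sketch, assuming an 8-bit DeviceGray image drawn at an arbitrary position; the helper names and the resource name /Im1 are illustrative. The inline version carries its pixels between the ID and EI operators (here ASCIIHex-encoded, using the /AHx abbreviation), while the XObject version only names an image that lives elsewhere in the file.

```python
import binascii


def inline_image_ops(pixels: bytes, width: int, height: int) -> bytes:
    """Content-stream operators that paint a DeviceGray inline image in place."""
    if len(pixels) != width * height:
        raise ValueError("expected one 8-bit gray sample per pixel")
    data = binascii.hexlify(pixels) + b">"  # ASCIIHex data; '>' marks end of data
    header = b"BI /W %d /H %d /CS /G /BPC 8 /F /AHx" % (width, height)
    place = b"q %d 0 0 %d 72 720 cm " % (width, height)  # scale and position
    return place + header + b" ID " + data + b" EI Q"


def xobject_ops(resource_name: str, width: int, height: int) -> bytes:
    """Operators that paint an image XObject by name; the data lives elsewhere."""
    return b"q %d 0 0 %d 72 720 cm /%s Do Q" % (
        width, height, resource_name.encode("ascii"))


print(inline_image_ops(bytes([0, 255, 255, 0]), 2, 2))
print(xobject_ops("Im1", 2, 2))
```

For the XObject case, the page’s /Resources dictionary would also need an /XObject entry mapping /Im1 to an image stream object; that object can then be shared by any number of pages.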

A discussion of Filters used in PDF files is located in ISO 32000-1 Section 7.4 Filters. An excerpt from Section 7.4.1 reads as follows:

“Filters may be cascaded to form a pipeline that passes the stream through two or more decoding transformations in sequence. For example, data encoded using LZW and ASCII base-85 encoding (in that order) shall be decoded using the following entry in the stream dictionary:

EXAMPLE 2 /Filter [/ASCII85Decode /LZWDecode]”
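
The order matters: decoders run left to right through the /Filter array, which is the reverse of the order in which the data was encoded. Python’s standard library has no LZW codec, so this sketch swaps in FlateDecode (zlib) for LZWDecode; the pipeline logic is the same. `decode_stream` and the `DECODERS` table are hypothetical names for illustration, and the ASCII85 handling assumes the PDF-style `~>` end-of-data marker with no leading `<~`.

```python
import base64
import zlib
from typing import Callable, Dict, Sequence


def ascii85_decode(data: bytes) -> bytes:
    """ASCII85Decode: drop the '~>' end-of-data marker, then decode base-85."""
    data = data.strip()
    if data.endswith(b"~>"):
        data = data[:-2]
    return base64.a85decode(data)


def flate_decode(data: bytes) -> bytes:
    """FlateDecode: zlib/deflate decompression."""
    return zlib.decompress(data)


DECODERS: Dict[str, Callable[[bytes], bytes]] = {
    "ASCII85Decode": ascii85_decode,
    "FlateDecode": flate_decode,
}


def decode_stream(raw: bytes, filters: Sequence[str]) -> bytes:
    """Apply each filter in the order it appears in the /Filter array."""
    data = raw
    for name in filters:
        if name not in DECODERS:
            raise NotImplementedError(f"/{name} is not implemented in this sketch")
        data = DECODERS[name](data)
    return data


# Encode in the opposite order: Flate first, then ASCII base-85.
content = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET"
encoded = base64.a85encode(zlib.compress(content)) + b"~>"
assert decode_stream(encoded, ["ASCII85Decode", "FlateDecode"]) == content
```

Each stage only sees bytes from the previous stage; no filter in the chain needs to know what kind of content it ultimately carries.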

Section 7.4.1 Note 2 also states the following:

“Though somehow obvious it might be worth pointing out that lossy compression can only be applied to sampled image data (and only certain types of lossy compression for certain types of images).  Lossless compression on the other hand can be used for any kind of stream.”
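
The distinction in Note 2 fits in a few lines of Python. A lossless codec such as Flate reproduces any input exactly, so it is safe on any stream, including non-image data such as XML; a lossy transform discards information and is only acceptable where an approximation is tolerable, as with sampled image data. The `toy_lossy` function is a made-up stand-in for quantization, not a real PDF filter.

```python
import zlib


def toy_lossy(data: bytes, keep_bits: int = 4) -> bytes:
    """Toy lossy transform: zero the low bits of every byte (stand-in for quantization)."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(b & mask for b in data)


# Illustrative non-image payload; stream contents like this must survive byte-for-byte.
payload = b'<?xml version="1.0"?><template/>'

restored = zlib.decompress(zlib.compress(payload))
print(restored == payload)            # True: Flate is lossless
print(toy_lossy(payload) == payload)  # False: discarded bits cannot be recovered
```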

In the blog post, Jiri references the PDF Reference 1.7 (2006), which is not a standard and is not current. The technical standard for the PDF file format, as published by the International Organization for Standardization, is ISO 32000-1, which can be downloaded for free from the Adobe web site. There is no reason for any new posts or articles to reference the old “PDF Reference” 1.x documents.

ISO 32000-1 explicitly states “Lossless compression … can be used for any kind of stream.” This is not a “glitch in the PDF specification”; this is how PDF works. Though Jiri and Jindřich’s methods are sound, they might have less to quibble about if they were reading the correct documentation.

More to come in PDF Viruses Part 2; Why is this a surprise?

Mark Gavin

Appligent Chief Technology Officer and software architect. Mark invented PDF redaction in 1997 and is also the creator of several other first-ever PDF applications, including Appligent’s SecurSign and FDFMerge, EMC’s Documentum IRM for PDF, and Liquent’s CoreDossier.