PDF Compression Options

by Mark Gavin

Appligent applications support a variety of file compression options. The following describes each of the available PDF compression options in more detail.

Flate – Encode non-encoded streams using Flate compression. Flate is essentially the same deflate compression used in Zip files, so any content streams that are not already compressed will be compressed using Flate.
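As a rough illustration of what Flate does to an uncompressed content stream, the sketch below uses Python's zlib module, which implements the same deflate algorithm. The sample stream is made up for the example, and the surrounding PDF stream dictionary (the /Filter /FlateDecode and /Length entries) is not shown.

```python
import zlib

# A made-up, uncompressed page content stream (repetitive, like many real ones).
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 50

# Flate-encode the stream; zlib.compress produces zlib-wrapped deflate data,
# which is the form a /FlateDecode filter expects.
encoded = zlib.compress(content, 9)

# Decoding restores the original bytes exactly, because Flate is lossless.
decoded = zlib.decompress(encoded)

print(len(content), len(encoded))
```

How much a stream shrinks depends entirely on its content; highly repetitive page content like this sample compresses far better than typical real-world streams.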

No LZW – Lempel–Ziv–Welch (LZW) is a lossless compression method used extensively throughout the early Internet and in early PDF files. The use of LZW fell out of favor because of licensing issues with Unisys, the owner of the LZW patent. A good history of LZW is available on Wikipedia. This option replaces all LZW encoded streams with Flate.

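Remove ASCII-85 – ASCII-85 is an older style of encoding which allowed early PDF files to be transferred and stored on systems which used 7-bit ASCII. This type of encoding is no longer necessary and actually expands the size of the content streams, because every four bytes of data are written as five ASCII characters. This option removes the encoding.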
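The size penalty of ASCII-85 is easy to see with Python's standard base64 module. This is only a sketch on sample bytes, not on a real PDF stream, and it leaves out the end-of-data marker that PDF appends to ASCII-85 data.

```python
import base64

raw = bytes(range(1, 101))        # 100 bytes of sample binary data
encoded = base64.a85encode(raw)   # ASCII-85 text, without any framing markers

# Every 4 input bytes become 5 ASCII characters.
print(len(raw), len(encoded))     # 100 125

# Removing the encoding simply means decoding back to the raw bytes.
restored = base64.a85decode(encoded)
```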

Optimize Fonts – Merge identical font descriptors and encodings. If different pages in the input document use exactly the same set of glyphs, then Optimize Fonts should consolidate all the identical FontDescriptors into a single object.

Optimize XObjects – Merge identical forms and images, as determined by an MD5 hash of their contents. See PDF Object Types for more information.
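The idea of merging by hash can be sketched as follows. This is not Appligent's implementation; the function name merge_identical and the sample object numbers and contents are hypothetical, chosen only to show how an MD5 digest identifies duplicates.

```python
import hashlib

def merge_identical(objects):
    """Map each object id to the id of the first object with the same bytes."""
    first_seen = {}   # MD5 digest -> id of the first object with that content
    remap = {}        # object id -> id of the object that should be kept
    for obj_id, data in objects.items():
        digest = hashlib.md5(data).digest()
        remap[obj_id] = first_seen.setdefault(digest, obj_id)
    return remap

# Hypothetical image XObjects keyed by object number.
xobjects = {10: b"logo-bytes", 11: b"photo-bytes", 12: b"logo-bytes"}
remap = merge_identical(xobjects)
print(remap)   # {10: 10, 11: 11, 12: 10}
```

References to object 12 could then be pointed at object 10 and the duplicate dropped. A cautious implementation might also compare the bytes directly before merging, to rule out a hash collision.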

Optimize Content – Look for common initial sub-sequences among content streams (the sequences of marking operators), and generate substreams that can be shared among the document page content.
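A minimal sketch of finding a common initial sub-sequence is shown below. It assumes each page's content has already been split into a list of operator strings; the helper name common_prefix and the sample pages are made up for illustration.

```python
def common_prefix(streams):
    """Return the longest run of operators that every stream starts with."""
    prefix = []
    for ops in zip(*streams):
        if any(op != ops[0] for op in ops):
            break
        prefix.append(ops[0])
    return prefix

# Two made-up pages that both begin by drawing the same logo.
pages = [
    ["q", "/Logo Do", "Q", "0 g", "BT", "(Page 1) Tj", "ET"],
    ["q", "/Logo Do", "Q", "0.5 g", "BT", "(Page 2) Tj", "ET"],
]

shared = common_prefix(pages)            # stored once as a shared substream
rest = [p[len(shared):] for p in pages]  # what remains unique to each page
print(shared)   # ['q', '/Logo Do', 'Q']
```

Because a page's Contents entry may be an array of streams, each page can reference the one shared substream followed by its own remainder.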

Compress Object Streams – A PDF file is a collection of objects. Classically, objects could contain data that was compressed, but the objects themselves were not compressed. Compress Object Streams takes a collection of PDF objects and compresses the entire collection into one compressed stream. This capability was introduced in PDF 1.5 (Acrobat 6) and can significantly reduce file size.
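The benefit of compressing a whole collection of small objects at once can be sketched with zlib. The font dictionaries below are invented for the example, and a real object stream also carries an index of object numbers and offsets, which is omitted here.

```python
import zlib

# Fifty made-up font dictionaries, the kind of small non-stream object
# that was classically stored uncompressed.
objects = [
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica /Name /F%d >>" % i
    for i in range(50)
]

plain_size = sum(len(o) for o in objects)

# Pack the whole collection into one stream and Flate-compress it once.
packed = zlib.compress(b"\n".join(objects), 9)

print(plain_size, len(packed))
```

The gain comes from the repetition between objects: the same key names appear in every dictionary, so the compressor only has to encode them once.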

Compress Structure – Compress only those objects that are related to logical structure (for example, tagged PDF). The result is compatible with any version of PDF or Acrobat, but the compressed objects are usable only in PDF 1.5 (Acrobat 6) and later.

XMP Read Only – Suppresses the generation of padding in the XMP metadata. Normally, XMP metadata inside the PDF file contains a bunch of blank space just in case anyone wants to add additional metadata to the file. In reality, this blank space is rarely, if ever, used, and not adding this blank space just makes the file a bit smaller.
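To give a feel for how much space the padding takes, the sketch below strips it from an existing XMP packet. It assumes the usual packet layout, where whitespace padding sits just before a closing xpacket marker with end="w" (writable); the function name strip_xmp_padding and the sample packet are made up for illustration, and this is not how the option itself is implemented.

```python
import re

def strip_xmp_padding(packet: bytes) -> bytes:
    """Drop the whitespace padding that precedes the closing xpacket marker."""
    # Collapse the padding to one newline and mark the packet read-only.
    return re.sub(rb'\s+<\?xpacket end="w"\?>\s*$',
                  b'\n<?xpacket end="r"?>', packet)

# A tiny sample packet with 2000 spaces of padding.
packet = (b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
          b'<x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>\n'
          + b" " * 2000 + b"\n"
          + b'<?xpacket end="w"?>')

slim = strip_xmp_padding(packet)
print(len(packet), len(slim))
```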

Rewrite Page – Rewrite page contents to clean up at the page content level. This option rebuilds the entire contents of the page, eliminating many types of malformed content.

Linearization – While not strictly compression, linearizing a file on save for Fast Web View does, in general, tidy up the file a bit. In addition, it is very useful for streaming PDF file content across the web, making display of the PDF file appear more responsive to the end user. See PDF Linearization for more information.