PDF Basic File Layout

Jun 12, 2007 | Learning PDF

Description of the four primary sections of the basic layout of a PDF file and an explanation of the incremental save mechanism.

by Mark Gavin

For the most part, the basic layout of a PDF file is fairly simple.  A PDF file consists of four primary sections, as illustrated below:

Sections are Header, Body, xref Table, Trailer.

The PDF file “Header” is just one or two lines starting with %PDF.  The “Body” is a collection of objects which include the page contents, fonts, annotations, etc.  The “xref Table”, or cross reference table, is a collection of pointers to locate the individual objects contained in the “Body”.  The “Trailer” contains the pointer to the start of the cross reference table.
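To make the four sections concrete, here is a minimal Python sketch that assembles a tiny one-page file in memory and then locates its parts the way a reader would: check the header line, read the xref offset stored just before the end-of-file marker, and follow the xref entries to the objects. The function names `build_minimal_pdf` and `locate_sections` are hypothetical, and the parsing assumes a classic uncompressed xref table in a file that has been saved only once.

```python
import re


def build_minimal_pdf() -> bytes:
    """Assemble header, body, xref table, and trailer for a one-page file."""
    header = b"%PDF-1.4\n"
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>\nendobj\n",
    ]

    # Body: remember the byte offset at which each object starts.
    body = b""
    offsets = []
    for obj in objects:
        offsets.append(len(header) + len(body))
        body += obj

    # xref table: one fixed-width 20-byte entry per object, plus the free entry 0.
    xref_offset = len(header) + len(body)
    xref = b"xref\n0 %d\n" % (len(objects) + 1)
    xref += b"0000000000 65535 f \n"
    for off in offsets:
        xref += b"%010d 00000 n \n" % off

    # Trailer: dictionary, then the pointer to the start of the xref table.
    trailer = b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_offset,
    )
    return header + body + xref + trailer


def locate_sections(data: bytes) -> dict:
    """Find the header line, the xref table, and the object offsets it lists."""
    header_line = data.split(b"\n", 1)[0]
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    if not header_line.startswith(b"%PDF") or match is None:
        raise ValueError("not a PDF file, or trailer not found")

    xref_offset = int(match.group(1))
    trailer_at = data.index(b"trailer", xref_offset)
    xref_section = data[xref_offset:trailer_at]
    # In-use entries look like "0000000009 00000 n".
    object_offsets = [int(m) for m in re.findall(rb"(\d{10}) \d{5} n", xref_section)]

    return {
        "header": header_line.decode("ascii"),
        "xref_offset": xref_offset,
        "xref_ok": data[xref_offset:xref_offset + 4] == b"xref",
        "object_offsets": object_offsets,
    }
```

Calling `locate_sections(build_minimal_pdf())` returns the header string, the offset of the xref table, and one offset per object in the body. Note that the file is read from the end: the trailer is found first, and everything else is reached through it.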

Incremental Saves

Building on the basic layout above, PDF supports the concept of incremental saves.  This is the ability to make modifications to the file without altering the actual content of the original saved document.

Sections are Original File (Header, Body, xref Table, Trailer), First Incremental Update (Body Changes, Updated xref, Trailer), etc., Last Incremental Update (Body Changes, Updated xref, Trailer).
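The layered layout can be sketched in Python as a pure append operation. The example below is an illustration under stated assumptions, not how Acrobat implements saving: `make_base` and `append_update` are hypothetical helpers, the catalog is assumed to be object 1, and only a single replaced object is written per update. The `/Prev` key in the new trailer is the standard way an updated xref section points back to the previous one.

```python
import re


def make_base() -> bytes:
    """Hypothetical helper: a small original file (header, body, xref, trailer)."""
    out = bytearray(b"%PDF-1.4\n")
    bodies = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n%b\nendobj\n" % (num, body)
    xref_at = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(bodies) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(bodies) + 1,
        xref_at,
    )
    return bytes(out)


def append_update(original: bytes, obj_num: int, new_body: bytes) -> bytes:
    """Append a new version of one object as an incremental update."""
    # The most recent startxref and /Size describe the current state of the file.
    prev_xref = int(re.findall(rb"startxref\s+(\d+)", original)[-1])
    size = int(re.findall(rb"/Size\s+(\d+)", original)[-1])

    out = bytearray(original)  # the original bytes are copied unchanged

    # Body changes: the replacement object keeps its number and generation.
    obj_at = len(out)
    out += b"%d 0 obj\n%b\nendobj\n" % (obj_num, new_body)

    # Updated xref: a one-entry subsection covering only the changed object.
    xref_at = len(out)
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_at)

    # Trailer: /Prev links back to the earlier xref table (assumes /Root is 1 0 R).
    out += b"trailer\n<< /Size %d /Root 1 0 R /Prev %d >>\nstartxref\n%d\n%%%%EOF\n" % (
        size,
        prev_xref,
        xref_at,
    )
    return bytes(out)
```

After `append_update(make_base(), 3, b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] >>")`, the result still begins with the original file byte for byte, and the old version of object 3 is still present. A reader simply finds the newer version first because it starts from the last trailer.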

There are several advantages to incremental saves.

  1. Saving the file to disk is quicker because you are only appending the new data to the end of an existing file.
  2. An incrementally saved document contains an audit trail of changes to the PDF file.  This allows the file to be “rolled back” to a previous save.
  3. The incremental save mechanism is also used to support multiple digital signatures on a single PDF file.
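The "roll back" idea in point 2 follows directly from the append-only layout: each save ends with its own `%%EOF` marker, so cutting the file after an earlier marker recovers that earlier revision. The Python sketch below is deliberately naive. The names `revision_ends` and `roll_back` are hypothetical, and it assumes that `%%EOF` never occurs inside stream data and that the file is not linearized (a linearized file can contain an extra `%%EOF` near its start).

```python
import re
from typing import List

# An end-of-file marker, optional trailing blanks, and one optional line ending.
EOF_MARKER = re.compile(rb"%%EOF[ \t]*(?:\r\n|\r|\n)?")


def revision_ends(data: bytes) -> List[int]:
    """Return the byte offset at which each saved revision ends."""
    return [m.end() for m in EOF_MARKER.finditer(data)]


def roll_back(data: bytes, revision: int) -> bytes:
    """Return the file as it was after the given save (1 = original file)."""
    ends = revision_ends(data)
    if not 1 <= revision <= len(ends):
        raise ValueError("file has %d revision(s)" % len(ends))
    # Everything after the chosen marker belongs to later incremental updates.
    return data[: ends[revision - 1]]
```

For a file that has been saved three times, `len(revision_ends(data))` is 3 and `roll_back(data, 1)` returns the bytes of the original save. The same property is what lets a signature over an earlier revision remain checkable after later updates are appended.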

There is also a significant disadvantage to the incremental save mechanism.  Selecting “Save” in the Acrobat File menu automatically performs an incremental save.  When a PDF document is edited, for example when the user adds form fields or comments, it is typically saved multiple times.  This increases the file size, because the unused or obsolete data remains in the PDF file.

To remove the unused data in an incrementally saved PDF file, an Acrobat user needs to perform a “Save As…”. We have seen cases where a 200 KB PDF file grew to over 2.5 MB due to incremental saves. In these cases, a simple “Save As” can result in dramatic file size reductions.

The basic file layout becomes more complex for files which have been saved for “Fast Web View”.  Such files are called linearized.  For more information, see the following blog entry: Linearization

This article has been translated into the Serbo-Croatian language by Jovana Milutinovich from WebHostingGeeks.com.

 


Written By: Mark Gavin

Appligent Chief Technology Officer and software architect. Mark invented PDF redaction in 1997 and is also the creator of several other first-ever PDF applications, including Appligent’s SecurSign and FDFMerge, EMC’s Documentum IRM for PDF, and Liquent’s CoreDossier.