by Mark Gavin
For the most part, the basic layout of a PDF file can be fairly simple. A PDF file consists of four primary sections, as illustrated below:
The PDF file “Header” is just one or two lines starting with %PDF. The “Body” is a collection of objects that make up the document: page contents, fonts, annotations, and so on. The “xref Table”, or cross-reference table, records the byte offset of each object in the “Body” so that individual objects can be located without reading the whole file. The “Trailer” contains the pointer to the start of the cross-reference table.
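To make the four sections concrete, here is a minimal sketch in Python. It assembles a blank one-page PDF by hand and then follows the Trailer's pointer back to the xref table. It assumes a classic, uncompressed cross-reference table. The names `build_minimal_pdf` and `read_startxref` are hypothetical and do not come from any library.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a blank one-page PDF from header, body, xref table and trailer."""
    # Header: the %PDF line plus an optional comment line of high-bit bytes.
    header = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
    # Body: numbered objects (catalog, page tree, one empty page).
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]
    out = bytearray(header)
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))          # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # xref table: one fixed-width 20-byte entry per object, plus the free entry 0.
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    # Trailer: names the root object and points back to the xref table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


def read_startxref(pdf: bytes) -> int:
    """Return the number that follows the last 'startxref' keyword."""
    tail = pdf[pdf.rindex(b"startxref"):]
    return int(tail.split()[1])
```

Calling `read_startxref(build_minimal_pdf())` returns an offset at which the file's bytes begin with `xref`. Supplying that pointer is the Trailer's job: a reader starts at the end of the file and works backward from there.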
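Starting with the basic layout above, PDF supports the concept of incremental saves. This is the ability to make modifications to the file without altering the bytes of the original saved document. The changes are appended to the end of the file as new objects, followed by a new cross-reference section and a new trailer.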
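An incremental save can be sketched the same way. New versions of objects are appended after the original `%%EOF`. They are followed by a new xref section that lists only the changed objects, and by a trailer whose `/Prev` entry points back to the previous xref table. This is an illustrative sketch, not Acrobat's implementation. It assumes the previous revision ends with a classic `xref` table and a `trailer` dictionary rather than a cross-reference stream. The names `append_incremental_update` and `last_startxref` are hypothetical helpers.

```python
import re


def last_startxref(pdf: bytes) -> int:
    """Byte offset stored after the last 'startxref' keyword."""
    tail = pdf[pdf.rindex(b"startxref"):]
    return int(tail.split()[1])


def append_incremental_update(pdf: bytes, changed: dict[int, bytes]) -> bytes:
    """Append new versions of `changed` objects without rewriting existing bytes.

    Assumes the previous revision ends with a classic 'xref' table and a
    'trailer' dictionary that has /Size and /Root (not an xref stream).
    """
    if not changed:
        return pdf
    trailer = pdf[pdf.rindex(b"trailer"):]
    size_m = re.search(rb"/Size\s+(\d+)", trailer)
    root_m = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", trailer)
    if not (size_m and root_m):
        raise ValueError("last trailer lacks /Size or /Root")
    prev_xref = last_startxref(pdf)

    out = bytearray(pdf)                 # the original bytes are kept as-is
    if not out.endswith(b"\n"):
        out += b"\n"

    # New body: the replacement object definitions, appended at the end.
    offsets = {}
    for num in sorted(changed):
        offsets[num] = len(out)
        out += b"%d 0 obj\n" % num + changed[num] + b"\nendobj\n"

    # New xref section: one single-entry subsection per changed object.
    xref_offset = len(out)
    out += b"xref\n"
    for num in sorted(offsets):
        out += b"%d 1\n%010d 00000 n \n" % (num, offsets[num])

    # New trailer: /Prev chains back to the previous xref table.
    size = max(int(size_m.group(1)), max(offsets) + 1)
    out += b"trailer\n<< /Size %d /Root %b /Prev %d >>\n" % (size, root_m.group(1), prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

The result begins with the unmodified original bytes. A reader that follows the `/Prev` chain sees both definitions of a changed object and uses the newest one.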
There are several advantages to incremental saves.
- Saving the file to disk is quicker because you are only tacking the new data onto the end of an existing file.
- An incrementally saved document contains an audit trail of changes to the PDF file. This allows the file to be “rolled back” to a previous save.
- The incremental save mechanism is also used to support multiple digital signatures on a single PDF file.
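In this scheme every save ends with its own `%%EOF` marker, so a rollback can be approximated by truncating the file just after an earlier marker. The sketch below assumes each revision ends with exactly one `%%EOF`. Linearized files and stray `%%EOF` bytes inside streams can break that assumption, so treat this as a heuristic. The names `revision_ends` and `roll_back` are hypothetical.

```python
def revision_ends(pdf: bytes) -> list[int]:
    """Offsets just past each '%%EOF' marker (assumed: one per saved revision)."""
    ends = []
    pos = 0
    while True:
        i = pdf.find(b"%%EOF", pos)
        if i < 0:
            return ends
        end = i + len(b"%%EOF")
        # Keep the end-of-line that follows the marker, if there is one.
        if pdf[end:end + 2] == b"\r\n":
            end += 2
        elif pdf[end:end + 1] in (b"\r", b"\n"):
            end += 1
        ends.append(end)
        pos = end


def roll_back(pdf: bytes, revision: int) -> bytes:
    """Return the file as it stood after save number `revision` (1 = first save)."""
    ends = revision_ends(pdf)
    if not 1 <= revision <= len(ends):
        raise ValueError(f"file has {len(ends)} revision(s)")
    return pdf[:ends[revision - 1]]
```

Because an incremental save never rewrites earlier bytes, each truncated prefix is the complete file as it existed after that save.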
There is also a significant disadvantage to the incremental save mechanism. Selecting “Save” under the Acrobat File menu automatically performs an incremental save. When PDF documents are edited, for example when the user adds form fields or comments, the document is typically saved multiple times. Because each save appends data instead of replacing it, the unused or obsolete data remains in the PDF file and the file keeps growing.
To remove the unused data from an incrementally saved PDF file, an Acrobat user needs to perform a “Save As…”, which writes out a complete new file. We have seen cases where a 200 KB PDF file grew to over 2.5 MB due to incremental saves. In these cases, a simple “Save As…” can reduce the file size dramatically.
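One rough way to see how much obsolete data has piled up is to look for object numbers that are defined more than once. Every definition except the last was superseded by a later save, and a full rewrite would drop it. The sketch below scans the raw bytes with a regular expression, so it gives only an estimate. It ignores old xref tables and trailers, cannot see objects packed inside compressed object streams, and may be fooled by matching text inside stream data. The names `redefined_objects` and `obsolete_bytes` are hypothetical.

```python
import re
from collections import Counter

# "N G obj" headers; the lookbehind stops a match from starting mid-number.
OBJ_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def redefined_objects(pdf: bytes) -> dict[int, int]:
    """Object numbers defined more than once, mapped to how many times."""
    counts = Counter(int(m.group(1)) for m in OBJ_HEADER.finditer(pdf))
    return {num: n for num, n in counts.items() if n > 1}


def obsolete_bytes(pdf: bytes) -> int:
    """Rough size of object definitions superseded by a later definition."""
    latest_size = {}
    total = 0
    for m in OBJ_HEADER.finditer(pdf):
        end = pdf.find(b"endobj", m.end())
        if end < 0:
            continue                      # truncated object; skip it
        num = int(m.group(1))
        if num in latest_size:
            total += latest_size[num]     # the earlier copy is now dead weight
        latest_size[num] = end + len(b"endobj") - m.start()
    return total
```

A large `obsolete_bytes` value relative to the file size suggests that a “Save As…” is likely to shrink the file noticeably.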
The basic file layout becomes more complex for files that have been saved for “Fast Web View”; such files are called linearized. For more information, see the following blog entry: Linearization
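A quick heuristic check for linearization looks for a `/Linearized` entry near the start of the file, because the PDF specification requires the linearization parameter dictionary to sit within the first 1024 bytes. That dictionary's `/L` entry records the total file length. If the file later grows through an incremental save, the two values stop matching and the linearization can no longer be relied on. This is a sketch with a hypothetical function name, `linearization_status`, not a full validator.

```python
import re


def linearization_status(pdf: bytes) -> str:
    """Heuristic: is this file linearized, and does that still hold?"""
    head = pdf[:1024]   # the linearization dictionary must fit in the first 1024 bytes
    if b"/Linearized" not in head:
        return "not linearized"
    # /L records the file length at the time the file was linearized.
    m = re.search(rb"/L\s+(\d+)", head)
    if m is None:
        return "malformed linearization dictionary"
    if int(m.group(1)) == len(pdf):
        return "linearized"
    return "linearized, but modified since"   # e.g. an incremental save was appended
```

This check connects the two topics above: once an incremental save is appended to a linearized file, the recorded length no longer matches the actual file size.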