PDF Cross Reference Table

by Mark Gavin

The PDF Cross Reference Table (xref) is the third major section of a PDF file.  Please refer to the PDF Basic File Layout.  The xref is the index by which all of the indirect objects in the PDF file are located.  A single PDF file can contain multiple xref tables if the file has been incrementally saved or linearized.

Typically, the PDF cross reference table has the form described below.

The cross reference table starts with the word "xref".  The data following the word "xref" is organized into one or more "xref subsections".  In the simplest case all of that data is a single xref subsection, but an xref can contain more than one xref subsection.

The first line of the xref subsection contains two numbers.  The first number is the numerical ID of the first object in this xref subsection.  The second number is the count of objects in this xref subsection.
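
As a minimal sketch of reading that first line, the Python below splits a subsection header into its two numbers.  The sample data and the function name parse_subsection_header are assumptions made for illustration; the byte offsets in the sample are invented and do not come from a real file.

```python
# Illustrative xref data; the offsets are invented for this sketch.
SAMPLE_XREF = (
    b"xref\n"
    b"0 3\n"
    b"0000000000 65535 f \n"
    b"0000000017 00000 n \n"
    b"0000000081 00000 n \n"
)


def parse_subsection_header(line: bytes) -> tuple[int, int]:
    """Return (first_object_number, object_count) from a subsection header line."""
    first_text, count_text = line.decode("ascii").split()
    return int(first_text), int(count_text)


lines = SAMPLE_XREF.splitlines()
assert lines[0] == b"xref"  # the table starts with the keyword "xref"
first, count = parse_subsection_header(lines[1])
print(first, count)  # prints: 0 3
```

Here the subsection starts at object 0 and describes 3 objects, so exactly three entry lines follow the header.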

The remainder of the xref subsection is a sequence of lines, one for each PDF indirect object.  Each line holds three pieces of data about its object, as follows:

1. The location of the object specified using the byte offset to the object from the beginning of the PDF file.

2. The generation number of the object.  

3. A flag defining whether this specific object is in use ("n") or free ("f").
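
The three fields listed above can be pulled apart with a few lines of Python.  This is only a sketch: the XrefEntry type and the parse_xref_entry function are hypothetical names, and the first field is read as a byte offset exactly as described above, which is meaningful for in-use objects.

```python
from typing import NamedTuple


class XrefEntry(NamedTuple):
    offset: int      # byte offset from the beginning of the file
    generation: int  # generation number of the object
    in_use: bool     # True for "n" (in use), False for "f" (free)


def parse_xref_entry(line: bytes) -> XrefEntry:
    """Split one entry line into its three fields."""
    offset_text, generation_text, flag = line.decode("ascii").split()
    if flag not in ("n", "f"):
        raise ValueError(f"unexpected in-use/free flag: {flag!r}")
    return XrefEntry(int(offset_text), int(generation_text), flag == "n")


entry = parse_xref_entry(b"0000000017 00000 n \n")
print(entry)  # XrefEntry(offset=17, generation=0, in_use=True)
```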

Note: Each entry line (every line after the first line of the xref subsection) MUST be exactly 20 bytes long, including the line ending bytes.
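
One practical consequence of the fixed 20-byte length is that a reader can jump straight to the entry for a given object instead of scanning line by line.  The sketch below illustrates that idea; the entry data is invented, and entry_for_object is a hypothetical helper, not part of any PDF library.

```python
ENTRY_SIZE = 20  # bytes per entry, line ending included

# Entry lines only (the lines after the subsection header); offsets are invented.
FIRST_OBJECT = 0
ENTRIES = (
    b"0000000000 65535 f \n"
    b"0000000017 00000 n \n"
    b"0000000081 00000 n \n"
)


def entry_for_object(entries: bytes, first_object: int, object_number: int) -> bytes:
    """Slice out the 20-byte entry for object_number without scanning earlier lines."""
    if len(entries) % ENTRY_SIZE != 0:
        raise ValueError("entry data is not a multiple of 20 bytes")
    index = object_number - first_object
    if not 0 <= index < len(entries) // ENTRY_SIZE:
        raise IndexError(f"object {object_number} is not in this subsection")
    start = index * ENTRY_SIZE
    return entries[start:start + ENTRY_SIZE]


print(entry_for_object(ENTRIES, FIRST_OBJECT, 2))  # b'0000000081 00000 n \n'
```

The length check at the top of the function rejects entry data that does not honor the 20-byte rule, since every slice computed after it depends on that rule.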

Copyright 2012 by Appligent, Inc.