Mark Gavin's PDFblog
This blog contains items I find interesting or useful primarily related to Portable Document Format (PDF).
Build Better PDF Solutions
The PDF Cross Reference Table (xref) is the third major section of a PDF file. Please refer to the PDF Basic File Layout. The xref is the index by which all of the indirect objects in the PDF file are located. A single PDF file can contain multiple xref tables if the file has been incrementally saved or linearized.
Typically, the PDF cross reference table will have the following form.

The cross reference table starts with the word "xref". In the above example, all of the data following the word…
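As a rough illustration of how a classic cross reference section can be read, here is a minimal Python sketch. The sample table, the `XrefEntry` type and the `parse_xref_section` function are made up for this illustration and are not taken from the original post. The sketch assumes a plain-text table (not a PDF 1.5 cross reference stream) made of the word "xref", one or more subsections headed by "first-object count", and one entry per object holding a 10-digit offset, a 5-digit generation number and an "n" (in use) or "f" (free) flag.

```python
from typing import NamedTuple


class XrefEntry(NamedTuple):
    obj_num: int
    offset: int      # byte offset for "n" entries; next free object number for "f" entries
    generation: int
    in_use: bool


def parse_xref_section(text: str) -> list[XrefEntry]:
    """Parse one classic xref section, stopping at the 'trailer' keyword."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "xref":
        raise ValueError("section does not start with 'xref'")
    entries = []
    i = 1
    while i < len(lines) and lines[i] != "trailer":
        # Subsection header: first object number and number of entries.
        first, count = (int(part) for part in lines[i].split())
        i += 1
        for n in range(count):
            offset, gen, kind = lines[i + n].split()
            entries.append(XrefEntry(first + n, int(offset), int(gen), kind == "n"))
        i += count
    return entries


# Hypothetical three-entry table, for illustration only.
SAMPLE = """xref
0 3
0000000000 65535 f
0000000009 00000 n
0000000074 00000 n
trailer
"""

entries = parse_xref_section(SAMPLE)
for entry in entries:
    print(entry)
```

In a real file each entry is exactly 20 bytes long including its end-of-line marker, so a production parser would normally read fixed-width records from the raw bytes instead of splitting on whitespace.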
Duff Johnson recently wrote a fairly nice introduction to using command line applications in TalkingPDF. The article contains several interesting links including a link to a novel introduction to the engineering design behind Google Chrome.
The Xcode debugger will not display ASFixed numbers correctly by default. You need to add a Custom Data Formatter which tells the Xcode debugger how to properly format an ASFixed number for display. These instructions are in the form of an XML file named CustomDataViews.plist, located at the following path:
(~/Library/Application Support/Apple/Developer Tools/CustomDataViews/CustomDataViews.plist)
Following is a screenshot of how the ASFixed number "100" is normally displayed…
Learning PostScript: A Visual Approach
Ross Smith
The single best book on learning PostScript is "Learning PostScript: A Visual Approach" by Ross Smith.
ISBN 978-0938151128
This book was published in March of 1990 and has been out-of-print for a long time. But through the magic of the internet this masterpiece is easily available.
To the left is a link on Amazon.com. You can also find the book at AbeBooks.com. AbeBooks is my favorite place to find out-of-print books, used books and rare books.
PostScript programming was fairly easy for me…
In the past year I have taught an Acrobat SDK training course a few times. During these courses, I noticed that many programmers, especially younger programmers, have not been exposed to Fixed Point Mathematics. When working with the Acrobat SDK, Fixed Point Math is very important because many of the SDK calls expect the caller to pass numerical information using Fixed Point data types like ASFixed, ASFixedPoint, ASFixedRect, ASFixedQuad and ASFixedMatrix.
The Acrobat SDK uses the ASFixed data type…
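For readers new to fixed point math, here is a minimal Python sketch of the idea. It assumes the usual 16.16 layout for ASFixed (a 32-bit value with 16 integer bits and 16 fractional bits). The helper names are hypothetical and are not Acrobat SDK calls.

```python
FIXED_ONE = 1 << 16  # the value 1.0 in 16.16 fixed point (65536)


def float_to_fixed(value: float) -> int:
    """Convert a float to a 16.16 fixed point integer (assumed ASFixed layout)."""
    return int(round(value * FIXED_ONE))


def fixed_to_float(fixed: int) -> float:
    """Convert a 16.16 fixed point integer back to a float."""
    return fixed / FIXED_ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two 16.16 values; the raw product has 32 fractional bits,
    so it is shifted right by 16 to return to 16.16."""
    return (a * b) >> 16


hundred = float_to_fixed(100)
print(hundred)                   # raw integer a debugger would show without a formatter
print(fixed_to_float(hundred))   # the value a person expects to see
```

The raw integer for 100 is 6553600, which is why an unformatted debugger view of an ASFixed value looks wrong at first glance. Python integers do not overflow, so this sketch ignores the 32-bit range limit; in C the multiply would need a 64-bit intermediate.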
The PDF Universal Accessibility working group has received an ISO "Approved Work Item" number from the International Organization for Standardization (ISO). PDF/UA is now ISO/AWI 14289.
As PDF/UA continues to move through the standards process, the eventual standard will be labeled ISO 14289.
We have developed a BBEdit Language Module for PDF. This language module is written to aid developers and support personnel in reading raw PDF files by highlighting specific elements of the file with syntax coloring.

The meanings of the colors are as follows:
blue - PDF keywords
red - arrays
green - dictionaries
purple - strings
light gray - stream data
PDFHilight is a Universal Binary plug-in built for BBEdit 9.
Installation: To install the plug-in, make sure BBEdit isn't running,…
Recently I have received several PDF documents which contain compressed object streams. Object Streams are described in section 7.5.7 of ISO 32000-1 and are a mechanism for storing a collection of indirect Cos Objects together inside of a Cos Stream.
This Cos Stream, of Type "ObjStm", may or may not be compressed, though it would be pointless not to compress the stream.
Object Streams became available starting with PDF 1.5, and therein lies the reason for this blog posting. Every one of the files…
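To make the object stream layout a little more concrete, here is a small Python sketch that packs two objects into a stream and unpacks them again. It assumes Flate compression without predictors, and it passes the N (object count) and First (offset of the first object) values in directly instead of reading them from a stream dictionary. The function names and the sample objects are invented for the illustration.

```python
import zlib


def build_object_stream(objects: dict[int, bytes]) -> tuple[int, int, bytes]:
    """Pack objects into ObjStm-style data; returns (N, First, compressed bytes)."""
    header_parts = []
    body = b""
    for obj_num, data in objects.items():
        # Each header pair is: object number, offset relative to First.
        header_parts.append(f"{obj_num} {len(body)}")
        body += data + b"\n"
    header = (" ".join(header_parts) + "\n").encode("ascii")
    return len(objects), len(header), zlib.compress(header + body)


def read_object_stream(n: int, first: int, compressed: bytes) -> dict[int, bytes]:
    """Decompress ObjStm-style data and split it back into individual objects."""
    raw = zlib.decompress(compressed)
    numbers = [int(token) for token in raw[:first].split()]
    pairs = list(zip(numbers[0::2], numbers[1::2]))[:n]
    result = {}
    for index, (obj_num, offset) in enumerate(pairs):
        # An object ends where the next one starts, or at the end of the data.
        end = pairs[index + 1][1] if index + 1 < len(pairs) else len(raw) - first
        result[obj_num] = raw[first + offset:first + end].strip()
    return result


# Hypothetical objects, for illustration only.
original = {11: b"<< /Type /Catalog >>", 12: b"[1 2 3]"}
n, first, data = build_object_stream(original)
recovered = read_object_stream(n, first, data)
print(recovered)
```

Note that the objects inside the stream are stored without the usual "obj" and "endobj" keywords; the object numbers live only in the header pairs at the start of the stream.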
The unofficial ISO 32000-1 PDF Standard document is now available as a free download from Adobe. The body text of the document is the same as the official ISO 32000-1 standard, but the page headers and footers have been changed to replace the ISO copyright with the Adobe copyright.
Portable Document Format (PDF) is now officially an international standard. The International Organization for Standardization (ISO) has published ISO 32000-1:2008, Document management – Portable document format – Part 1: PDF 1.7. This is an ISO standard based on PDF 1.7. Here is a link to the ISO press release.
Work is currently underway for the development of ISO 32000-2. To participate in the development of the next version of PDF, get involved. In the United States, the PDF Reference Committee is managed…
Otherwise known as AcroForms, Acrobat form technology was first introduced in PDF version 1.2 and has been around for more than ten years. In addition to Adobe Acrobat, there are third parties which have released products to create Acrobat forms.
Following is a list of tools to create AcroForms:
Nuance PDF Converter Professional Versions 4 or 5
The Acrobat Professional package includes tools to create documents using two…
A Forms Data Format (FDF) file is a text file that contains a list of form field names and their values. Acrobat Forms, or AcroForms, were introduced in PDF Version 1.2. To allow for the import and export of data from AcroForms, Adobe developed the Forms Data Format. The documentation for the Forms Data Format is located in the PDF Reference in the chapter on "Interactive Features" under the section "Interactive Forms".
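As a hedged illustration of what such a file can look like, the following Python sketch writes a minimal FDF document containing field names (/T) and values (/V). The field names and values are invented, the helper functions are hypothetical, and the sketch assumes plain ASCII text; anything beyond that would need the string encoding rules from the PDF Reference.

```python
def pdf_string(text: str) -> str:
    """Wrap ASCII text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def build_fdf(fields: dict[str, str]) -> str:
    """Build a minimal FDF document holding field name/value pairs."""
    entries = " ".join(
        f"<< /T {pdf_string(name)} /V {pdf_string(value)} >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<< /FDF << /Fields [ {entries} ] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


# Hypothetical form data, for illustration only.
fdf_text = build_fdf({"name": "Jane (test)", "city": "Philadelphia"})
print(fdf_text)
```

The structure deliberately mirrors a PDF file in miniature: a header line, an object, and a trailer pointing at the root object, which is why FDF is documented in the PDF Reference itself.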
There are two kinds of FDF files:
• Classic - supplies data to fill out an existing…
The Visual Display of Quantitative Information, 2nd edition
Edward R. Tufte
Last week, while in Boston for the AIIM Conference, I used the Monday before the conference to attend a one-day course taught by Edward Tufte on "Presenting Data and Information". The course focuses on effectively presenting and communicating information.
The course is given in various locations around the country throughout the year. I've known about the course for the past several years, but until last week the scheduling didn't work out for me to attend.
I found the course to be…

Recently we have received a couple of malformed PDF files produced by the HP Smart Document Scan software. It appears that the HP Smart Document Scan software is only included with the HP Scanjet 7800 Scanner and the HP Scanjet 8350 & 8380 scanners.
Linearization
Linearization is a variant on the PDF file layout as described previously. Linearization is also called "Fast Web View". Linearization shuffles the contents of the PDF file to place all of the information needed to display the first page near the beginning of the file.

This allows the user to see the first page while the remainder of the file is still downloading from the web.
Incremental saves on a linearized file can actually break linearization, but Acrobat still reports the file as…
Jim King, Principal Scientist at Adobe Systems, has a personal web site which contains a collection of his public presentations. These presentations include PDF Tutorials, Color Management, Color Science, XML/PDF Tutorial and High Resolution Rendering. Several of the presentations are annotated with speaker's notes. I would encourage everyone to check it out. The URL to the presentations is as follows: http://home.comcast.net/~jk05/presentations/
A Typical PDF File
For the most part, the basic layout of a PDF file can be fairly simple. A PDF file consists of four primary sections as illustrated below:

The PDF file "Header" is just one or two lines starting with %PDF. The "Body" is a collection of objects which include the page contents, fonts, annotations, etc. The "xref Table", or cross reference table, is a collection of pointers to locate the individual objects contained in the "Body". The "Trailer" contains the pointer to the start of the…
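The four sections can be located programmatically with very little code. The Python sketch below is only an illustration under stated assumptions: the tiny sample file is synthetic, the function names are hypothetical, the version is assumed to be three characters long (such as "1.4"), and the "startxref" keyword is assumed to sit in the last 1024 bytes of the file. That keyword is followed by the byte offset of the last cross reference section.

```python
def read_pdf_version(data: bytes) -> str:
    """Return the version from a header such as '%PDF-1.4'."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("not a PDF file: missing %PDF- header")
    return data[5:8].decode("ascii")


def find_startxref(data: bytes) -> int:
    """Return the byte offset that follows the last 'startxref' keyword."""
    tail = data[-1024:]
    pos = tail.rfind(b"startxref")
    if pos < 0:
        raise ValueError("startxref keyword not found")
    return int(tail[pos + len(b"startxref"):].split()[0])


# Synthetic sample: header, one-object body, xref table, trailer.
body = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
)
xref_offset = body.index(b"xref")
sample = body + b"startxref\n" + str(xref_offset).encode("ascii") + b"\n%%EOF\n"

version = read_pdf_version(sample)
offset = find_startxref(sample)
print(version, offset, sample[offset:offset + 4])
```

Reading from the end of the file first is the natural order for a PDF consumer: the trailer leads to the xref, and the xref leads to every object in the body.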
We received an email from one of our customers, an attorney who uses Bates numbering on a regular basis. Following is one of the sentences from this customer's email:
"I wouldn't have thought it possible, but Adobe has managed to implement its Bates-stamping in a manner which makes it virtually useless [or at least highly impractical for use by] attorneys, the primary users of Bates-stamp utilities."
When I saw this I decided to take a look at Acrobat Bates Numbering.
I really don't use most of the…
Following is a collection of screenshots taken using a single PDF file displayed under Acrobat 4 through Acrobat 8.
Acrobat 4

Acrobat 5

Acrobat 6

Acrobat 7

Acrobat 8

Following is a PDF file which demonstrates the text shifting problem:
This particular drawing error is caused by passing a large negative character spacing in a text array when the text is of zero length.
PostScript(R) Language Reference (3rd Edition)
Adobe Systems Inc.
PostScript(R) Language Program Design
Adobe Systems Inc.
PostScript(R) Language Tutorial and Cookbook
Adobe Systems Inc.
The Adobe PDF Reference is similar to the Adobe PostScript Language Reference in that they can both be compared to a dictionary. A dictionary is a document which contains all of the words that can be used in a language, but it doesn't teach you how to combine those words into a good, well-structured book.
PDF is based on PostScript. The documentation for PostScript was released as a set of three volumes.
PostScript Language Reference - Red Book
PostScript Language Program Design - Green Book
I find that there is a general misunderstanding about the nature of Portable Document Format (PDF) version numbers.
Version 1.0 of the PDF file format was released by Adobe in 1993. Over the past fourteen years PDF has been updated seven times. The current version of PDF is 1.7. These changes to the PDF version number represent additions to the file format.
All of the "older" stuff in PDF works exactly the same way it did. None of the basic PDF text drawing primitives have changed. PDF 1.0 is still…
Adobe has released a technical note talking about additional XML data Acrobat 8 adds to each page of a PDF file when the file is Bates numbered using Acrobat 8.
Bates Numbering in PDF documents (PDF, 123K)
Here is what the XML looks like:
<Bates start="1" ndigits="6" prefix="ADBE" suffix="DRAFT"/>
The above XML is added to each page of the PDF file and will produce a Bates number on each page, for example ADBE000001DRAFT.
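The way the attributes combine into the printed number can be sketched in a few lines of Python. This is only an illustration inferred from the example above (prefix, then the start value zero-padded to ndigits, then suffix); the function name is hypothetical and the exact rules Acrobat applies are described in the Adobe technical note.

```python
import xml.etree.ElementTree as ET


def bates_label(xml_text: str) -> str:
    """Format a Bates label from a <Bates> element's attributes."""
    elem = ET.fromstring(xml_text)
    number = int(elem.get("start", "1"))
    digits = int(elem.get("ndigits", "0"))
    prefix = elem.get("prefix", "")
    suffix = elem.get("suffix", "")
    # Zero-pad the number to the requested width between prefix and suffix.
    return f"{prefix}{number:0{digits}d}{suffix}"


label = bates_label('<Bates start="1" ndigits="6" prefix="ADBE" suffix="DRAFT"/>')
print(label)  # ADBE000001DRAFT
```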
So, instead of simply correctly numbering each and every page, applications that…
I’ve been developing an application to generate Fresnel Zone Plates and ran into an interesting problem. A Zone Plate is similar to a lens in its ability to focus light. It differs from a lens by using diffraction instead of refraction.
The problem I encountered is that Acrobat creates many significant drawing artifacts when it renders this PDF drawing to the screen. In the above screen capture, only the rings centered on the center of the graphic are real. All other rings, centered off of the…
Copyright 2010 by Appligent, Inc.