Mark Gavin's PDFblog
PDF Techniques for WCAG 2.0
by Mark Gavin
The W3C has released a document on PDF Techniques for WCAG 2.0. The Web Content Accessibility Guidelines 2.0 (WCAG 2.0) is a collection of recommendations for making web content accessible to users with disabilities.
PDF Viruses Part 2; Why is this a surprise?
Programming Pearls (2nd Edition)
Jon Bentley
by Mark Gavin
This post continues my review of Jiri Sejtko and Jindřich Kubec's report, on the Avast Blog, of a PDF exploit discovered by Avast.
Jiri in his post says the following: "That’s another surprise from PDF, another surprise from Adobe, of course. Who would have thought that a pure image algorithm might be used as a standard filter on any object stream you want?"
I really don't understand the "surprise". Plenty of people have thought about embedding non-image data into image formats. For example; …
PDF Viruses Part 1; This is not a glitch
by Mark Gavin
There have been a few posts recently related to embedding viruses and malware inside PDF files. Jiri Sejtko and Jindřich Kubec on the Avast Blog did a good job of describing the mechanics of a recent exploit which compressed a malicious XFA array object within a /FlateDecode /JBIG2Decode filter set. They then take the position that PDF and Adobe are bad because this type of compression is allowed on any data and not just image data. Jiri then goes on to say "I’m not happy to see another trick based on a glitch in the PDF specification."
…
PDF Cross Reference Table
by Mark Gavin
The PDF Cross Reference Table (xref) is the third major section of a PDF File. Please refer to the PDF Basic File Layout. The xref is the index by which all of the indirect objects, in the PDF file, are located. A single PDF file can contain multiple xref tables if the file has been incrementally saved or linearized.
Typically, the PDF cross reference table will have the following form.
The cross reference table starts with the word "xref". In the above example; all of the data following the word "xref" is an "xref subsection". …
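As an illustration of the layout described above, here is a minimal sketch in Python. The sample table is made up for this illustration and is not the example from the original post, and `parse_xref` is a hypothetical helper name. The sketch assumes the classic (non-stream) xref format: the keyword "xref", a subsection header giving the first object number and the entry count, then one entry per object consisting of a 10-digit byte offset, a 5-digit generation number, and "n" (in use) or "f" (free).

```python
# Hypothetical sample table; not the one shown in the original post.
SAMPLE_XREF = """\
xref
0 4
0000000000 65535 f
0000000015 00000 n
0000000074 00000 n
0000000182 00000 n
trailer
"""

def parse_xref(text):
    """Return {object_number: (offset, generation, in_use)} for a classic xref section."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "xref":
        raise ValueError("section does not start with 'xref'")
    entries = {}
    i = 1
    while i < len(lines):
        parts = lines[i].split()
        if len(parts) != 2:
            break  # reached 'trailer' or anything that is not a subsection header
        first, count = int(parts[0]), int(parts[1])
        # Each subsection header is followed by exactly 'count' entries.
        for n in range(count):
            offset, generation, kind = lines[i + 1 + n].split()
            entries[first + n] = (int(offset), int(generation), kind == "n")
        i += 1 + count
    return entries

table = parse_xref(SAMPLE_XREF)
```

Here `table[2]` gives the byte offset at which object 2 starts, which is how a reader can jump straight to an object without scanning the whole file.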
Command Line
by Mark Gavin
Duff Johnson recently wrote a fairly nice introduction to using command line applications in TalkingPDF. The article contains several interesting links including a link to a novel introduction to the engineering design behind Google Chrome.
ASFixed Numbers in Xcode
by Mark Gavin
The Xcode debugger will not display ASFixed numbers correctly by default. You need to add a Custom Data Formatter which tells the Xcode debugger how to properly format an ASFixed number for display. These instructions are in the form of an XML file named CustomDataViews.plist, located at the following path:
(~/Library/Application Support/Apple/Developer Tools/CustomDataViews/CustomDataViews.plist)
Following is a screenshot of how the ASFixed number "100" is normally displayed in Xcode:
…
Learning Postscript
by Mark Gavin
The single best book written on learning Postscript is "Learning Postscript: A Visual Approach" written by Ross Smith.
ISBN 978-0938151128
This book was published in March of 1990 and has been out-of-print for a long time. But through the magic of the internet this masterpiece is easily available.
To the left is a link on Amazon.com. You can also find the book at AbeBooks.com. AbeBooks is my favorite place to find out-of-print books, used books and rare books.
Postscript programming was fairly easy for me to learn; since, I already knew how to program using the …
Fixed Point Math
by Mark Gavin
In the past year I have taught an Acrobat SDK training course a few times. During these courses, I noticed that many programmers, especially younger programmers, have not been exposed to Fixed Point Mathematics. When working with the Acrobat SDK, Fixed Point Math is very important because many of the SDK calls expect the caller to pass numerical information using Fixed Point data types like ASFixed, ASFixedPoint, ASFixedRect, ASFixedQuad and ASFixedMatrix.
The Acrobat SDK uses the ASFixed data type to represent …
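For readers new to the idea, the following is a minimal sketch of fixed point arithmetic in Python. It assumes a 16.16 layout (a signed 32-bit value with 16 integer bits and 16 fractional bits), which is how ASFixed is commonly described. The function names are hypothetical and are not Acrobat SDK calls. Python integers do not overflow, so this sketch does not model the 32-bit range limits of a real ASFixed.

```python
FIXED_ONE = 0x00010000  # the value 1.0 in 16.16 fixed point (65536)

def float_to_fixed(value):
    """Convert a float to 16.16 fixed point, rounding to the nearest step."""
    return int(round(value * FIXED_ONE))

def fixed_to_float(fixed):
    """Convert a 16.16 fixed point value back to a float."""
    return fixed / FIXED_ONE

def fixed_mul(a, b):
    """Multiply two 16.16 values; the raw product has 32 fractional bits,
    so shift right by 16 to return to 16.16."""
    return (a * b) >> 16

def fixed_div(a, b):
    """Divide two 16.16 values; pre-shift the dividend to keep 16 fractional bits."""
    return (a << 16) // b

hundred = float_to_fixed(100)  # 0x00640000
product = fixed_mul(float_to_fixed(2.5), float_to_fixed(4))
quarter = fixed_div(float_to_fixed(1), float_to_fixed(4))
```

Addition and subtraction need no special handling: two fixed point values with the same layout can be added or subtracted as ordinary integers.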
PDF/UA is now ISO/AWI 14289
by Mark Gavin
The PDF Universal Accessibility working group has received an ISO "Approved Work Item" number from the International Organization for Standardization (ISO). PDF/UA is now ISO/AWI 14289.
As PDF/UA continues to move through the standards process, the eventual standard will be labeled ISO 14289.
PDF Language Module
by Mark Gavin
We have developed a BBEdit Language Module for PDF. This language module is written to aid developers and support personnel in reading raw PDF files by highlighting specific elements of the file with syntax coloring.

The meanings of the colors are as follows:
blue - PDF keywords
red - arrays
green - dictionaries
purple - strings
light gray - stream data
PDFHilight is a Universal Binary plug-in built for BBEdit 9.
Installation: To install the plug-in, make sure BBEdit isn't running, and place the plug-in in the following folder:
(your home directory)/Library/Application Support/BBEdit/Language Modules
…
PDF Object Streams
by Mark Gavin
Recently I have received several PDF documents which contain compressed object streams. Object Streams are described in section 7.5.7 of ISO 32000-1 and are a mechanism for storing a collection of indirect Cos Objects together inside a Cos Stream.
This Cos Stream, of Type "ObjStm", may or may not be compressed, though it would be pointless not to compress the stream.
Object Streams became available starting with PDF 1.5, and therein lies the reason for this blog posting. Every one of the files with Object Streams to cross my desk recently has a version number of "PDF 1.4".
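A check for this kind of mismatch can be sketched in Python as follows. This is a rough byte scan under stated assumptions, not a real PDF parser: the function names are hypothetical, it reads the version only from the "%PDF-" header line (ignoring a /Version entry in the document catalog, which can override the header), and it assumes the object stream dictionary is stored uncompressed and written as "/Type /ObjStm".

```python
import re

def header_version(data):
    """Return the (major, minor) version from the %PDF- header line."""
    match = re.match(rb"%PDF-(\d)\.(\d)", data)
    if match is None:
        raise ValueError("no %PDF header found")
    return int(match.group(1)), int(match.group(2))

def has_object_stream(data):
    """Report whether any dictionary in the raw bytes declares /Type /ObjStm."""
    return re.search(rb"/Type\s*/ObjStm\b", data) is not None

def version_mismatch(data):
    """True when object streams appear in a file whose header claims a version below 1.5."""
    return has_object_stream(data) and header_version(data) < (1, 5)

# Hypothetical fragment of a file that claims PDF 1.4 but carries an object stream.
sample = (
    b"%PDF-1.4\n"
    b"5 0 obj\n<< /Type /ObjStm /N 2 /First 10 /Length 0 >>\n"
    b"stream\nendstream\nendobj\n"
)
mismatch = version_mismatch(sample)
```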
…
PDF Standard Available
by Mark Gavin
The unofficial ISO 32000-1 PDF Standard document is now available as a free download from Adobe. The body text of the document is the same as the official ISO 32000-1 standard; but, page headers and footers have been changed to replace the ISO copyright with the Adobe copyright.
ISO 32000-1
by Mark Gavin
Portable Document Format (PDF) is now officially an international standard. The International Organization for Standardization (ISO) has published ISO 32000-1:2008, Document management – Portable document format – Part 1: PDF 1.7. This is an ISO standard based on PDF 1.7. Here is a link to the ISO press release.
Work is currently underway for the development of ISO 32000-2. To participate in the development of the next version of PDF; get involved. In the United States; the PDF Reference Committee…
Tools for Creating Acrobat Forms
by Mark Gavin
Otherwise known as AcroForms, Acrobat form technology was first introduced in PDF version 1.2 and has been around for more than ten years. In addition to Adobe Acrobat, there are third parties which have released products to create Acrobat forms.
Following is a list of tools to create AcroForms:
The Acrobat Professional package includes tools to create documents using two different forms technologies: AcroForms, using the form tools under Acrobat, and XFA, using Adobe Form Designer. Note: XFA is an XML-based forms technology which is incompatible with AcroForms.
…
Forms Data Format
by Mark Gavin
A Forms Data Format (FDF) file is a text file that contains a list of form field names and their values. Acrobat Forms, or AcroForms, were introduced in PDF Version 1.2. To allow for the import and export of data from AcroForms, Adobe developed the Forms Data Format. The documentation for the Forms Data Format is located in the PDF Reference in the chapter on "Interactive Features" under the section "Interactive Forms".
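To make the "list of form field names and their values" concrete, here is a minimal sketch in Python that writes a small FDF file. The helper names and the field names ("name", "city") are hypothetical. The sketch assumes flat field names and plain ASCII values, and it follows the general FDF shape of a header line, one object holding an /FDF dictionary with a /Fields array, and a trailer pointing at that object.

```python
def pdf_string(text):
    """Wrap text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"

def build_fdf(fields):
    """Build a minimal FDF document from a {field_name: value} mapping."""
    # Each field becomes a small dictionary: /T is the field name, /V its value.
    field_dicts = " ".join(
        "<< /T {} /V {} >>".format(pdf_string(name), pdf_string(value))
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [ " + field_dicts + " ] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )

# Hypothetical form data for illustration.
example = build_fdf({"name": "Ann (test)", "city": "Boston"})
```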
There are two kinds of FDF files:
• Classic - supplies data to fill out an existing static form.
…
Presenting Data and Information
Books by Edward Tufte
The Visual Display of Quantitative Information, 2nd edition
Edward R. Tufte
by Mark Gavin
Last week, while in Boston for the AIIM Conference, I used the Monday before the conference to attend a one day course taught by Edward Tufte on "Presenting Data and Information". The course focuses on effectively presenting and communicating information.
The course is given in various locations around the country throughout the year. I've known about the course for the past several years; but, until last week the scheduling didn't work out to make it convenient for me to attend.
I found the course to be well researched, thought provoking and entertaining. …
Acrobat 8 Crash FreeText Annotation
by Mark Gavin
The following simple PDF document contains a single FreeText annotation. The FreeText annotation is displayed correctly under both Acrobat 7 and 8. However, using the mouse to click on the annotation under Acrobat 8 causes Acrobat 8 to crash. FreeTextCrash.pdf
Below is a screen shot of the FreeText annotation labeled "COV".
The crash occurs in Acrobat 8 on both Windows and Macintosh.
Clicking on the annotation under Acrobat 7 selects the annotation as expected.
HP Smart Document Scan
by Mark Gavin
Recently we have received a couple of malformed PDF files produced by the HP Smart Document Scan software. It appears that the HP Smart Document Scan software is only included with the HP Scanjet 7800 Scanner and the HP Scanjet 8350 & 8380 scanners.
The version number of the PDF files produced is PDF 1.0. The first problem we found is located in /Name objects which contain '#xx' hex values. The use of # hex values was not part of PDF 1.0; # hex values in Name objects were introduced in PDF 1.2.
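The '#xx' convention can be sketched in Python as follows. The function names are hypothetical, and the sketch works on the name bytes without the leading slash. It follows the PDF 1.2 rule that '#' followed by two hex digits stands for a single byte. A reader that strictly follows PDF 1.0 would instead treat the '#' and the two digits as three literal characters, which is why the version number in the file header matters here.

```python
import re

def decode_name(raw):
    """Decode '#xx' escapes in a PDF 1.2+ name (given without the leading '/')."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        raw,
    )

def encode_name(value):
    """Escape whitespace, delimiters, '#', and non-printable bytes as '#xx'."""
    out = bytearray()
    for byte in value:
        if byte < 0x21 or byte > 0x7E or byte in b"#()<>[]{}/%":
            out += b"#%02X" % byte
        else:
            out.append(byte)
    return bytes(out)

decoded = decode_name(b"Lime#20Green")  # the space is written as #20
encoded = encode_name(b"Lime Green")
```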
…
PDF Linearization
by Mark Gavin
Linearization is a variant on the PDF file layout as described previously. Linearization is also called "Fast Web View". Linearization shuffles the contents of the PDF file to place all of the information needed to display the first page near the beginning of the file.

This allows the user to see the first page while the remainder of the file is still downloading from the web.
Incremental saves on a linearized file can actually break linearization; but, Acrobat still reports the file as enabled for "Fast Web View". …
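One way to spot a linearized file that has since been incrementally saved can be sketched in Python. This is a rough heuristic under stated assumptions, not a full validation: the function name and the returned labels are hypothetical, and it assumes the linearization parameter dictionary sits within the first 1024 bytes of the file and that its /L entry records the length of the whole file at the time it was linearized. Data appended by an incremental save makes the real length larger than /L.

```python
import re

def linearization_status(data):
    """Classify raw PDF bytes by comparing the /L entry with the actual file length."""
    head = data[:1024]
    if re.search(rb"/Linearized\b", head) is None:
        return "not linearized"
    match = re.search(rb"/L\s*(\d+)", head)
    if match is None:
        return "linearized, no /L entry found"
    declared_length = int(match.group(1))
    if declared_length == len(data):
        return "linearized"
    return "linearization broken"

def make_sample(length_field):
    """Build a hypothetical file fragment with a fixed-width /L value."""
    return (
        b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L "
        + b"%010d" % length_field
        + b" >>\nendobj\n"
    )

intact = make_sample(len(make_sample(0)))  # /L equals the real length
appended = intact + b"% bytes added by an incremental save\n"
```

With these samples, `linearization_status(intact)` reports the file as linearized, while the same call on `appended` reports the linearization as broken.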
Jim King's Presentations
by Mark Gavin
Jim King, Principal Scientist at Adobe Systems, has a personal web site which contains a collection of his public presentations. These presentations include PDF Tutorials, Color Management, Color Science, XML/PDF Tutorial and High Resolution Rendering. Several of the presentations are annotated with speaker's notes. I would encourage everyone to check it out. The URL to the presentations is as follows: http://home.comcast.net/~jk05/presentations/
PDF Basic File Layout
by Mark Gavin
For the most part, the basic layout of a PDF file is fairly simple. A PDF file consists of four primary sections as illustrated below:

The PDF file "Header" is just one or two lines starting with %PDF. The "Body" is a collection of objects which include the page contents, fonts, annotations, etc. The "xref Table", or cross reference table, is a collection of pointers to locate the individual objects contained in the "Body". The "Trailer" contains the pointer to the start of the cross reference table.
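The four sections can be seen in a tiny hand-built file. The following Python sketch assembles one and then finds the sections again the way a reader would: the header at the start, and the "startxref" pointer at the end leading back to the xref table. The sample is hypothetical and deliberately incomplete (its catalog has no pages, so it is not a viewable document), and the function names are made up for this illustration. It assumes a single classic xref table.

```python
import re

def build_sample():
    """Assemble a tiny file with a Header, Body, xref Table and Trailer."""
    header = b"%PDF-1.4\n"
    body = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    xref_offset = len(header) + len(body)  # byte offset where 'xref' begins
    xref = b"xref\n0 2\n0000000000 65535 f \n"
    xref += b"%010d 00000 n \n" % len(header)  # object 1 starts right after the header
    trailer = b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    trailer += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return header + body + xref + trailer

def locate_sections(data):
    """Return the header version plus the offsets of the xref table and trailer."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    pointer = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    if header is None or pointer is None:
        raise ValueError("not a simple single-xref PDF file")
    return {
        "version": header.group(1).decode("ascii"),
        "xref_start": int(pointer.group(1)),
        "trailer_start": data.rfind(b"trailer"),
    }

sample = build_sample()
info = locate_sections(sample)
```

Reading from the end of the file first is the point of the design: a reader only needs the last few bytes to find the xref table, and the xref table to find any object.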
…
Adobe Bates Numbering?
by Mark Gavin
We received an email from one of our customers, an attorney who uses Bates numbering on a regular basis. Following is one of the sentences from this customer's email:
"I wouldn't have thought it possible, but Adobe has managed to implement its Bates-stamping in a manner which makes it virtually useless [or at least highly impractical for use by] attorneys, the primary users of Bates-stamp utilities."
When I saw this I decided to take a look at Acrobat Bates Numbering.
I really don't use most of the features available in Acrobat, this being no exception. …
Acrobat 8 Text Shifting
by Mark Gavin
Following is a collection of screen shots taken using a single PDF file displayed under Acrobat 4 through Acrobat 8.
Acrobat 4
Acrobat 5
Acrobat 6
Acrobat 7
Acrobat 8
Following is a PDF file which demonstrates the text shifting problem:
This particular drawing error is caused by passing a large negative character spacing in a text array when the text is of zero length.
132.96 741.6 TD -0.06048 Tc [()-4800()] TJ -0.32976 Tc (A) Tj
PDF - The Missing References
Postscript References
PostScript(R) Language Reference (3rd Edition)
Adobe Systems Inc.
PostScript(R) Language Program Design
Adobe Systems Inc.
PostScript(R) Language Tutorial and Cookbook
Adobe Systems Inc.
by Mark Gavin
The Adobe PDF Reference is similar to the Adobe Postscript Language Reference; in that they can both be compared to a dictionary. A dictionary is a document which contains all of the words that can be used in a language; but, it doesn't teach you how to combine those words into a good, well structured book.
PDF is based on Postscript. The documentation for Postscript was released as a set of three volumes.
Postscript Language Reference - Red Book
PostScript Language Program Design - Green Book
PDF Version Numbers
by Mark Gavin
I find that there is a general misunderstanding about the nature of Portable Document Format (PDF) version numbers.
Version 1.0 of the PDF file format was released by Adobe in 1993. Over the past fourteen years PDF has been updated seven times. The current version of PDF is 1.7. These changes to the PDF version number represent additions to the file format.
All of the "older" stuff in PDF works exactly the same way it did. None of the basic PDF text drawing primitives have changed. PDF 1.0 is still perfectly valid and usable; it is the basis for all PDF files in existence today. It simply cannot be used to represent the more advanced features or graphics found in later versions of PDF.
…
Acrobat XML Tags for Bates Numbering
by Mark Gavin
Adobe has released a technical note talking about additional XML data Acrobat 8 adds to each page of a PDF file when the file is Bates numbered using Acrobat 8.
Bates Numbering in PDF documents (PDF, 123K)
Here is what the XML looks like:
<Bates start="1" ndigits="6" prefix="ADBE" suffix="DRAFT"/>
The above XML is added to each page of the PDF file and will produce a Bates number on each page: for example; ADBE000001DRAFT.
So, instead of simply numbering each and every page correctly, applications that attempt to use this information will need to calculate the Bates number based on the above XML attributes. …
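The calculation can be sketched in Python as follows. The function name is hypothetical. With the default `page_index` of 0 it simply formats the attributes as given: the prefix, the number zero-padded to `ndigits`, then the suffix. The `page_index` parameter is an assumption for the case where `start` describes the first page and later pages count up from it; the excerpt above does not say how the attributes vary from page to page.

```python
import xml.etree.ElementTree as ET

def bates_number(xml_text, page_index=0):
    """Build a Bates number string from a <Bates .../> element.

    page_index is an assumed zero-based offset added to the 'start' attribute.
    """
    element = ET.fromstring(xml_text)
    number = int(element.get("start")) + page_index
    ndigits = int(element.get("ndigits"))
    prefix = element.get("prefix", "")
    suffix = element.get("suffix", "")
    return prefix + str(number).zfill(ndigits) + suffix

tag = '<Bates start="1" ndigits="6" prefix="ADBE" suffix="DRAFT"/>'
first_page = bates_number(tag)  # matches the example in the text: ADBE000001DRAFT
```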
The Limits of Resolution
by Mark Gavin
I’ve been developing an application to generate Fresnel Zone Plates and ran into an interesting problem. A Zone Plate is similar to a lens in its ability to focus light. It differs from a lens by using diffraction instead of refraction.
The problem I encountered is that Acrobat creates many significant drawing artifacts when it renders this PDF drawing to the screen. In the above screen capture; only the rings centered on the center of the graphic are real. All other rings, centered off of the center, are drawing artifacts.
…





