PDFLib

Volume Number: 15 (1999)
Issue Number: 12
Column Tag: Programming Techniques

by Kas Thomas

A great freeware library makes adding PDF support to an app easy

Adobe's Portable Document Format (PDF) has become a de facto standard for electronic document interchange, based on its ability to deliver graphically rich, structured content in a consistent manner across multiple operating environments. Almost every large web site offers at least some PDF-based content, making the Acrobat Reader one of the most popular downloads on the web. (Incredibly, Adobe claims to average some 100,000 downloads of the Reader from its web site per day.)

Because of its support for vector graphics, font embedding, hypertext links, and other advanced features, PDF is a powerful, far-reaching document standard. But that also means it's a relatively complex standard (for details, see the September 1999 MacTech) - and therefore far from trivial to support in an application.

From a programming standpoint, one can talk about two types of PDF support: support for PDF reading (import), and support for PDF writing (export). As with TIFF, QuickTime, and many other complex formats, it's much easier to provide write support than read support, because a comprehensive PDF-read capability means implementing the entire rather ponderous PDF specification (see http://partners.adobe.com/asn/developer/PDFS/TN/PDFSPEC.PDF), whereas a write-only facility may mean implementing only a tiny subset of the PDF spec - the subset of particular interest to your application. For example, if your application primarily outputs ASCII text, there is no need to implement graphics-embedding, halftoning, transfer functions, etc., in order to support PDF output.

Adding a well-defined PDF-output capability to an application can be surprisingly quick and easy, if you make full use of existing tools. For this article, I decided to add PDF export capability to BBEdit (the popular text editor), with the aid of a third-party freeware PDF library called PDFLib. Source code for the BBEdit plug-in accompanies this article. (The complete CW Pro 5 project, including PDFLib and its source files, can be found online at ftp://www.mactech.com.) But before we start talking code, let's take a moment to review the basics of the PDF format, then look at what kinds of development paths one might take to arrive at a PDF-export capability, and what sorts of tools are currently available to make the programmer's life easier.

PDF Fundamentals

Adobe's Portable Document Format is a kind of gigantic, special-purpose markup language, based largely on PostScript (the postfix-notation page description language) but lacking PostScript's control-flow constructs. PDF is a sort of "unrolled" version of PostScript, in which all graphics operations are inline (rather than relying on loops) and therefore speedy. Lookups and indexing operations are likewise fast because of PDF's extensive use of associative arrays (or "dictionaries," in Adobe parlance), organized into treelike structures in which all nodes have forward and/or back-pointers to other nodes; plus, every leaf (of every kind) has an entry in a giant 'xref' table, so that the offset of any object can be looked up instantly.

Pages are organized into sets of objects that describe a page's resources and content. The objects are human-readable ASCII and look like:

4 0 obj
<</Type /Page
/Parent 1 0 R
/Resources 8 0 R
/MediaBox [0 0 612 792]
/Contents [5 0 R ]
>>
endobj

In this case, the top line tells us we're dealing with Object No. 4, revision zero. The object is a dictionary object, as indicated by the double angle brackets, << and >>, enclosing the object. The first entry in the dictionary is a label telling the type of dictionary (in this case, a Page). The next label/value pair is a backpointer to the parent of this object, namely Object No. 1. (A reference ending in 'R', such as 1 0 R, is a pointer to an object.) The next entry tells where the page's resources can be found (namely, in Object No. 8.) The MediaBox entry gives the page's dimensions, in points (72 points to the inch); here, 612 by 792 means that we're dealing with a standard U.S. Letter-size page (8.5 by 11 inches). The final entry, in the above example, shows where the page's Contents (probably a stream object) can be found, namely in Object No. 5.
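To make the reference syntax concrete, here is a minimal C sketch that pulls the object and generation numbers out of a reference such as 1 0 R. The function name parse_pdf_ref is hypothetical (it is not part of any library discussed here), and a real parser would also have to cope with comments, arbitrary whitespace, and nested objects.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse an indirect reference of the form "N G R".
   Returns 1 on success and stores the object and generation numbers;
   returns 0 if the text is not a reference. */
int parse_pdf_ref(const char *s, long *objNum, int *genNum)
{
	char marker = 0;

	assert(s != NULL && objNum != NULL && genNum != NULL);
	if (sscanf(s, "%ld %d %c", objNum, genNum, &marker) != 3)
		return 0;
	return marker == 'R';
}
```

Fed the /Parent value from the Page dictionary above, this would report object 1, generation 0; fed the line "4 0 obj" it would report failure, since that line defines an object rather than pointing to one.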

When text needs to be displayed on a page, it is packaged inside a stream object. The stream object will contain the actual ASCII or Unicode strings that need to be displayed, along with various PostScript-like operators, such as Td for "move text position" and TL for "set leading," that control the stroking, filling, and positioning of the individual letters or glyphs.

When all of the objects in a PDF file have been written, a cross-reference ('xref') table must be inserted into the file. The entries in the 'xref' table must conform to a fixed format (see my article in MacTech for September 1999) and they must contain the exact byte offset from the start of the file to the object referenced by the entry in question. The integrity of the PDF document rests on the accuracy of the byte offsets stored in the 'xref' table. Since most of these offsets aren't known until the objects are written, the 'xref' table usually goes at the end of the file. (This isn't always the case, however. So-called "linearized" or "optimized" PDF files have an 'xref' table at the beginning of the file.)
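The bookkeeping described above can be sketched in a few lines of C. This is only an illustration of the offset-tracking idea, not how PDFLib does it internally: the names add_object and build_tiny_pdf are hypothetical, the document is a single blank page, and the caller is assumed to supply a buffer of at least 512 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append object number 'num' to the buffer and
   record the byte offset at which it starts. Returns the new length. */
static size_t add_object(char *pdf, size_t len, long *offsets,
                         int num, const char *body)
{
	offsets[num] = (long)len;
	return len + (size_t)sprintf(pdf + len, "%d 0 obj\n%s\nendobj\n", num, body);
}

/* Build a one-page, blank PDF in 'pdf' and return its length.
   offsets[1..3] receive the byte offsets of objects 1 through 3. */
size_t build_tiny_pdf(char *pdf, long *offsets)
{
	size_t len, xrefPos;
	int i;

	assert(pdf != NULL && offsets != NULL);
	strcpy(pdf, "%PDF-1.3\n");
	len = strlen(pdf);

	len = add_object(pdf, len, offsets, 1,
		"<< /Type /Catalog /Pages 2 0 R >>");
	len = add_object(pdf, len, offsets, 2,
		"<< /Type /Pages /Kids [3 0 R] /Count 1 >>");
	len = add_object(pdf, len, offsets, 3,
		"<< /Type /Page /Parent 2 0 R /Resources << >> "
		"/MediaBox [0 0 612 792] >>");

	/* The xref table goes last, because only now are the offsets known.
	   Every entry is exactly 20 bytes long. */
	xrefPos = len;
	len += (size_t)sprintf(pdf + len, "xref\n0 4\n");
	len += (size_t)sprintf(pdf + len, "0000000000 65535 f \n");
	for (i = 1; i <= 3; i++)
		len += (size_t)sprintf(pdf + len, "%010ld 00000 n \n", offsets[i]);

	len += (size_t)sprintf(pdf + len,
		"trailer\n<< /Size 4 /Root 1 0 R >>\nstartxref\n%lu\n%%%%EOF\n",
		(unsigned long)xrefPos);
	return len;
}
```

Notice that nothing here is hard, just fussy: change one byte in any object and every later offset in the table has to change with it.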

Several things should be obvious by now. First, there is nothing freeform about a PDF file. Unlike HTML, a PDF file is highly structured, with many pointers between objects. Byte offsets matter a great deal and must be accounted for when the file is written. Secondly, PDF files are largely self-contained, bringing with them their own font resources and embedded graphics (rather than linking external resources). Thirdly, to write a PDF file means lots of string manipulations - something that, frankly, ANSI C is a little weak at (compared to, say, Perl). Beyond that, the PDF specification itself (currently contained in a 518-page, 160,000-word document) can be difficult to read and interpret. Supporting PDF export in an application written in C can be a bit tedious, to say the least.

Third-Party Libraries

It helps, in a situation like this, to be able to call on help from third parties, rather than reinvent the wheel yourself. Fortunately, some excellent tools are available to make your life easier. Among the general-purpose libraries available for adding PDF-handling capabilities to applications are:

Adobe's PDF Library, also known as PDFL40; for use with CodeWarrior on the Mac, Visual C++ 5.0 on Windows platforms, and gcc 2.8 on Sparc Solaris.
The CLibPDF Library, by FastIO Systems (http://www.fastio.com); an ANSI C library, compilable on just about any platform.
PDFLib, by Thomas Merz (http://www.pdflib.com); a C library, with bindings for C++, Java, Perl, Python, Tcl, and Visual BASIC.

If you're a Perl user, you'll want to check out PDF-on-the-Fly, a Perl library available from the University of Nottingham (http://www.ep.cs.nott.ac.uk/pdf-pl/download/manual.pdf), as well as txt2pdf, a library from Sanface Software (sanface@sanface.com).

Adobe's PDFL40 is without question the most powerful and robust library available, relying as it does on the Acrobat 4.0 codebase. With PDFL40, you can read, display, and write PDF from your own application. But unfortunately, PDFL40 isn't free - and even if you can afford the licensing fees, you may not be allowed to use the library. As stated in Adobe's literature, PDFL40 is selectively licensed to developers who are creating "products that are strategic to Adobe's marketing plans." In other words, Adobe will review your development plans carefully, and if they like what you're doing and if you agree to play by Adobe's rules, you may be allowed to pay to use the library.

Outside of Adobe, the two best-known C libraries for PDF support are CLibPDF, by FastIO Systems, and Thomas Merz's PDFLib. Both come with full source code and can be used without restriction (or virtually without restriction) by individual developers who are creating freeware or personal-use software. (Corporate users and commercial developers must take out a license, at significant cost.) The main restriction of these libraries is that they support PDF output only. They will not help you read PDF or display a PDF document on the screen. The same is true for the two Perl libraries: PDF-on-the-Fly and txt2pdf are basically write-only. If you need to put PDF up on the screen, you'll probably want to look into an open-source program called Ghostscript (http://www.cs.wisc.edu/~ghost/index.html), which started as a freeware PostScript interpreter, written in 1988 by L. Peter Deutsch, founder of Aladdin Enterprises. Starting with version 3.3, Ghostscript has been able to read and display PDF files in addition to PostScript documents. With version 4.0, Ghostscript added PostScript-to-PDF conversion (i.e., Distiller functionality). Because the code is generic C, Ghostscript has been successfully ported to most platforms, including Win32, OS/2, MacOS, Unix, Amiga, VAX, etc. (An excellent PDF-based manual for Ghostscript is available from Thomas Merz; see http://www.muc.de/~tm.)

CLibPDF and PDFLib are similar in their capabilities. Their differences are summed up in Table 1. Both are extremely easy to set up and use. Of the two, CLibPDF is the more advanced package in terms of the number of features and overall performance. CLibPDF has roughly 170 library routines to PDFLib's 88. Many of CLibPDF's routines provide advanced graphics capabilities involving setting up Cartesian coordinate axes (linear or logarithmic) and plotting data (including data stored in external files). CLibPDF was designed to make it easy for people who need to generate 2D plots to create attractive graphs on-the-fly in PDF, without passing the data through an intermediary application such as Matlab. In this, it excels.

Feature	PDFLib	CLibPDF
Full source code available?	Yes	Yes
Documentation	64 pp.	75 pp.
API calls, total	88	170
Image formats supported	GIF, TIFF, JPEG, CCITT	JPEG
Font metrics formats	AFM/PFA	PFM/PFB
Thread safe?	Yes	Yes
Bindings for scripting languages?	Yes	No
Font embedding?	Yes	Yes
Font subsetting?	No	No
Compression?	No	Flate only
Text-justification option?	No	Yes
Vector graphics functions?	Yes	Yes
Custom graph plotting?	No	Yes
Annotations?	Yes	Yes
Bookmarks?	Yes	Yes
Hypertext links?	Yes	Yes
Form widgets?	No	No
Reenter pages after writing?	No	Yes

Table 1. Comparison of PDFLib and CLibPDF.

CLibPDF is also the clear winner in terms of benchmark scores. In a test (conducted by a corporate user) involving the construction of an intricate 156-page document filled with engineering information, CLibPDF produced a 257,027-byte PDF file in just 15 seconds. By comparison, Adobe's Distiller took over three minutes to produce a 197,548-byte file; Adobe's PDF Library took 54 seconds to create a 284,365-byte file; and PDFLib took 84 seconds to yield a 1,314,084-byte finished document. The filesize disparity is due to the fact that PDFLib uses no text compression, whereas the others do. (Adobe uses a combination of LZW and Flate compression. To avoid patent infringement issues, CLibPDF uses only Flate compression.)

Wherefore PDFLib?

Why would anybody use PDFLib? For one thing, it's the only library that comes with ready-made bindings for Perl, Python, Tcl, Java, and (on Win32) Visual BASIC. This is incredibly important if you're a web developer who needs to be able to serve dynamic PDF - PDF pages generated automatically, on the fly - for web clients. Dynamic PDF pages (via Perl, say) are easily possible using PDFLib. All you have to do is link the PDFLib shared library with the PerlStub file (which is part of the MacPerl distribution suite) and follow the calling conventions given in Thomas Merz's excellent documentation (which has example code listings for all the different bindings).

But what about the big file sizes? you ask. It's true that, as of yet, PDFLib does not have any compression support - for text. For imagery, PDFLib supports JPEG, GIF, TIFF and CCITT bitmaps, all of which are compressed. (Acrobat Reader handles the decompression automatically.) CLibPDF, on the other hand, only handles JPEG embedding, unless you pay the license fee ($1,000), in which case you can get TIFF support (among other features).

PDFLib's lack of text compression can result in big files if you're mainly outputting big gobs of text. But if you will be serving dynamic PDF web pages (or creating other fairly small text files), you won't suffer for not having compression, since small text streams often compress poorly - or even grow, rather than shrink - at pack-down time.

It turns out PDFLib is ideal for generating small to medium-sized text-based PDF documents, because - unlike Adobe's own products - PDFLib won't automatically embed fonts or font subsets for any of the standard 14 core Type 1 fonts that are included with Acrobat Reader (the Helvetica, Times, and Courier families, plus Zapf Dingbats and Symbol). This can be important, because although a small PDF file may or may not shrink significantly with compression turned on, it will definitely grow when fonts are embedded unnecessarily.

Another reason to use PDFLib is that it's nominally smaller and easier to learn than CLibPDF (although the latter is by no means hard to work with). And should you later need to port your code to a scripting language, you can reuse your code with very little work.

Adding PDF Export to BBEdit

BBEdit (by Bare Bones Software) is one of the most popular ASCII editors on the Mac. Features like regex-based (regular expression) search-and-replace, robust HTML tools, and neck-snapping performance have endeared BBEdit to thousands of loyal users. But when it comes to producing eye-pleasing output, BBEdit isn't exactly a killer app. Wouldn't it be nice to be able to save BBEdit documents as PDF files now and then? PDF is easier to look at (and print out) than raw ASCII, any day.

It's not hard to add PDF export to BBEdit, because like so many software products these days, BBEdit supports a plug-in API that allows third-party programmers access to the main program's data. The BBEdit plug-in API is well documented and has hooks to many utility functions for retrieving the text from documents, manipulating user selections, etc. Space doesn't permit a full tutorial on writing BBEdit plug-ins here. However, we will have space to run through the 200 or so lines of C required for a short plug-in that lets the user save an open BBEdit document as a PDF file.

The Code

The BBEdit plug-in interface requires that we compile an old-fashioned Code Resource of type 'BBXT' and creator 'R*ch'. (The creator type can be anything you want, but if you stay with 'R*ch', your plug-in will have the icon associated with BBEdit extensions.) Note that the name of your 'BBXT' resource (not the filename of your plug-in) is the name that will appear in the BBEdit "Tools" menu at runtime.

The main() routine for our PDF-Output plug-in, shown in Listing 1, is typical of most BBEdit extensions. It shows that our resource is called with a pointer to a BBEdit structure called the ExternalCallbackBlock; a WindowPtr associated with the frontmost user window; a long int containing various flag values to convey information about the state BBEdit is in; and pointers to AppleEvents. All we do in main() is call EnterCodeResource(), check our flags (and the WindowPtr, for validity), then call bbxtGetWindowContents() - which retrieves a Handle to all the text in the frontmost (active) document - before handing the text off to our filtering routine. When we're done, we call ExitCodeResource() and that's all she wrote. Easy as pi.

Listing 1: main( )

main( )
pascal OSErr main(ExternalCallbackBlock *callbacks, 
			WindowPtr w, 
			long flags, 
			AppleEvent *event, AppleEvent *reply)
{
	OSErr	err = noErr;

	EnterCodeResource();

	// proceed only if we have a valid window and a document is open
	if (w && (flags & xfWindowOpen) != 0) {
		Handle text;

		text = bbxtGetWindowContents(callbacks,w);

		if (text)
			err = pdfTranslate(callbacks,text,w); // write pdf
	} 

	ExitCodeResource();	// always balance EnterCodeResource()

	return err;
}

We don't actually do anything with the AppleEvent pointers in this example. In a real plug-in, these pointers would be the mechanism by which your plug-in could be controlled through OS-level scripts. Most of the time, though, these pointers will be nil. In all versions of BBEdit Lite, for example, the pointers are always nil.

The real heavy lifting occurs in Listing 2, where our PDFLib routines get called. Before using any other PDFLib routines, we call PDF_new() to initialize the library. (This results in a number of large data structures being allocated and filled out for us, behind the scenes. The principal data structure is something called, appropriately, a PDF. A pointer to this data structure must be passed to every library routine so that PDFLib can keep track of the PDF document's state.) At the end of the routine, before exiting, we call PDF_close() to close the connection to the PDFLib library, freeing all resources that were allocated earlier.

Listing 2: pdfTranslate( )

pdfTranslate( )

OSErr pdfTranslate( ExternalCallbackBlock *callbacks, 
			Handle theText,
			WindowPtr w ) {

	PDF *p = nil;
	int font,j;
	long i,linecount,textLength;
	OSErr err = noErr;
	Boolean timeForNewPage = false; // sentinel
	char *input,
			filename[64],	// room for name + ".pdf" + null
			buf[TAB_VALUE * CHARS_WIDE];
	char *out = buf;
	char okLineEnders[] = "- ;:>";

	p = PDF_new();  
	if (p == nil) return -1;

	HLockHi(theText);

	FudgeName(callbacks,(unsigned char *)filename,w); // create outfile name

	// open the new PDF file 
	if (PDF_open_file(p,filename)==-1){
		fprintf(stderr,"Error: cannot open output PDF file.\n");
		HUnlock(theText);
		return -1;	// return to BBEdit; never exit() from a plug-in
	}

	// these lines are optional:
	PDF_set_info(p,"Creator","BBEdit PDF Exporter plug-in");
	PDF_set_info(p,"Author","Kas Thomas");
	PDF_set_info(p,"Title","Hello world!");

	PDF_begin_page(p,letter_width,letter_height); // start a page

	// find a base-14 font
	font = PDF_findfont(p,"Times-Roman","default",0);
	if (font == -1){
		fprintf(stderr,"Couldn't set font!\n");
		PDF_end_page(p);
		PDF_close(p);
		HUnlock(theText);
		return -1;
	}

	PDF_setfont(p,font,FONTSIZE); // set font & size
	PDF_set_leading(p, LEADING);  // set line spacing 

	PDF_set_text_pos(p,TEXT_STARTX,TEXT_STARTY);

	PDF_show(p," ");

	input = *theText;	// dereference the (locked) Handle

	textLength = GetHandleSize(theText); // how long is our text?

	// for every character...
	for (i = 0,linecount = 1; i < textLength; )
	{
		// fetch the current line...
		for (j = 0, out = buf; 
				 j < CHARS_WIDE - 1 && i < textLength;
				 j++) 
		{
			char c = input[i++];

			if (c == CARRIAGE_RETURN)   // break on CR (don't copy it)
				goto TerminateLine;

			if (c == TAB) { // we must handle Tabs ourselves
				int k;

				for (k = 0; k < TAB_VALUE; k++)
					*out++ = SPACE;
			}
			else
				*out++ = c;
		}

		// get to next word ending, without running past the
		// end of the text or the end of the line buffer
		while (i < textLength &&
				out < buf + sizeof(buf) - 1 &&
				input[i] != CARRIAGE_RETURN &&
				strchr(okLineEnders,input[i])==NULL)
			*out++ = input[i++];

		if (i < textLength && input[i] == CARRIAGE_RETURN)
			i++;	// word ended the paragraph: swallow the CR

		TerminateLine:

		*out = 0x00; // make it a C string

		PDF_continue_text(p,buf); // write to PDF file

		if (linecount++ % LINES_PER_PAGE == 0) // end of page?
			timeForNewPage = true;

		if (timeForNewPage && i < textLength) { // more to do
			timeForNewPage = false;
			PDF_end_page(p);	// close the full page
			PDF_begin_page(p,letter_width,letter_height); // new page
			PDF_setfont(p,font,FONTSIZE);
			PDF_set_leading(p, LEADING);
			PDF_set_text_pos(p,TEXT_STARTX,TEXT_STARTY);
			PDF_show(p," ");
		}

	}   // for i

	PDF_end_page(p);	// close the last page (exactly once)
	PDF_close(p);		// close PDF obj

	HUnlock(theText);

	return err;
}

The PDFLib routine PDF_open_file() will create a new file for us (in the current directory) if we pass it a pointer to a PDF struct along with a pointer to a filename string. Note that the filename string must be a C string. We create the necessary string (consisting of the original file's name, plus the extension ".pdf") in a custom utility routine, FudgeName(). See Listing 3.

After creating our (empty) output file, we make three calls to PDF_set_info(), to set the file's Creator, Author, and Title. These strings will show up when the user does a Get Info on the PDF document while viewing it in Acrobat Reader. It is not strictly necessary to call PDF_set_info(), since PDF files are not required to have "Get Info" info; but PDFLib makes creating these tags easy. (Again, though, note the use of C strings rather than Pascal strings.)

Listing 3: FudgeName()

FudgeName( )
// Get the current BBEdit file's name, add ".pdf" to it, put it in 'str' as a C string.

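void FudgeName(ExternalCallbackBlock *cb, 
					unsigned char *str, WindowPtr w) 
{
 	Str255 fName;
	short v;
	long d;
	long length;
	char ending[] = { '.','p','d','f', 0x00 };

	bbxtGetDocInfo(cb,w,fName,&v,&d); 
	length = *fName;	// Pascal string: first byte is the length

	// keep name + ".pdf" within the 31-character HFS filename
	// limit, so the result always fits a 32-byte buffer
	if (length > 27)
		length = 27;

   // now we create a C string:
	BlockMove(fName+1,str, length);
	BlockMove(ending,str+length,5);	// ".pdf" plus terminating null
}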
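For readers who don't live in the Toolbox, the same Pascal-to-C conversion can be sketched in portable ANSI C. The function name pascal_to_c_with_suffix is hypothetical, and memcpy() stands in for BlockMove(); the one design decision worth noting is that the destination size is passed in, so an overlong name gets truncated instead of overrunning the buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy a Pascal string (length byte first) into 'dst'
   as a C string and append 'suffix'. The name is truncated if necessary so
   that the result, including the terminating null, fits in 'dstSize'. */
void pascal_to_c_with_suffix(const unsigned char *pstr, const char *suffix,
                             char *dst, size_t dstSize)
{
	size_t nameLen = pstr[0];
	size_t sufLen = strlen(suffix);

	assert(dstSize > sufLen);	/* need room for suffix + null */
	if (nameLen > dstSize - sufLen - 1)
		nameLen = dstSize - sufLen - 1;

	memcpy(dst, pstr + 1, nameLen);
	memcpy(dst + nameLen, suffix, sufLen + 1);	/* copies the null too */
}
```

Called with the Pascal string "\005Notes" and the suffix ".pdf", this yields the C string "Notes.pdf".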

To begin a PDF page, we call - what else? - PDF_begin_page() with, in this case, the predefined values letter_width and letter_height, which correspond to the dimensions of a standard U.S. letter-sized page. (PDFLib also has predefined constants for A4, legal, and many other page sizes. Or you can use your own custom dimensions.)

Next, we come to one of the most important calls in this or any routine that uses the PDFLib library. Namely, we do:

 font = PDF_findfont(p,"Times-Roman","default",0);

The purpose of this call is to locate font resources for our document and specify an encoding for the font. (Here, we're guaranteed to get a valid return value, since Times-Roman is one of the base-14 fonts that Acrobat Reader can always use.) Allowable encoding values are built-in, pdfdoc, macroman, macexpert, winansi or default (see Section 3.4.2 of Thomas Merz's excellent PDFLib manual). In our case, we're content to let PDFLib determine the most suitable encoding based on the environment, so we indicate this by setting the third argument to "default." (The encoding must be specified as a C string.)

The return value from PDF_findfont(), if not equal to -1 (an error), will be needed in subsequent calls involving typesetting parameters, such as PDF_setfont() and PDF_set_leading(). It's important to understand that the value returned by PDF_findfont() is not an enumerated value or an index into a fixed lookup table. Rather, it's an index into the font cache of one particular PDF document. If you're working with two documents, one may store Times-Roman in its cache at a different index than the other; hence, PDF_findfont() may return two different values for the same font, based on the font's use in two different files. Don't just assume that if PDF_findfont() returns '1' for Helvetica, that therefore Helvetica will always be referenced by a font value of '1'. It may only be '1' for one file, in one particular context.

Having gotten a valid return value from PDF_findfont(), we use that value in a call to PDF_setfont(), which attaches the font resource to the PDF file and also lets us specify the point size of the font. The point size can be any floating-point value: 24.0 for a small headline, say, or 10.0 to 12.0 for regular body copy, etc. (Fractional values like 13.4 are fine, too.) We can similarly set the line spacing with PDF_set_leading(). Typically, the leading is close in value to the point size of the text. If you specify the leading as 1.2 times the point size, you won't go far wrong. (For double-spaced text, try about 2.4 times the point size, which is twice the single-spaced leading.)

The library function PDF_set_text_pos() lets us position our "pen" or insertion point at any x-y position on the page. Here, you have to remember that in the PDF coordinate space, (0,0) corresponds to the lower left corner of the page, with 'y' increasing in the up direction. Also, recall that in the PDF world, the default unit of space is the typesetter's point, which is 1/72-inch. Thus, if you want to begin writing at a distance of one inch from the left edge of the page and ten inches up from the bottom, you would specify coordinates of (72, 720).
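If you are used to QuickDraw's top-left origin, a small conversion helper keeps the arithmetic straight. The sketch below is plain C with no PDFLib calls; the PdfPoint type and the function name are hypothetical, and the page height is passed in points (792 for U.S. Letter).

```c
#include <assert.h>

#define POINTS_PER_INCH 72.0

typedef struct { double x, y; } PdfPoint;	/* hypothetical type */

/* Convert a position measured in inches from the top-left corner of the
   page into PDF user space (points, origin at the bottom-left corner). */
PdfPoint from_top_left_inches(double inchesFromLeft, double inchesFromTop,
                              double pageHeightPts)
{
	PdfPoint pt;

	assert(pageHeightPts > 0.0);
	pt.x = inchesFromLeft * POINTS_PER_INCH;
	pt.y = pageHeightPts - inchesFromTop * POINTS_PER_INCH;
	return pt;
}
```

One inch in from the left and one inch down from the top of a Letter page comes out as (72, 720), the same point as in the example above.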

To write text on a PDF page, you can either make repeated calls to PDF_set_text_pos() and PDF_show(), specifying new line-start coordinates every time, or else make one call to PDF_show(), followed by repeated calls to PDF_continue_text(). The latter function automatically repositions the insertion point to the start of a new line, using the left-margin and leading parameters that you've already specified. This can be more convenient than keeping track of line depths yourself. To keep our main loop from having to be a do-while loop, we make a dummy call to PDF_show() with a value of " " before entering the loop. Then, inside the loop, we just make repeated calls to PDF_continue_text().

The Main Loop

Our main loop, which is actually a double nested loop, deserves comment. The outer loop counts individual characters and makes sure that we loop over all the characters in the source document, stopping only when we've gotten to the end of the file. The inner loop fetches one line of text at a time, writing to a line buffer, 'buf', which is conservatively sized at a fixed size of TAB_VALUE * CHARS_WIDE. In a real application, you'd determine CHARS_WIDE dynamically, based (perhaps) on the point size of the text or some other metric. For this short demo, we've hard-coded the type size at 9.0 points and the line width, CHARS_WIDE, at 80 via #defines. The reason our line buffer has to be sized at TAB_VALUE * CHARS_WIDE is that it's conceivable that we could encounter a pathological line of "text" where every character is a Tab. If a Tab is equal to five spaces, our line buffer had better be 400 bytes in capacity rather than just 80, or else we'll overwrite the buffer.
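A more defensive alternative to worst-case buffer sizing is to make the tab-expansion step itself refuse to overrun the buffer. The following is a sketch of that idea only, separate from the plug-in's listing; the name expand_tabs and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy 'n' bytes from 'src' into 'dst', replacing each
   tab with 'tabWidth' spaces. Stops early rather than overrun 'dst'.
   Returns the number of bytes written, not counting the terminating null. */
size_t expand_tabs(const char *src, size_t n, int tabWidth,
                   char *dst, size_t dstSize)
{
	size_t in, out = 0;

	assert(dstSize > 0 && tabWidth > 0);
	for (in = 0; in < n; in++) {
		size_t need = (src[in] == '\t') ? (size_t)tabWidth : 1;
		int k;

		if (out + need >= dstSize)	/* keep one byte for the null */
			break;
		if (src[in] == '\t')
			for (k = 0; k < tabWidth; k++)
				dst[out++] = ' ';
		else
			dst[out++] = src[in];
	}
	dst[out] = '\0';
	return out;
}
```

With a five-space tab, the three bytes a, Tab, b expand to seven bytes; a run of Tabs simply stops filling when the next one would no longer fit.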

Inside the inner loop, as we gather characters into a "line" of text, we have to handle Tabs ourselves, converting ASCII 0x09 (the Tab character - which is a non-printing ASCII value) to spaces. We also check for end-of-line characters ourselves. In true Mac-centric manner, we ignore linefeeds and consider every newline to be equal to ASCII 0x0D (carriage return). Of course, text files created on a Unix machine won't conform to this assumption, since in the Unix world newlines tend to be ASCII 0x0A (linefeed). In the DOS and Windows worlds, lines end with both a carriage return and a linefeed, in that order: 0x0D 0x0A.
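If you wanted the plug-in to cope with Unix and DOS files too, one approach (not taken in the listing) is to normalize the text to Mac-style line endings before the main loop ever sees it. Here is a minimal sketch; the function name normalize_to_cr is hypothetical, and the conversion is done in place on a copy of the text that you own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrite 'text' in place so that CR LF pairs and
   bare LFs each become a single CR (0x0D). Returns the new length. */
size_t normalize_to_cr(char *text, size_t len)
{
	size_t in, out = 0;
	char prev = 0;	/* previous ORIGINAL character, before rewriting */

	assert(text != NULL);
	for (in = 0; in < len; in++) {
		char c = text[in];

		if (c == '\n') {
			if (prev != '\r')	/* bare LF (Unix): becomes CR */
				text[out++] = '\r';
			/* else: LF of a CR LF pair (DOS), so drop it */
		} else {
			text[out++] = c;
		}
		prev = c;
	}
	return out;
}
```

Because the output never gets ahead of the input, no second buffer is needed; the text can only shrink.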

Our inner loop is constructed in such a way that when the number of characters read equals CHARS_WIDE, we bail out and write the line to the PDF file, but in addition, we bail out any time a hard return (carriage return) is encountered. This lets us handle both traditional Mac text files (in which lines are soft-wrapped to the screen, with carriage returns coming only once per paragraph) as well as DOS-style documents in which every single line (not just the paragraph) ends with a hard return.

The fact that there are two ways to fall out of the inner loop has interesting consequences. Obviously, if we encounter a hard return, there's no question about what to do: we immediately write the line out to the file. But if we fall out of the inner loop because our line has begun to exceed CHARS_WIDE characters, it's possible (likely, in fact) that we've bailed out in the middle of a word! Hence, we have to insert some contingency code to read to the end of the current word. The code that does this looks like:

		// get to next word ending
		while (strchr(okLineEnders,input[i])==NULL)
			*out++ = input[i++];

The standard C function strchr() checks to see if the second argument (a character) occurs anywhere in the first argument (a string). It returns NULL on a miss and non-NULL on a match.

If we fall out of the loop because of a hard return, we don't need the above code. Therefore we can skip around it with (ugh) a goto. There are probably better ways (stylistically) to handle this situation, but in the interest of clarity, I decided to keep the goto, for now.
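One goto-free way to package the read-to-end-of-word step is as a small function that returns the index where the line should end. This is a sketch of the idea rather than a drop-in replacement for the listing; the name find_line_break and its parameters are hypothetical. Unlike the bare strchr() loop, it also stops at the end of the text.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: a line that began at 'start' has reached 'width'
   characters. Return the index of the first character at or after
   start + width that appears in 'enders', or 'len' if there is none. */
size_t find_line_break(const char *text, size_t len, size_t start,
                       size_t width, const char *enders)
{
	size_t i = start + width;

	assert(text != NULL && enders != NULL && start <= len);
	if (i > len)
		i = len;

	/* strchr() also matches the terminating null of 'enders', so a
	   null byte in the text ends the scan as well. */
	while (i < len && strchr(enders, text[i]) == NULL)
		i++;
	return i;
}
```

For the text "hello wonderful world" and a width of 8, the scan starts inside "wonderful" and returns 15, the index of the space that follows it.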

Once we're out of the loop, we have to remember to make our line a C string (i.e., we must null-terminate it); then we can call PDF_continue_text(p,buf) to write the line. All that remains is to check the number of lines written, to see if it's time for a new page, and if so, start a new page. Here, it's important to note that every call to PDF_begin_page() results in PDFLib resetting its graphic state, which means we need to specify our type size, leading, and cursor-position values all over again. If you forget to do this, you'll be wondering where all the text went on the second and subsequent pages of your PDF document.

When we're done, we call PDF_close(), unlock our text handle, and return to the calling routine. Using PDF_close() actually not only frees up our library-invoked resources but also closes any working files we've left open. So at this point, we can consider our work done, and control can return to the host process, in this case BBEdit.

Enhancements

In a real-world BBEdit extension, it would be a good idea not only to get serious about error-checking but also consider such things as a user preferences dialog and support for Apple Events (which should include a mechanism for suppressing dialogs, so that scripted operations aren't hung up in midstream by unattended dialogs). Also, the main loop should be wrapped with the BBEdit API's bbxtStartProgress() and bbxtDoneProgress() calls, and the inner loop should contain one call to bbxtDoProgress() for every line of text processed, so that the user knows how things are progressing. BBEdit will display a progress thermometer automatically, suppressing it for short-duration events, if you use these calls.

BBEdit's plug-in API also has some handy convenience routines for dealing with Apple Events. For example, consider what you can do with the following three lines:

	bbxtFindApplication(cb,'CARO', &appFSS);
	err = bbxtLaunchApplication(cb,'CARO',&appFSS,&psn);
	bbxtSendOpenDoc(cb,'CARO', nil, &fss,true);

With the arguments shown, the first call has the effect of searching the BBEdit default disk for the application whose signature is 'CARO' - namely, Acrobat Reader. The second function launches that application, and the third function sends it an 'odoc' event, instructing the app to open the document specified by the FSSpec pointed to by &fss. In other words, with three lines of code you can make BBEdit launch Acrobat Reader and display your just-created PDF file in a Reader window. To accomplish this with our own custom-written code (properly error-checked) would require at least 200 lines of additional code, doubling the size of our plug-in!

In terms of the PDF-writing portions of the code, there are many possible further enhancements. For example, it would be nice to let the user specify page margins, text size, leading, etc. by means of a setup dialog. Also, you could try justifying the user's text. PDFLib offers functions for controlling character spacing, word spacing, and character widths, with accuracy of a thousandth of an em. (An em is a typesetter's unit, roughly equivalent to the point size of the type.) You can use the PDFLib routine PDF_stringwidth() to find out how wide a given text string is. As an exercise, you might try developing a justification routine that preferentially adjusts word spacing, followed by character spacing, followed by character width, each with its own weighting factor. (For some interesting algorithms here, seek out Don Lancaster's excellent article on "Picojustification" at http://www.tinaja.com/glib/picojust.pdf.)
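As a starting point for that exercise, the first stage (word spacing only) reduces to a little arithmetic. The sketch below is plain C with no PDFLib calls: the function name is hypothetical, and the line's natural width is simply passed in, on the assumption that you would obtain it from PDF_stringwidth() in a real plug-in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: extra space, in points, to add at each inter-word
   gap so that a line of natural width 'lineWidth' fills 'targetWidth'.
   Returns 0 when the line has no gaps or is already wide enough. */
double word_spacing_for_line(const char *line, double lineWidth,
                             double targetWidth)
{
	int gaps = 0;
	const char *c;

	assert(line != NULL);
	for (c = line; *c != '\0'; c++)
		if (*c == ' ')
			gaps++;

	if (gaps == 0 || lineWidth >= targetWidth)
		return 0.0;
	return (targetWidth - lineWidth) / gaps;
}
```

A fuller routine would cap this value and hand whatever slack remains to character spacing and character width, as suggested above, and would leave the last line of each paragraph unjustified.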

A PDF-outputting BBEdit plug-in that incorporates some of these features (and others, such as rudimentary HTML tag interpretation) can be found at http://www.acroforms.com.

Conclusion

Thomas Merz's PDFLib library offers an excellent way to get started in PDF programming, combining ease of use with cross-platform and even cross-language portability. It's the only PDF library that can easily be adapted for use with Perl, Python, Tcl, Visual BASIC, and Java, as well as C/C++. It comes with outstanding documentation, plenty of sample code (for all language bindings), and the price - for non-commercial users - can't be beat, since it's free.

Look at it this way: Now you don't have any excuse for not putting PDF support in your applications!

Kas Thomas is a frequent contributor to MacTech and author of a forthcoming O'Reilly book on PDF-based web programming. You can reach him at kt@acroforms.com.
