As fonts become more sophisticated, and application software uses them in more complicated ways, what causes the display of specific glyphs has been obscured. Even articles by well-known typographers can get it wrong. But knowledge is power, and understanding the process allows you to better control the result—and maybe, sometimes, worry less.

In the “old days” of the 1990s, graphic designers mostly used Type 1 (PostScript) fonts, and though things were much worse, they were simple. A Type 1 font contains a maximum of 256 encoded characters. If type designers wanted to give you alternate glyphs in Type 1, they had to use existing character slots: either by replacing some standard characters, or making separate fonts for the alternate characters.
 


An "Expert Set" font: unrelated ligatures replace the letters V-Z.
 

If the glyphs in a particular Type 1 font didn’t map cleanly to the standard characters of the ISO-Adobe character set, then the glyphs would be placed in the numbered slots or codepoints normally used by something else. For example, the additional f-ligature glyphs (ff, ffi and ffl) in Adobe Expert Set fonts used the codepoints usually assigned to the characters V, Y and Z. This meant that for the word “office” set with an “ffi” ligature, the underlying text would end up as “oYce.” Using the Expert Set font for the Y plus the Regular font for the other letters would make the word look like “office” even though the underlying text characters were quite different. Heaven help you if you wanted to spell check this, or selected the whole paragraph and switched it to a different font, or copied the text into e-mail, or searched for the word “office,” or…well, you get the idea. If you needed pre-built fractions such as 1/3, special symbols or Russian, the answers were similar: Change to a separate font and use a special chart to know what keys to strike to get a character codepoint in your text, a codepoint that would mean something entirely different in a regular Western font, which would be a problem if you ever changed fonts for that text.
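
To make the problem concrete, here is a minimal Python sketch of the stored text in that Type 1 workflow. The `EXPERT_SLOTS` table, the font labels and the function name are invented for illustration, and the sketch models only the single slot used in the “office” example (the ligature that sits where “Y” normally lives).

```python
# Hypothetical model of the Type 1 Expert Set workaround described above.
# Only one slot is modeled: in the Expert Set font, the codepoint for "Y"
# draws the ffi ligature instead of the letter Y.
EXPERT_SLOTS = {"Y": "ffi"}

def what_the_reader_sees(runs):
    """runs is a list of (text, font) pairs; returns the visible letters."""
    seen = []
    for text, font in runs:
        for ch in text:
            if font == "expert":
                seen.append(EXPERT_SLOTS.get(ch, ch))
            else:
                seen.append(ch)
    return "".join(seen)

# "office" set with the ligature: three runs, the middle one in the Expert font.
runs = [("o", "regular"), ("Y", "expert"), ("ce", "regular")]
stored_text = "".join(text for text, _ in runs)

print(what_the_reader_sees(runs))   # what appears on the page
print(stored_text)                  # what search and spell check operate on
print("office" in stored_text)      # whether a search for "office" succeeds
```

The page shows one word while the stored characters spell another, which is why searching, spell checking and font changes all broke.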



This FontLab screenshot shows which characters in the Latin Extended-B range of Unicode are supported by Hypatia Sans Pro Bold.
 

Things have gotten better with OpenType. Ligatures just appear as you type, and for the most part, designers don’t have to worry about it. If you want extra fractions, swash alternates or small caps, you can get all those things in one well-chosen font, accessing them via formatting. Similarly, support for Hungarian, Greek and Russian can be had in a single font and accessed by changing language keyboards or using the Glyph Panel found in Adobe’s Creative Suite (or the Quark, Mac OS or Windows equivalent). If you want to know what can still go wrong, yet how it all works anyway, read on.

At the heart of the OpenType font format lies the difference between characters and glyphs. A character is the abstract idea of a letter or symbol, which has its own unique codepoint in the Unicode text encoding standard. That Unicode codepoint may or may not be occupied in a given OpenType font, but it is reserved for that character and only that character (for example, the euro character is at Unicode U+20AC, capital “A” is U+0041 and the Cyrillic capital “Be” is U+0411). OpenType and Unicode are not synonymous, but all OpenType fonts have a Unicode encoding.
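
As a quick illustration, Python’s standard `unicodedata` module can report the codepoint and official Unicode name of any character, independent of any font. The helper name `describe` is just a label chosen for this sketch.

```python
import unicodedata

def describe(ch):
    """Return the codepoint and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The three example characters: euro sign, capital A, Cyrillic capital Be.
for ch in ["\u20AC", "\u0041", "\u0411"]:
    print(describe(ch))
```

Nothing here says how the character looks; that part is left entirely to the font.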

A glyph is a specific instantiation of that character in a given font, with a specific appearance. Sometimes a single character has several different representations in a font. A typeface might offer a regular cap A, a small cap A and a swash A, all of which are one character (cap “A,” U+0041) but three different glyphs. One of these would be the default glyph directly associated with that Unicode codepoint, while the others would be alternates or variants.

In a most extreme example, Poetica Std offers 57 different ampersands: one character but 57 glyphs. (In Adobe-speak, a Standard western font is anything that lacks central European language support, regardless of how many typographic goodies it has.) Contrariwise, a font might have a “Th” ligature, which is one glyph, but two characters (the “T” and the “h”), or an “offi” ligature, which is again one glyph, but four characters (“o-f-f-i”).

Most of the time when you’re working within an OpenType text model, your application (such as QuarkXPress or Adobe InDesign or Illustrator) deals with characters rather than glyphs. The underlying text that gets operated on by the application (spell-checked or copied or what-have-you) is the stream of characters. When it comes to display, the application automatically does the additional processing and transformations to decide what glyphs to display for that character-based text stream, based on what layout features are “on” for that text.

Say, for example, you enter “o-f-f-i-c-e” and you have standard ligatures on. The application asks its text engine to tell it what glyphs to display for that text stream with ligatures on. One font might not have any ligatures, another might have just the “fi” ligature, and a third might have an “offi” ligature. How many and which glyphs you see is entirely font-dependent. Yet the underlying text is unchanged by the question of which glyphs come out the other end.

All of which is a very long-winded way of saying “Unicode characters + OpenType layout = glyphs” or in design terms “text + formatting = display.” The beauty of this is that whenever you search, copy and paste, change fonts or do other operations, the application is going back to that underlying text. Then it reapplies the formatting to determine what glyphs to display. So as long as everything goes according to plan, your underlying text remains intact.
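
The “text + formatting = display” idea can be sketched as a toy text engine in Python. This is an illustration under simplifying assumptions, not how any real layout engine or OpenType table works: the `shape` function, the `_lig` glyph names and the three imaginary fonts (no ligatures, only “fi”, and one that adds a longer ligature such as “ffi”) are all invented for the example.

```python
def shape(text, ligatures, ligatures_on=True):
    """Return the list of glyph names a toy text engine would display.

    `ligatures` is the set of character sequences for which this (made-up)
    font has a single ligature glyph. The input text is never modified.
    """
    if not ligatures_on or not ligatures:
        return list(text)            # one glyph per character
    glyphs = []
    i = 0
    longest = max(len(seq) for seq in ligatures)
    while i < len(text):
        # Try the longest possible ligature first, then shorter ones.
        for size in range(min(longest, len(text) - i), 1, -1):
            if text[i:i + size] in ligatures:
                glyphs.append(text[i:i + size] + "_lig")
                i += size
                break
        else:
            glyphs.append(text[i])   # no ligature starts here
            i += 1
    return glyphs

text = "office"
print(shape(text, set()))            # font with no ligatures
print(shape(text, {"fi"}))           # font with only an fi ligature
print(shape(text, {"fi", "ffi"}))    # font that also has a longer ligature
print(text)                          # the underlying text itself
```

The same six characters produce six, five or four glyphs depending on the font, and the string being searched or spell-checked never changes.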

So, what can go wrong with this idyllic picture of text processing? Well, first off, the Unicode situation is a bit more complicated than first presented here, thanks to three features of Unicode: compatibility with legacy encodings, canonical equivalence and the existence of the Private Use Area.



Inserting an encoded ligature with the Windows Character Map (top) and with the OS X Character Palette (bottom).
 

One of the underlying principles of Unicode is that it must be compatible with all the older standards for text encoding that preceded it. If some preexisting encoding made a distinction between two characters or had a slot for something, Unicode must do so as well. This rule supersedes some other principles about things like not encoding ligatures. So, left to its own devices, Unicode would not have encoded the five basic f-ligatures. But because they were encoded in existing encoding standards (the “fi” and “fl” ligatures are even part of the basic MacRoman encoding), they have Unicode slots U+FB00 through U+FB04.

Now, to try to help with this, Unicode then goes on to say that the “fi” ligature codepoint at U+FB01 is equivalent to the separate letters f and i. (Strictly speaking, Unicode classes this as compatibility equivalence rather than the stricter canonical equivalence.) This means that in theory, a Unicode-compliant application should treat text exactly the same whether it encounters separate codepoints for the letters f and i (with or without ligature formatting on), or the single codepoint for the “fi” ligature. But Unicode compliance is not an all-or-nothing deal, so there can be Unicode applications out there that don’t handle this equivalence correctly. Therefore, if you’re not sure about where the text might be going, it’s better to have the most common representation of the underlying text. In this example, that means keeping the separate letters f and i and getting the ligature via formatting, rather than having a hard-coded “fi” ligature in your text.
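
Python’s standard `unicodedata` module gives a feel for why hard-coded ligatures are risky. In this small sketch, the word “file” is stored with the single “fi” ligature character. A plain comparison and the canonical normalization form (NFC) both leave the ligature alone; only the compatibility form (NFKC) folds it back into separate letters, and software has to ask for that explicitly. The variable names are arbitrary.

```python
import unicodedata

hard_coded = "\uFB01le"   # "file" typed with the fi ligature character U+FB01
plain = "file"            # the same word as four separate letters

matches_raw = hard_coded == plain
matches_nfc = unicodedata.normalize("NFC", hard_coded) == plain
matches_nfkc = unicodedata.normalize("NFKC", hard_coded) == plain

print(unicodedata.name(hard_coded[0]))   # official name of U+FB01
print(matches_raw)                       # naive comparison
print(matches_nfc)                       # after canonical normalization
print(matches_nfkc)                      # after compatibility normalization
```

Any search, sort or spell check that skips the compatibility step will treat the two spellings as different words.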

But what do you do when the application doesn’t support this fancy OpenType formatting functionality? The answer is, it depends. If you’re doing text entry in something like MS Word, and will be doing final layout in something else (such as Adobe InDesign, or QuarkXPress 7 or later), then the best thing is to do the final formatting steps in the second application. If you need to do as much as possible in Word, you can use character and paragraph styles, and then apply the appropriate formatting in the matching style in InDesign/QuarkXPress, so the right thing will happen on text import.

If something like Microsoft Word is your final output software, and it doesn’t support OpenType formatting, how can you access alternate glyphs or ligatures? In such cases, your best option is to access the desired glyph directly by an appropriate Unicode value, which you can get to by means of the OS X Character Palette or Windows Character Map accessory. For glyphs that have standard Unicode codepoints, in applications which support Unicode, this is easy enough. The “fi” ligature, if present, should be encoded at U+FB01, for example. Foreign language characters and math symbols all have their own special blocks in Unicode as well. You just need a font that supports the characters you need.



Adobe Garamond Pro in FontLab, showing the beginning of the Private Use Area. The font developers decided to encode titling alternates here, as well as swashes, unusual ligatures and other alternate glyphs.
 

But what about ligatures or alternates that don’t have standard Unicodes? Well, in some fonts, they will be accessible via the Unicode Private Use Area (PUA), a special section of Unicode where the codepoints are not assigned to anything in particular. The most commonly seen PUA codepoints are in the range U+E000 to U+F8FF. Some fonts will take all the glyphs that don’t have legitimate Unicode values and assign them PUA values. So, a “Th” ligature might be at U+E017. Some vendors have staked out some sections of the PUA for specific purposes, but these can overlap or vary; for example, some PUA codepoints used semi-consistently by Adobe overlap with those used by Star Trek fans for Klingon. It’s important to note that some fonts don’t assign PUA Unicodes to alternate glyphs at all; the newest OpenType fonts from Adobe don’t use PUA codepoints for alternates and ligatures, and Microsoft’s have never done so. In an application that is not OpenType savvy, the unencoded glyphs in those fonts are practically inaccessible. There is no workaround short of modifying the font with dedicated font editor software, which is time-consuming, often expensive and usually not allowed by your license for the font software anyway.

Even if it does work, using these PUA values to access special glyphs should be a last resort. The meaning of a given PUA codepoint will vary from one font vendor to another, possibly from one font family to another, and sometimes even between fonts in the same family or different versions of the same font! You will be able to search for or replace such a glyph only if you use the PUA codepoint in your search text. PUA values will baffle spell-check, and make a mess of plain text. However, they at least give you a way of getting special glyphs displayed in applications that are Unicode savvy, but don’t yet do fancy OpenType typography.
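
If you inherit text and want to know whether any PUA values are hiding in it, a short Python check can flag them. This sketch relies on the fact that Unicode gives private-use codepoints the general category `Co`; the function name and the sample string (which pretends one font’s “Th” ligature lives at U+E017) are hypothetical.

```python
import unicodedata

def find_private_use(text):
    """Return (index, 'U+XXXX') for every Private Use character in text."""
    hits = []
    for index, ch in enumerate(text):
        # Unicode assigns the general category "Co" to private-use codepoints,
        # which covers the common U+E000 to U+F8FF range.
        if unicodedata.category(ch) == "Co":
            hits.append((index, f"U+{ord(ch):04X}"))
    return hits

sample = "\uE017e quick brown fox"   # hypothetical: a "Th" ligature stored as U+E017
print(find_private_use(sample))
print(find_private_use("The quick brown fox"))
```

The check can tell you that a private-use value is present, but not what it was meant to be; that meaning lives only in the font that was used.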

Newly armed with this knowledge of the character/glyph model, you can go out and conquer glyphs, regardless of your application!

Thomas Phinney is president of FontLab, the font creation/editing software company, and treasurer of ATypI. From 1997 to 2008 he did type at Adobe, lastly as product manager for fonts and global typography. After that he spent five years as senior technical product manager (a.k.a. “guru”) of fonts and typography at Extensis, including managing the font library for the WebINK web font solution. Phinney has long been involved in the design, technical, forensic, business, standards and historical sides of type. His interest in forensic typography has led to testifying as an expert witness in court, being quoted in newspapers from the Washington Post to the Dallas Morning News, and being consulted by organizations ranging from PBS (for History Detectives) to the US Treasury. Phinney has an MS in printing from the Rochester Institute of Technology, and an MBA from UC Berkeley.