Adding Microsoft Jheng Hei 正黑體 Font to PDF with Node.js pdf-lib

Background

Adding text to a PDF is not trivial, especially Chinese text, because a font has to be embedded for the characters to render.

This article works with TrueType fonts, so let’s start with a very brief overview of TrueType fonts (and fonts in general).

Some terminology:

  1. Font collection: a single file that bundles multiple fonts
  2. Font: a collection of characters rendered in a specific (collective) style
  3. Glyph: the paths/curves used to draw a character

As I understand it, to render a character the renderer uses a mapping table in the font (the "cmap" table in TrueType) to find the glyph to draw.

For languages with a small alphabet, like English, the number of glyphs is small (26 lowercase and 26 uppercase letters, plus digits and other symbols), so the font file is small. Chinese is a logographic writing system in which each character needs its own glyph, so the font file is much larger. That leads to the concept of a subset.

For PDF, we can embed a subset of the font that contains only the characters used in the file.
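To make the subset idea concrete, here is a minimal sketch in plain JavaScript (it is not part of pdf-lib, and the function name `usedCharacters` is made up for illustration). It collects the distinct characters in a piece of text, which is roughly the set of characters a subset has to cover.

```javascript
// Collect the distinct characters used in a piece of text.
// A font subset only needs glyphs for these characters, not the whole font.
function usedCharacters(text) {
  // Iterating a string yields whole Unicode code points, so rare characters
  // outside the Basic Multilingual Plane are not split into surrogate halves.
  // A Set keeps each character once, in order of first appearance.
  return [...new Set(text)];
}

// Hypothetical usage:
// console.log(usedCharacters('你好你好嗎').length);
```

Even a long document usually uses only a small fraction of the characters a Chinese font contains, which is why subsetting pays off.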

Library to be used

I am using Node.js and have been using the pdf-lib library to add images to PDFs; this time I need to add Chinese text. The library is well documented and easy to use.

Note that the library is aimed at adding elements to a PDF. It does not support parsing the text layer of a PDF, which this article does not cover either.

Basic usage

To work with an existing file, first read the PDF into bytes (e.g. using “fs” for local files), then parse the document with PDFDocument.load(existingPdfBytes).

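Note that drawing text requires a font from an embedFont() function call; the simplest choice is one of the standard fonts from the pdf-lib library.

The standard fonts are the 14 built-in PDF fonts (the Courier, Helvetica and Times Roman families, plus Symbol and ZapfDingbats). They are mainly English based and do not cover Chinese characters.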

Supporting other fonts

To support other fonts, one can embed a custom font.

The idea is to load the font file as bytes and embed it into the PDF file, then refer to the embedded font when drawing text.

The Microsoft Jheng Hei (正黑體) font

The Microsoft Jheng Hei font needs some additional steps.

Getting the font

One can go to the Fonts folder (typically at C:\Windows\Fonts), locate the “Microsoft JhengHei” font, then drag and drop it into another folder.

You will get three files; I am using msjh.ttc.

Note that the extension is .ttc, not the .ttf that most examples (and articles all over the internet) use.

The .ttc file extension stands for TrueType Collection, which means the file contains several TrueType fonts. The library does not support loading a .ttc file, and you might encounter the error “TypeError: this.font.layout is not a function”.
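One way to catch this early is to check the first bytes of the font file before handing it to embedFont(). The sketch below relies on the TrueType Collection header layout (a 4-byte `ttcf` tag, a 4-byte version, then a 4-byte font count); the function name `describeFontFile` is made up for illustration.

```javascript
// Inspect the first bytes of a font file. A TrueType Collection starts with
// the tag 'ttcf', followed by a version and the number of fonts it contains.
function describeFontFile(bytes) {
  const buf = Buffer.from(bytes);
  if (buf.length >= 12 && buf.toString('latin1', 0, 4) === 'ttcf') {
    return { isCollection: true, numFonts: buf.readUInt32BE(8) };
  }
  return { isCollection: false };
}

// Hypothetical usage with a placeholder path:
// const info = describeFontFile(require('fs').readFileSync('msjh.ttc'));
// if (info.isCollection) console.log(`collection with ${info.numFonts} fonts`);
```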

The solution is to extract a single font from the collection and use that instead. I used an online tool for this (use at your own risk).

After that, you can successfully embed the font and use it in the drawText(…) function call.
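For readers who prefer not to upload a font to a website, the extraction can also be done locally. The following is a rough sketch and not the method used in this article: it copies one font's table directory and tables out of the collection and rewrites the table offsets. It is based on the TrueType Collection layout, where table offsets are measured from the start of the .ttc file. The function name is made up, which index corresponds to which face in msjh.ttc is not checked here, and the checksum adjustment stored in the `head` table is left as-is, so stricter font tools may complain about the result.

```javascript
// Extract the font at `index` from a TrueType Collection as a standalone font.
function extractFontFromCollection(ttcBytes, index = 0) {
  const ttc = Buffer.from(ttcBytes);
  if (ttc.length < 12 || ttc.toString('latin1', 0, 4) !== 'ttcf') {
    throw new Error('Not a TrueType Collection');
  }
  const numFonts = ttc.readUInt32BE(8);
  if (index < 0 || index >= numFonts) {
    throw new RangeError(`Font index ${index} is out of range (0..${numFonts - 1})`);
  }

  // Each font has its own table directory: a 12-byte header followed by
  // one 16-byte record (tag, checksum, offset, length) per table.
  const dirOffset = ttc.readUInt32BE(12 + 4 * index);
  const numTables = ttc.readUInt16BE(dirOffset + 4);
  const headerSize = 12 + 16 * numTables;
  const pad4 = (n) => Math.ceil(n / 4) * 4; // tables are 4-byte aligned

  const tables = [];
  let totalSize = headerSize;
  for (let i = 0; i < numTables; i++) {
    const record = dirOffset + 12 + 16 * i;
    const offset = ttc.readUInt32BE(record + 8); // relative to the .ttc start
    const length = ttc.readUInt32BE(record + 12);
    tables.push({ offset, length });
    totalSize += pad4(length);
  }

  // Copy the directory, then each table, pointing every record at the
  // table's new position in the standalone file.
  const out = Buffer.alloc(totalSize);
  ttc.copy(out, 0, dirOffset, dirOffset + headerSize);
  let cursor = headerSize;
  tables.forEach(({ offset, length }, i) => {
    ttc.copy(out, cursor, offset, offset + length);
    out.writeUInt32BE(cursor, 12 + 16 * i + 8);
    cursor += pad4(length);
  });
  return out;
}

// Hypothetical usage with placeholder paths:
// const fs = require('fs');
// fs.writeFileSync('msjh.ttf', extractFontFromCollection(fs.readFileSync('msjh.ttc'), 0));
```

Tables shared between fonts in the collection are simply copied into the output, so the extracted file is self-contained at the cost of some duplication.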

Embedding a subset of the font

With pdf-lib, embedding a subset of the font is very easy: just pass {subset: true} as the second argument to embedFont(…).

The PDF file size is greatly reduced (e.g. from 14MB to 300kB). The full font by itself occupies ~14MB, while a subset embeds only the glyphs that are actually used, which is a big reduction for a writing system like Chinese that has a lot of characters.