A year ago YesLogic open-sourced the Allsorts font parser, shaping engine, and subsetter. In this post we cover how Allsorts has worked out in Prince, what's improved over the past year, and what we're working on next.
Allsorts is a Rust crate (library) that can parse OpenType, WOFF, and WOFF2 fonts, shape text, and subset fonts (shaping and subsetting are defined in the notes at the end of this post).
At YesLogic we've been using Allsorts to power all font parsing, shaping of supported scripts, and subsetting in Prince since version 13 was released in November 2019. Aside from the odd small bug revealed by real-world usage and exposure to new fonts, it has proven to be capable and reliable.
Over the past year we've continued to work on Allsorts, adding support for additional scripts and for bitmap and SVG fonts, and refining the API to make it easier to use.
New Scripts
Allsorts can now shape the Arabic and Syriac scripts, giving us even more coverage of the world's scripts.
Bitmap and SVG Fonts
We can now parse OpenType font tables containing bitmaps and SVGs:
- CBDT/CBLC and EBDT/EBLC, as used by Noto Color Emoji.
- sbix, as used by Apple Color Emoji.
- SVG, as used by Twitter Color Emoji.
We implemented a unified API that supports retrieving the image for a glyph independent of the tables it originates from.
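To illustrate what a table-independent image lookup can look like, here is a minimal sketch. It is not the Allsorts API: the names GlyphImage, ImageSource, BitmapTable, SvgTable, and glyph_image are hypothetical, and the table contents are reduced to simple maps. The idea is that each table type answers the same question ("do you have an image for this glyph?") and the caller never needs to know which table supplied the answer.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified result type; real tables carry far more detail.
#[derive(Debug, Clone, PartialEq)]
enum GlyphImage {
    /// Embedded bitmap data, as found in CBDT/CBLC, EBDT/EBLC, or sbix.
    Bitmap { ppem: u16, data: Vec<u8> },
    /// An SVG document, as found in the SVG table.
    Svg(String),
}

/// Anything that may be able to supply an image for a glyph ID.
trait ImageSource {
    fn lookup(&self, glyph_id: u16, target_ppem: u16) -> Option<GlyphImage>;
}

/// Stand-in for a bitmap table: a list of (ppem, glyph ID -> image data) strikes.
struct BitmapTable {
    strikes: Vec<(u16, HashMap<u16, Vec<u8>>)>,
}

/// Stand-in for an SVG table: glyph ID -> SVG document.
struct SvgTable {
    docs: HashMap<u16, String>,
}

impl ImageSource for BitmapTable {
    fn lookup(&self, glyph_id: u16, target_ppem: u16) -> Option<GlyphImage> {
        // Among the strikes that contain the glyph, pick the one whose
        // size is closest to the requested size.
        self.strikes
            .iter()
            .filter(|(_, glyphs)| glyphs.contains_key(&glyph_id))
            .min_by_key(|(ppem, _)| (i32::from(*ppem) - i32::from(target_ppem)).abs())
            .map(|(ppem, glyphs)| GlyphImage::Bitmap {
                ppem: *ppem,
                data: glyphs[&glyph_id].clone(),
            })
    }
}

impl ImageSource for SvgTable {
    fn lookup(&self, glyph_id: u16, _target_ppem: u16) -> Option<GlyphImage> {
        // SVG is resolution independent, so the requested size is ignored.
        self.docs.get(&glyph_id).cloned().map(GlyphImage::Svg)
    }
}

/// Ask each source in order and return the first image found.
fn glyph_image(
    sources: &[&dyn ImageSource],
    glyph_id: u16,
    target_ppem: u16,
) -> Option<GlyphImage> {
    sources
        .iter()
        .find_map(|source| source.lookup(glyph_id, target_ppem))
}
```

A caller would build the list of sources once per font, in its preferred order, and then call glyph_image for each glyph. The order of the slice is the only place where a preference between bitmap and SVG data is expressed.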
The bitmap and SVG support has allowed us to bring emoji and colour SVG font support to Prince 14.
Improved API
Prior to Allsorts 0.5 it was a fairly involved affair to shape some text. You had to read the font, do glyph mapping, apply glyph substitution (GSUB), and apply glyph positioning (GPOS) in individual steps, which amounted to quite a bit of code.
Now it's only a few lines. You read the font and construct an instance of the Font type, turn text into glyphs with a call to map_glyphs, then shape the text with a call to shape:
// Read and parse font
let buffer = std::fs::read(&opts.font)?;
let scope = ReadScope::new(&buffer);
let font_file = scope.read::<FontData<'_>>()?;
// Construct Font instance
let provider = font_file.table_provider(opts.index)?;
let mut font = Font::new(provider)?
    .expect("unable to find suitable cmap subtable");
// Map glyphs
let glyphs = font.map_glyphs("some text", MatchingPresentation::NotRequired);
// Shape glyphs
let script = tag::LATN;
let lang = tag::from_string("ENG ")?;
let shaped_glyphs = font.shape(
    glyphs,
    script,
    Some(lang),
    &Features::Mask(GsubFeatureMask::default()),
    true,
)?;
We also renamed a number of types and methods to make their function more obvious and accurate.
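The script and language arguments in the shaping example above are OpenType tags: four-byte identifiers, padded with trailing spaces when shorter, which is why the language is written as "ENG " with a space. The sketch below shows one common way to represent such a tag, packed into a big-endian u32. The functions tag_from_str and tag_to_string are hypothetical helpers written for this illustration and are not claimed to match how Allsorts implements its tag module.

```rust
/// Pack a tag of one to four printable ASCII characters into a big-endian
/// u32, padding short tags with trailing spaces.
/// Hypothetical helper for illustration; not the Allsorts implementation.
fn tag_from_str(s: &str) -> Option<u32> {
    let bytes = s.as_bytes();
    let printable = bytes.iter().all(|b| (0x20..=0x7E).contains(b));
    if bytes.is_empty() || bytes.len() > 4 || !printable {
        return None;
    }
    // Start with four spaces, then overwrite the leading bytes.
    let mut padded = [b' '; 4];
    padded[..bytes.len()].copy_from_slice(bytes);
    Some(u32::from_be_bytes(padded))
}

/// Turn a packed tag back into its four-character form.
fn tag_to_string(tag: u32) -> String {
    tag.to_be_bytes().iter().map(|&b| char::from(b)).collect()
}
```

With this representation, "ENG" and "ENG " produce the same value, and comparing two tags is a single integer comparison.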
Miscellaneous Improvements and Fixes
Throughout the year we made several other improvements and fixes, including:
- Extended our glyph substitution support, implementing GSUB Lookup Type 8 (Reverse Chaining Contextual Single Substitution), which gains us support for Tristan Hume's Numderline font.
- Fixed a few bugs and handled some quirks revealed by exposing the code to a variety of real world fonts.
- Improved performance through:
  - GSUB caching in Arabic, Syriac, and Indic.
  - Restructuring to avoid unnecessary bounds checks and allocations.
  - Using tinyvec to store codepoints on glyphs.
- Added support for more OpenType features in GSUB:
  - Standard ligatures (liga).
  - Discretionary ligatures (dlig).
  - Historical ligatures (hlig).
  - Contextual ligatures (clig).
  - Small caps (smcp).
  - Small capitals from capitals (c2sc).
  - Lining figures (lnum).
  - Oldstyle figures (onum).
  - Proportional figures (pnum).
  - Tabular figures (tnum).
  - Diagonal fractions (frac).
  - Stacked fractions (afrc).
  - Ordinals (ordn).
  - Slashed zero (zero).
  - Language-specific OpenType shaping (locl).
- Support fonts with Big5 encoded cmap subtables.
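GSUB Lookup Type 8, mentioned in the list above, differs from the other substitution lookups in that it walks the glyph run from the end towards the start, so the context after a glyph may already have been substituted by the time that glyph is examined. The following is a simplified sketch of that idea, not the Allsorts implementation: the ReverseChainRule type and apply_reverse_chain function are hypothetical, and coverage tables are reduced to plain vectors and sets of glyph IDs.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified form of a reverse chaining single substitution.
struct ReverseChainRule {
    /// Glyphs the rule applies to, paired index-for-index with `substitutes`.
    coverage: Vec<u16>,
    substitutes: Vec<u16>,
    /// Sets that must match the glyphs before the current one, nearest first.
    backtrack: Vec<HashSet<u16>>,
    /// Sets that must match the glyphs after the current one, nearest first.
    lookahead: Vec<HashSet<u16>>,
}

/// Apply the rule in place, visiting glyphs from last to first.
fn apply_reverse_chain(rule: &ReverseChainRule, glyphs: &mut [u16]) {
    for i in (0..glyphs.len()).rev() {
        // Skip glyphs that the rule does not cover.
        let cov_index = match rule.coverage.iter().position(|&g| g == glyphs[i]) {
            Some(index) => index,
            None => continue,
        };

        // There must be enough glyphs before position i, and each must
        // belong to the corresponding backtrack set.
        let backtrack_ok = rule.backtrack.len() <= i
            && rule
                .backtrack
                .iter()
                .enumerate()
                .all(|(k, set)| set.contains(&glyphs[i - 1 - k]));

        // Likewise for the glyphs after position i. Because we walk
        // backwards, these may already have been substituted.
        let lookahead_ok = i + rule.lookahead.len() < glyphs.len()
            && rule
                .lookahead
                .iter()
                .enumerate()
                .all(|(k, set)| set.contains(&glyphs[i + 1 + k]));

        if backtrack_ok && lookahead_ok {
            glyphs[i] = rule.substitutes[cov_index];
        }
    }
}
```

As an example, take a rule that replaces glyph 1 with glyph 2 when the next glyph is 2 or 3. Applied to the run [1, 1, 1, 3], the last 1 is replaced first because it is followed by 3, and each earlier 1 is then followed by a freshly substituted 2, giving [2, 2, 2, 3]. A front-to-back pass with the same rule would only have changed the final 1.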
Future Plans
We're currently implementing the following features to make Allsorts better suited for use cases outside our own:
- Expose glyph positioning information in shaping output so you don't have to calculate it yourself.
- Retrieve glyph contours as a series of basic drawing operations.
After those we hope to look into:
- Text segmentation by script to allow a chunk of text to be supplied to Allsorts without having to detect and segment it by script first.
- Glyph caching.
Other items on our radar are:
- Performance measurement and optimisation to make it competitive with other shaping libraries.
- Support additional scripts such as Sinhala, and Indic 3/Universal Shaping Engine.
- Unicode normalisation.
- Being able to map shaping output back to the source text.
Acknowledgements
Thanks to Adrian, Alfie, Michael, Paul, and Peter for helping me write this post.
Font shaping is the process of taking text in the form of Unicode codepoints and a font, and laying out glyphs according to the text. This involves honouring kerning, ligatures, substitutions, and reordering specified by the font.
Font subsetting refers to decreasing the size of a font by only including the data for a reduced set of glyphs.