A year ago YesLogic open-sourced the Allsorts font parser, shaping engine, and subsetter. In this post we cover how Allsorts has worked out in Prince, what's improved over the past year, and what we're working on next.
At YesLogic we've been using Allsorts to power all font parsing, shaping of supported scripts, and subsetting in Prince since version 13 was released in November 2019. Aside from the odd small bug revealed by real-world usage and exposure to new fonts, it has proven to be capable and reliable.
Over the past year we've continued to work on Allsorts, adding support for additional scripts and for bitmap and SVG fonts, and refining the API to make it easier to use.
Allsorts can now shape the Arabic and Syriac scripts, giving us even more coverage of the world's scripts.
Bitmap and SVG Fonts
We can now parse OpenType font tables containing bitmaps and SVGs:
EBLC/EBDT and their colour counterparts CBLC/CBDT, the latter as used by Noto Color Emoji.
sbix, as used by Apple Color Emoji.
SVG, as used by Twitter Color Emoji.
We implemented a unified API that supports retrieving the image for a glyph independent of the tables it originates from.
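To illustrate what a table-independent lookup can look like, here is a minimal sketch. It is not the Allsorts API: `GlyphImage`, `ImageSource`, `TableStub`, and `lookup_glyph_image` are hypothetical names invented for this example, and the stub stands in for a parsed bitmap, sbix, or SVG table.

```rust
/// Hypothetical image payloads; the real Allsorts types are not shown in this post.
#[derive(Debug, Clone, PartialEq)]
pub enum GlyphImage {
    /// Raster data (for example PNG bytes) from a bitmap table.
    Bitmap { ppem: u16, data: Vec<u8> },
    /// An SVG document from the SVG table.
    Svg(String),
}

/// One implementation per table family (bitmap strikes, sbix, SVG).
pub trait ImageSource {
    fn lookup(&self, glyph_id: u16, target_ppem: u16) -> Option<GlyphImage>;
}

/// Toy source backed by (glyph id, image) pairs, standing in for a parsed table.
pub struct TableStub(pub Vec<(u16, GlyphImage)>);

impl ImageSource for TableStub {
    fn lookup(&self, glyph_id: u16, _target_ppem: u16) -> Option<GlyphImage> {
        self.0
            .iter()
            .find(|(id, _)| *id == glyph_id)
            .map(|(_, image)| image.clone())
    }
}

/// Ask each source in priority order and return the first image found,
/// so the caller never needs to know which table it came from.
pub fn lookup_glyph_image(
    sources: &[&dyn ImageSource],
    glyph_id: u16,
    target_ppem: u16,
) -> Option<GlyphImage> {
    sources
        .iter()
        .find_map(|source| source.lookup(glyph_id, target_ppem))
}
```

A caller would build the list of sources once per font and then call `lookup_glyph_image(&sources, glyph_id, ppem)` for each glyph, getting back either a bitmap or an SVG document.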
The bitmap and SVG support has allowed us to bring emoji and colour SVG font support to Prince 14.
Prior to Allsorts 0.5 it was a fairly involved affair to shape some text. You had to read the font, do glyph mapping, apply glyph substitution (GSUB), and apply glyph positioning (GPOS) in individual steps, which amounted to quite a bit of code.
Now it's only a few lines. You read the font and construct an instance of the `Font` type, turn text into glyphs with a call to `map_glyphs`, then shape the text with a call to `shape`:
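    // Module paths are as of Allsorts 0.5 and may differ in other releases
    use allsorts::binary::read::ReadScope;
    use allsorts::font::MatchingPresentation;
    use allsorts::font_data::FontData;
    use allsorts::gsub::{Features, GsubFeatureMask};
    use allsorts::tag;
    use allsorts::Font;

    // Read and parse font (`opts` holds the font path and font index)
    let buffer = std::fs::read(&opts.font)?;
    let scope = ReadScope::new(&buffer);
    let font_file = scope.read::<FontData<'_>>()?;

    // Construct Font instance
    let provider = font_file.table_provider(opts.index)?;
    let mut font = Font::new(provider)?
        .expect("unable to find suitable cmap subtable");

    // Map glyphs
    let glyphs = font.map_glyphs("some text", MatchingPresentation::NotRequired);

    // Shape glyphs
    let script = tag::LATN;
    let lang = tag::from_string("ENG ")?;
    let shaped_glyphs = font.shape(
        glyphs,
        script,
        Some(lang),
        &Features::Mask(GsubFeatureMask::default()),
        true,
    )?;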
We also renamed a number of types and methods to make their function more obvious and accurate.
Miscellaneous Improvements and Fixes
Throughout the year we made several other improvements and fixes, including:
Fixed a few bugs and handled some quirks revealed by exposing the code to a variety of real-world fonts.
Improved performance through:
- GSUB caching in Arabic, Syriac, and Indic.
- Restructuring to avoid unnecessary bounds checks and allocations.
- Using tinyvec to store codepoints on glyphs.
Added support for more OpenType features in GSUB:
- Standard ligatures
- Discretionary ligatures
- Historical ligatures
- Contextual ligatures
- Small caps
- Small capitals from capitals
- Lining figures
- Oldstyle figures
- Proportional figures
- Tabular figures
- Diagonal fractions
- Stacked fractions
- Ordinals
- Slashed zero
- Language-specific OpenType shaping
Added support for fonts with Big5-encoded cmap subtables.
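For reference, the GSUB features listed above are identified in a font by four-character tags. The sketch below pairs each feature name with the tag it is usually given in the OpenType feature registry and packs the tag into the big-endian `u32` form used in font tables. The pairing is an assumption drawn from the registry rather than from Allsorts itself, the language-specific item is left out because its tag is not evident from the name, and `tag`, `FEATURE_TAGS`, and `tag_for` are names made up for this example.

```rust
/// Pack a four-character OpenType tag into a big-endian u32.
pub const fn tag(bytes: &[u8; 4]) -> u32 {
    u32::from_be_bytes(*bytes)
}

/// Feature names from the list above paired with their registry tags.
/// Assumption: this mapping follows the OpenType feature registry.
pub const FEATURE_TAGS: [(&str, &[u8; 4]); 14] = [
    ("Standard ligatures", b"liga"),
    ("Discretionary ligatures", b"dlig"),
    ("Historical ligatures", b"hlig"),
    ("Contextual ligatures", b"clig"),
    ("Small caps", b"smcp"),
    ("Small capitals from capitals", b"c2sc"),
    ("Lining figures", b"lnum"),
    ("Oldstyle figures", b"onum"),
    ("Proportional figures", b"pnum"),
    ("Tabular figures", b"tnum"),
    ("Diagonal fractions", b"frac"),
    ("Stacked fractions", b"afrc"),
    ("Ordinals", b"ordn"),
    ("Slashed zero", b"zero"),
];

/// Look up the packed tag for a feature name, if it is in the table.
pub fn tag_for(name: &str) -> Option<u32> {
    FEATURE_TAGS
        .iter()
        .find(|(feature, _)| *feature == name)
        .map(|(_, bytes)| tag(*bytes))
}
```

For example, `tag_for("Small caps")` yields the same value as `tag(b"smcp")`, while an unknown name yields `None`.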
We're currently implementing the following features to make Allsorts better suited for use cases outside our own:
- Expose glyph positioning information in shaping output so you don't have to calculate it yourself.
- Retrieve glyph contours as a series of basic drawing operations.
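As a rough idea of what "a series of basic drawing operations" could mean, the sketch below models a contour as a list of path operations and renders it as SVG path data. It is a guess at the general shape of such an interface, not the planned Allsorts API: `PathOp` and `to_svg_path` are hypothetical. Quadratic curves are included because TrueType outlines use them, and cubic curves because CFF outlines do.

```rust
/// Hypothetical drawing operations for a glyph outline.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PathOp {
    /// Start a new contour at (x, y).
    MoveTo(f32, f32),
    /// Straight line to (x, y).
    LineTo(f32, f32),
    /// Quadratic Bezier: control point, then end point.
    QuadTo(f32, f32, f32, f32),
    /// Cubic Bezier: two control points, then end point.
    CurveTo(f32, f32, f32, f32, f32, f32),
    /// Close the current contour.
    Close,
}

/// Render the operations as SVG path data, one command per operation.
pub fn to_svg_path(ops: &[PathOp]) -> String {
    let commands: Vec<String> = ops
        .iter()
        .map(|op| match *op {
            PathOp::MoveTo(x, y) => format!("M{} {}", x, y),
            PathOp::LineTo(x, y) => format!("L{} {}", x, y),
            PathOp::QuadTo(cx, cy, x, y) => format!("Q{} {} {} {}", cx, cy, x, y),
            PathOp::CurveTo(c1x, c1y, c2x, c2y, x, y) => {
                format!("C{} {} {} {} {} {}", c1x, c1y, c2x, c2y, x, y)
            }
            PathOp::Close => "Z".to_string(),
        })
        .collect();
    commands.join(" ")
}
```

A consumer such as a rasteriser or PDF writer would walk the same list and issue its own drawing calls. Note that font outlines use a y-up coordinate system while SVG is y-down, so a real renderer would also need to flip the y axis.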
After those we hope to look into:
- Text segmentation by script to allow a chunk of text to be supplied to Allsorts without having to detect and segment it by script first.
- Glyph caching.
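Segmenting by script means splitting a string into runs that each use one script, so every run can be shaped with the right script tag. The sketch below shows the idea under heavy simplifying assumptions: `Script`, `script_of`, and `split_by_script` are hypothetical, the script of a character is approximated from a few Unicode block ranges rather than the real Unicode Script property, and neutral characters such as spaces and digits simply join the run they follow.

```rust
/// A deliberately small set of scripts for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Script {
    Latin,
    Arabic,
    Syriac,
    Devanagari,
    /// Spaces, digits, punctuation, and anything not covered above.
    Common,
}

/// Coarse approximation based on Unicode block ranges (an assumption;
/// a real implementation would use the Unicode Script property).
pub fn script_of(c: char) -> Script {
    match c as u32 {
        0x0041..=0x005A | 0x0061..=0x007A | 0x00C0..=0x024F => Script::Latin,
        0x0600..=0x06FF => Script::Arabic,
        0x0700..=0x074F => Script::Syriac,
        0x0900..=0x097F => Script::Devanagari,
        _ => Script::Common,
    }
}

/// Split text into runs of a single script. Common characters never start
/// a new run: they attach to the current run, or to the first real script
/// if they appear at the start of the text.
pub fn split_by_script(text: &str) -> Vec<(Script, &str)> {
    let mut runs = Vec::new();
    let mut start = 0;
    let mut current: Option<Script> = None;

    for (index, c) in text.char_indices() {
        let script = script_of(c);
        if script == Script::Common {
            continue;
        }
        match current {
            None => current = Some(script),
            Some(run_script) if run_script != script => {
                runs.push((run_script, &text[start..index]));
                start = index;
                current = Some(script);
            }
            _ => {}
        }
    }

    if !text.is_empty() {
        runs.push((current.unwrap_or(Script::Common), &text[start..]));
    }
    runs
}
```

Each returned run pairs a script with a slice of the original string, so a caller could map the script to an OpenType script tag and shape the runs one at a time.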
Other items on our radar are:
- Performance measurement and optimisation to make it competitive with other shaping libraries.
- Support additional scripts such as Sinhala, and Indic 3/Universal Shaping Engine.
- Unicode normalisation.
- Being able to map shaping output back to the source text.
Thanks to Adrian, Alfie, Michael, Paul, and Peter for helping me write this post.
Font shaping is the process of taking text in the form of Unicode codepoints and a font, and laying out glyphs according to the text. This involves honouring kerning, ligatures, substitutions, and reordering specified by the font.
Font subsetting refers to decreasing the size of a font by only including the data for a reduced set of glyphs.