Introduction
===============================================================

As the C++ port matures, this document will serve as a log of how the
different parts of the library work. For now, it contains some general
information but mostly CMap-specific details.

------------------------------------------------------------------------

Font Data Tables
===========================================================================

One of the important goals of `sfntly` is thread safety, which is why
tables can only be created through their nested `Builder` class and are
immutable after creation.

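A minimal sketch of this pattern, assuming nothing about sfntly's real classes: `Table`, `Builder`, `Add`, `Build`, `size` and `at` are hypothetical names. The builder holds all mutable state, and the table it produces exposes only `const` accessors, so a built table can be read from several threads without locking.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical table: immutable once built, constructible only via Builder.
class Table {
 public:
  class Builder {
   public:
    Builder& Add(uint16_t value) {
      values_.push_back(value);
      return *this;
    }
    // Snapshot the builder's current state into a new immutable table.
    std::shared_ptr<const Table> Build() const {
      return std::shared_ptr<const Table>(new Table(values_));
    }

   private:
    std::vector<uint16_t> values_;  // mutable state lives only in the builder
  };

  // Only const accessors: concurrent reads need no locking.
  size_t size() const { return values_.size(); }
  uint16_t at(size_t i) const { return values_.at(i); }

 private:
  // Private constructor: the nested Builder is the only way to create a Table.
  explicit Table(std::vector<uint16_t> values) : values_(std::move(values)) {}
  const std::vector<uint16_t> values_;
};
```

Because `Build` copies the builder's state, later calls to `Add` on the same builder do not affect tables that were already built.
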
`CMapTable`
--------------------------------------------------------

*CMap* = character map; it converts *code points* in a *code page* to
*glyph IDs*.

The CMapTable is a table of CMaps (CMaps are also tables; there is one
for every encoding supported by the font). Each encoding-dependent
character map is stored in one of several subtable formats (the format
numbers run up to 14), of which formats 0 and 4 are the most widely
used; sfntly/C++ will initially support only formats 0, 2, 4 and 12.

### `CMapFormat0` Byte encoding table

Format 0 is a basic table in which a character’s glyph ID is looked up
directly in `glyphIdArray[256]`. As it only supports 256 characters, it
can only encode single-byte character sets such as ASCII and ISO 8859-x
(alphabet-based languages).

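A format 0 lookup is a single array index. This sketch assumes the subtable's `glyphIdArray` has already been read into a 256-entry array of one-byte glyph IDs; `LookupFormat0` is a hypothetical helper, and returning glyph 0 (the missing glyph) for codes above 0xFF is a choice made here.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper, not part of sfntly: maps a character code to a glyph
// ID using a format 0 glyphIdArray (256 one-byte entries).
uint16_t LookupFormat0(const std::array<uint8_t, 256>& glyph_id_array,
                       uint32_t code) {
  if (code > 0xFF) {
    return 0;  // outside the single-byte range: report the missing glyph
  }
  return glyph_id_array[code];
}
```
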
### `CMapFormat2` High-byte mapping through table

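Chinese, Japanese and Korean (CJK) legacy encodings such as Shift-JIS
use a mixed 8/16-bit scheme in which certain byte values mark the first
byte of a 2-byte code point. Format 2 maps these encodings: the high
byte selects a subheader, which describes how the low byte is mapped.
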
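The lookup can be sketched as below, following the structure of the format 2 subtable in the OpenType `cmap` specification (`subHeaderKeys`, and subheaders with `firstCode`, `entryCount`, `idDelta` and `idRangeOffset`). As a simplifying assumption, each subheader's `idRangeOffset` has already been resolved into a plain index into `glyph_ids`; the struct and function names are hypothetical, not sfntly classes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of a format 2 subtable.
struct SubHeader {
  uint16_t first_code;       // first valid low byte
  uint16_t entry_count;      // number of valid low bytes
  int16_t id_delta;          // added to nonzero glyph IDs (mod 65536)
  size_t glyph_index_start;  // assumption: idRangeOffset pre-resolved to an index
};

struct Format2 {
  std::array<uint16_t, 256> sub_header_keys;  // subHeader index * 8, as stored
  std::vector<SubHeader> sub_headers;         // [0] handles single-byte codes
  std::vector<uint16_t> glyph_ids;            // the subtable's glyphIdArray
};

uint16_t LookupFormat2(const Format2& t, uint16_t code) {
  uint8_t high = static_cast<uint8_t>(code >> 8);
  uint8_t low = static_cast<uint8_t>(code & 0xFF);
  size_t k;
  if (high == 0) {
    k = t.sub_header_keys[low] / 8;
    if (k != 0) return 0;  // this byte is a lead byte, not a whole character
  } else {
    k = t.sub_header_keys[high] / 8;
    if (k == 0) return 0;  // high byte is not a valid lead byte
  }
  const SubHeader& sh = t.sub_headers[k];
  if (low < sh.first_code || low >= sh.first_code + sh.entry_count) return 0;
  uint16_t glyph = t.glyph_ids[sh.glyph_index_start + (low - sh.first_code)];
  if (glyph == 0) return 0;  // missing glyph stays 0; idDelta is not applied
  return static_cast<uint16_t>(glyph + sh.id_delta);
}
```
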
### `CMapFormat4` Segment mapping to delta values

This is the preferred format for Unicode Basic Multilingual Plane (BMP)
encodings according to the Microsoft spec. Format 4 defines segments
(contiguous, variable-length ranges of characters). Finding a
character’s glyph ID starts with finding the segment that contains it,
using a binary search (the segments are sorted by `endCode`). A segment
has a **`startCode`** and an **`endCode`** (the minimum and maximum code
points in the segment), an **`idDelta`** (delta applied to all code
points in the segment) and an **`idRangeOffset`** (offset into
`glyphIdArray`, or 0).

`idDelta` and `idRangeOffset` may look like the same thing (both are
offsets), but they are used differently. `idRangeOffset` locates the
glyph index inside `glyphIdArray` by relying on the fact that this array
immediately follows the `idRangeOffset` array in the font file.
`idRangeOffset[i]` is a byte offset measured from the address of
`idRangeOffset[i]` itself; since the array holds 16-bit words rather
than bytes, the value is divided by 2 before it is added to a word
pointer. If the value read this way is nonzero, `idDelta[i]` is then
added to it (modulo 65536); a value of 0 means the missing glyph.

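``` {.prettyprint}
// Word-pointer arithmetic: glyphIdArray directly follows idRangeOffset.
glyphIndex = *(&idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]));
if (glyphIndex != 0)
  glyphIndex = (glyphIndex + idDelta[i]) % 65536;
```
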
When `idRangeOffset[i]` is 0, `idDelta[i]` is added directly to the
character code; all `idDelta` arithmetic is modulo 65536.

``` {.prettyprint}
glyphIndex = (idDelta[i] + c) % 65536;
```

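Putting the pieces together, a format 4 lookup could look like the sketch below. The `Format4` struct and `LookupFormat4` are hypothetical, not sfntly's classes. Because the arrays are held as separate vectors rather than one contiguous buffer, the pointer arithmetic is translated into an index into `glyph_id_array` by subtracting `segCount`, the number of entries in `idRangeOffset`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of a format 4 subtable.
struct Format4 {
  std::vector<uint16_t> end_code;         // sorted ascending; last is 0xFFFF
  std::vector<uint16_t> start_code;
  std::vector<int16_t> id_delta;
  std::vector<uint16_t> id_range_offset;  // byte offsets, as stored in the font
  std::vector<uint16_t> glyph_id_array;   // follows idRangeOffset in the file
};

uint16_t LookupFormat4(const Format4& t, uint16_t c) {
  // Binary search: first segment whose endCode is >= c.
  auto it = std::lower_bound(t.end_code.begin(), t.end_code.end(), c);
  if (it == t.end_code.end()) return 0;
  size_t i = static_cast<size_t>(it - t.end_code.begin());
  if (c < t.start_code[i]) return 0;  // c lies in a gap between segments
  if (t.id_range_offset[i] == 0) {
    return static_cast<uint16_t>(c + t.id_delta[i]);  // wraps modulo 65536
  }
  // Word index of &idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]),
  // counted from the start of idRangeOffset; glyphIdArray begins at segCount.
  size_t seg_count = t.end_code.size();
  size_t word = i + t.id_range_offset[i] / 2 + (c - t.start_code[i]);
  if (word < seg_count || word - seg_count >= t.glyph_id_array.size()) return 0;
  uint16_t glyph = t.glyph_id_array[word - seg_count];
  if (glyph == 0) return 0;  // missing glyph
  return static_cast<uint16_t>(glyph + t.id_delta[i]);  // wraps modulo 65536
}
```

`std::lower_bound` performs the binary search for the first segment whose `endCode` is not less than `c`; the extra `startCode` check rejects characters that fall between segments.
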
### Class Hierarchy

`CMapTable` is the main class and the container for all other
CMap-related classes.

#### Utility classes

- `CMapTable::CMapId` describes the pair of IDs (platform ID and
  encoding ID) that form a CMap’s ID. A CMap’s ID is usually a good
  indicator of which format the CMap uses (Unicode CMaps are usually
  either format 4 or format 12).
- `CMapTable::CMapIdComparator`
- `CMapTable::CMapIterator`: iteration through the CMapTable is
  supported through a Java-style iterator.
- `CMapTable::CMapFilter`: a Java-style filter; CMapIterator supports
  filtering CMaps. By default, it accepts every CMap.
- `CMapTable::CMapIdFilter` extends CMapFilter and only accepts one type
  of CMap. Used in conjunction with CMapIterator, this is how the CMap
  getters are implemented.
- **`CMapTable::Builder`** is the only way to create a CMapTable.

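The iterator/filter combination can be sketched as follows. Everything here is a simplified, hypothetical stand-in: the iterator walks a list of IDs rather than real CMaps, and `HasNext`/`Next`/`Accept` only mimic the Java-style protocol. It shows how a filter that accepts everything by default, and a `CMapIdFilter`-style subclass that accepts a single ID, plug into one iterator.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CMapTable::CMapId.
struct CMapId {
  uint16_t platform_id;
  uint16_t encoding_id;
  bool operator==(const CMapId& other) const {
    return platform_id == other.platform_id && encoding_id == other.encoding_id;
  }
};

// Default filter: accepts every CMap.
class CMapFilter {
 public:
  virtual ~CMapFilter() = default;
  virtual bool Accept(const CMapId&) const { return true; }
};

// Accepts only CMaps with one particular ID.
class CMapIdFilter : public CMapFilter {
 public:
  explicit CMapIdFilter(CMapId wanted) : wanted_(wanted) {}
  bool Accept(const CMapId& id) const override { return id == wanted_; }

 private:
  CMapId wanted_;
};

// Java-style iterator: HasNext() skips entries the filter rejects.
class CMapIterator {
 public:
  CMapIterator(const std::vector<CMapId>& ids, const CMapFilter& filter)
      : ids_(ids), filter_(filter) {}
  bool HasNext() {
    while (index_ < ids_.size() && !filter_.Accept(ids_[index_])) ++index_;
    return index_ < ids_.size();
  }
  CMapId Next() { return ids_[index_++]; }  // call only after HasNext() is true

 private:
  const std::vector<CMapId>& ids_;
  const CMapFilter& filter_;
  size_t index_ = 0;
};
```

With this shape, a getter for one CMap reduces to creating an iterator with an ID filter and taking its first result.
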
#### CMaps

- **`CMapTable::CMap`** is the abstract base class from which all
  `CMapFormat*` classes derive. It defines basic functions and the
  abstract `CMapTable::CMap::CharacterIterator` class used to iterate
  through the characters in the map. The basic implementation just
  loops through every character between a start and an end; subclasses
  override it to perform format-specific iteration.
- `CMapFormat0` (mostly done?)
- `CMapFormat2` (needs builders)
- ... coming soon

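A sketch of that design with hypothetical classes (`CharacterIterator` here is a free-standing stand-in, not the nested sfntly class). The default iterator walks every code from `start` to `end`, treated as inclusive here (an assumption), while a format-specific override visits only characters inside segments, as a format 4 CMap would need in order to skip the gaps between segments.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class CharacterIterator {
 public:
  virtual ~CharacterIterator() = default;
  virtual bool HasNext() = 0;
  virtual int32_t Next() = 0;  // call only after HasNext() returned true
};

// Default behaviour: every character in [start, end].
class RangeCharacterIterator : public CharacterIterator {
 public:
  RangeCharacterIterator(int32_t start, int32_t end) : next_(start), end_(end) {}
  bool HasNext() override { return next_ <= end_; }
  int32_t Next() override { return next_++; }

 private:
  int32_t next_;
  int32_t end_;
};

struct Segment {
  int32_t start;  // inclusive
  int32_t end;    // inclusive
};

// Format-specific override: only characters covered by a segment.
class SegmentCharacterIterator : public CharacterIterator {
 public:
  explicit SegmentCharacterIterator(std::vector<Segment> segments)
      : segments_(std::move(segments)) {
    if (!segments_.empty()) next_ = segments_[0].start;
  }
  bool HasNext() override {
    // Move on to the next segment once the current one is exhausted.
    while (seg_ < segments_.size() && next_ > segments_[seg_].end) {
      ++seg_;
      if (seg_ < segments_.size()) next_ = segments_[seg_].start;
    }
    return seg_ < segments_.size();
  }
  int32_t Next() override { return next_++; }

 private:
  std::vector<Segment> segments_;
  size_t seg_ = 0;
  int32_t next_ = 0;
};
```
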
`[todo: will add images soon; need to upload to svn]`

------------------------------------------------------------------------

# Table Building Pipeline

Building a data table in sfntly is done by the
`FontDataTable::Builder::build` method, which defines the general
pipeline and leaves the details to each implementing subclass
(`CMapTable::Builder`, for example). Note: the **`sub*`** methods are
table-specific.

**`ReadableFontDataPtr data = internalReadData()`**

> There are 2 private fields in the `FontDataTable::Builder` class:
> `rData` and `wData`, for `ReadableFontData` and `WritableFontData`.
> This function returns `rData` if it is set, or `wData` (cast to
> readable font data) if `rData` is null. *They hold the same data!*

**`if (model_changed_)`**

> A font is essentially a binary blob when loaded inside a `FontData`
> object. A *model* is the Java/C++ collection of objects that represent
> the same data in a manipulable format. If you ask for the model (even
> if you don’t write to it), it counts as changed and the underlying raw
> data will get updated.

**`if (!subReadyToSerialize())`**

**`return NULL`**

`else`

1. **`size = subDataSizeToSerialize()`**
2. **`WritableDataPtr new_data = container_->getNewData(size)`**
3. **`subSerialize(new_data)`**
4. **`data = new_data`**

**`FontDataTablePtr table = subBuildTable(data)`**

> This is where the table is actually built: `subBuildTable` is
> overridden by every table class, but a table header is always added.

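The steps above follow the template method pattern. The sketch below is a heavily simplified, hypothetical model: font data is a plain byte vector, table headers are omitted, and `Builder`, `Table`, `UInt16ListBuilder` and their members are illustrative names rather than sfntly's classes. `Build` fixes the order of the steps, a subclass only supplies the `Sub*` hooks, and, as with `model_changed_` above, merely asking for the model marks it as changed.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using FontData = std::vector<uint8_t>;

struct Table {
  explicit Table(FontData d) : data(std::move(d)) {}
  const FontData data;  // immutable once built
};

class Builder {
 public:
  virtual ~Builder() = default;

  // Fixed pipeline; table-specific work is delegated to the Sub* hooks.
  std::unique_ptr<Table> Build() {
    FontData data = InternalReadData();
    if (model_changed_) {
      if (!SubReadyToSerialize()) return nullptr;
      FontData new_data(SubDataSizeToSerialize());  // zero-filled buffer
      SubSerialize(&new_data);
      data = std::move(new_data);
    }
    return SubBuildTable(std::move(data));
  }

 protected:
  virtual bool SubReadyToSerialize() = 0;
  virtual size_t SubDataSizeToSerialize() = 0;
  virtual void SubSerialize(FontData* out) = 0;
  virtual std::unique_ptr<Table> SubBuildTable(FontData data) {
    return std::make_unique<Table>(std::move(data));
  }
  const FontData& InternalReadData() const { return raw_data_; }

  FontData raw_data_;  // data the builder was created from
  bool model_changed_ = false;
};

// Example table whose model is a list of big-endian 16-bit values.
class UInt16ListBuilder : public Builder {
 public:
  explicit UInt16ListBuilder(FontData raw) { raw_data_ = std::move(raw); }

  // Asking for the model parses the raw data and marks the model changed.
  std::vector<uint16_t>& Model() {
    if (!model_changed_) {
      for (size_t i = 0; i + 1 < raw_data_.size(); i += 2)
        values_.push_back(static_cast<uint16_t>(raw_data_[i] << 8 | raw_data_[i + 1]));
      model_changed_ = true;
    }
    return values_;
  }

 protected:
  bool SubReadyToSerialize() override { return true; }
  size_t SubDataSizeToSerialize() override { return values_.size() * 2; }
  void SubSerialize(FontData* out) override {
    for (size_t i = 0; i < values_.size(); ++i) {
      (*out)[2 * i] = static_cast<uint8_t>(values_[i] >> 8);
      (*out)[2 * i + 1] = static_cast<uint8_t>(values_[i] & 0xFF);
    }
  }

 private:
  std::vector<uint16_t> values_;
};
```
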
Subtable Builders
------------------------------------------------------------------------------

### Subtables are lazily built

When creating the object view of the font and dealing with lots of
tables, it would be wasteful to create builders for every subtable,
since most users only perform fairly high-level manipulation of the
font. Instead, **only the tables at font level are fully built**.

All other subtables have builders that contain valid FontData, but their
object view is not created by default. For the `CMapTable`, this means
that if you don’t go through the `getCMapBuilders()` method, the CMap
builders are not initialized. As a result, the builder map appears empty
when you call its `size()` method, even though
`numCMaps(internalReadFont())` reports CMaps in the font.

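The effect can be reproduced with a small, hypothetical model (`CMapTableBuilder`, `GetCMapBuilders`, `NumCMapsInData` and `BuilderCountWithoutInit` are illustrative names, and raw CMaps are reduced to integer IDs): the builder map stays empty until the getter is called for the first time, while a count taken from the underlying data is correct from the start.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct CMapBuilder {
  size_t raw_index;  // which raw CMap this builder wraps
};

class CMapTableBuilder {
 public:
  explicit CMapTableBuilder(std::vector<int> raw_cmap_ids)
      : raw_cmap_ids_(std::move(raw_cmap_ids)) {}

  // Counts CMaps in the underlying data; creates no builders.
  size_t NumCMapsInData() const { return raw_cmap_ids_.size(); }

  // Creates the per-CMap builders on the first call only.
  std::map<int, CMapBuilder>& GetCMapBuilders() {
    if (!initialized_) {
      for (size_t i = 0; i < raw_cmap_ids_.size(); ++i)
        builders_.emplace(raw_cmap_ids_[i], CMapBuilder{i});
      initialized_ = true;
    }
    return builders_;
  }

  // Inspects the builder map without triggering initialization.
  size_t BuilderCountWithoutInit() const { return builders_.size(); }

 private:
  std::vector<int> raw_cmap_ids_;  // stand-in for the raw CMap data
  std::map<int, CMapBuilder> builders_;
  bool initialized_ = false;
};
```
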
------------------------------------------------------------------------

Character encoders
---------------------------------------------------------------------------------

Sfntly/Java uses a native ICU-based API for encoding characters.
Sfntly/C++ uses ICU directly. In unit tests, we assume text is encoded
in UTF-16. Public APIs will use ICU classes like `UnicodeString`.
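
Since tests feed UTF-16 text while CMaps are keyed by code points, characters outside the BMP arrive as surrogate pairs that must be combined before a lookup (for example in a format 12 CMap). The plain-C++ sketch below shows that step with a hypothetical `DecodeUtf16` helper so that it does not depend on ICU; in the library itself, ICU would handle this. Passing unpaired surrogates through unchanged is a simplifying choice made here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decodes UTF-16 into Unicode code points so that each
// one can be looked up in a CMap. Unpaired surrogates pass through unchanged.
std::vector<uint32_t> DecodeUtf16(const std::u16string& text) {
  std::vector<uint32_t> code_points;
  for (size_t i = 0; i < text.size(); ++i) {
    uint32_t unit = text[i];
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < text.size()) {
      uint32_t low = text[i + 1];
      if (low >= 0xDC00 && low <= 0xDFFF) {
        // Combine a high/low surrogate pair into one supplementary code point.
        code_points.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
        ++i;
        continue;
      }
    }
    code_points.push_back(unit);
  }
  return code_points;
}
```
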