This page describes an experimental implementation in OGSL/GVL of Unicode Ideographic Variation Sequences (IVSs) as defined in Unicode Technical Report 37 (hereafter TR37; https://unicode.org/reports/tr37). The original idea of using these with Unicode cuneiform is owed to Robin LeRoy.
Unicode IVSs provide a means for selecting glyph variants in a standardized way. They work as a character pair in which the second character is a selector for a glyph variant of the first. The selectors are in the range U+E0100-U+E01EF.
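As a minimal sketch of the pairing described above, the following Python builds a base+selector pair and checks that the selector falls in U+E0100-U+E01EF. The helper names `is_ivs_selector` and `make_ivs` are hypothetical and are not part of OGSL or GVL.

```python
# Selector range for IVSs as given in the text: U+E0100-U+E01EF.
IVS_SELECTOR_FIRST = 0xE0100
IVS_SELECTOR_LAST = 0xE01EF


def is_ivs_selector(ch: str) -> bool:
    """Return True if ch is a single character in U+E0100-U+E01EF."""
    return len(ch) == 1 and IVS_SELECTOR_FIRST <= ord(ch) <= IVS_SELECTOR_LAST


def make_ivs(base: str, selector_cp: int) -> str:
    """Build a base+selector pair, rejecting selectors outside the IVS range."""
    selector = chr(selector_cp)
    if len(base) != 1 or not is_ivs_selector(selector):
        raise ValueError("expected one base character and a selector in U+E0100-U+E01EF")
    return base + selector


# U+1214E followed by selector U+E0100 (hypothetical helper, illustrative pair).
seq = make_ivs("\U0001214E", 0xE0100)
print([f"U+{ord(c):04X}" for c in seq])
```

The sequence is an ordinary two-character string, so it can be stored and compared like any other text; whether a distinct glyph is shown depends on font support.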
The Unicode Ideographic Variation Database, IVD, as defined in TR37 is very simple: a collection must be registered with basic metadata. A complementary file defines sequences in the collection consisting of pairs of base characters and variation selectors along with a collection name and a name for the sequence.
The name of the Oracc OGSL collection is Oracc_OGSL. This implementation is presently unregistered, but the collection data would be of the form:
Oracc_OGSL;[A-Z]+[0-9]*(?:[@%&*.][A-Z]+[0-9]*)*_[A-Z]+[0-9]*(?:[@%&*.][A-Z]+[0-9]*)*;http://oracc.org/ogsl/ivs/
Oracc_OGSL identifiers consist of a BASESIGN, an underscore character ('_'), and VARDATA. Both BASESIGN, the sign subject to variation, and VARDATA, the variation, consist of uppercase letters A-Z followed by optional digits 0-9, followed by optional compound sign parts. A compound sign part consists of one of the five delimiter characters @ % & * . followed by a sign name, consisting of uppercase letters A-Z followed by optional digits 0-9.
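The identifier syntax can be checked mechanically. The sketch below reuses the regular expression from the collection data line and applies it as a full match; the function name `is_valid_identifier` and the sample identifier GA2*AN_VFORM0 are made up for illustration.

```python
import re

# Pattern copied from the collection data line: BASESIGN, '_', VARDATA.
IDENTIFIER = re.compile(
    r"[A-Z]+[0-9]*(?:[@%&*.][A-Z]+[0-9]*)*"   # BASESIGN
    r"_"
    r"[A-Z]+[0-9]*(?:[@%&*.][A-Z]+[0-9]*)*"   # VARDATA
)


def is_valid_identifier(name: str) -> bool:
    """Return True if the whole string is a well-formed identifier."""
    return IDENTIFIER.fullmatch(name) is not None


# GA2*AN_VFORM0 is a hypothetical identifier used only to exercise the
# compound-sign branch of the pattern.
for candidate in ["IM_NI2", "MU_KASKAL", "GA2*AN_VFORM0", "im_ni2", "IM", "IM_"]:
    print(candidate, is_valid_identifier(candidate))
```

Using `fullmatch` rather than `search` matters here: the pattern carries no anchors of its own, so a partial match would accept strings with trailing junk.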
The BASESIGN is always derived from an OGSL sign name by translating subscript digits to regular digits and substituting SH for Š, e.g., MU, NI2, or SHU. For compound signs the sign name is simplified by omitting vertical bars and parentheses and mapping any TIMES sign to '*', e.g., |GA₂×AN| would be specified as GA2*AN.
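The derivation of a BASESIGN from an OGSL sign name can be sketched as a handful of string substitutions. The function name `basesign` is hypothetical, and the sketch covers only the substitutions named in the text; other characters that occur in OGSL sign names are passed through unchanged.

```python
# Subscript digits U+2080-U+2089 mapped to ASCII digits.
SUBSCRIPT_TO_ASCII = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def basesign(sign_name: str) -> str:
    """Derive a BASESIGN from an OGSL sign name (hypothetical helper)."""
    name = sign_name.translate(SUBSCRIPT_TO_ASCII)
    name = name.replace("\u0160", "SH")      # Š -> SH
    for ch in "|()":                         # drop vertical bars and parentheses
        name = name.replace(ch, "")
    return name.replace("\u00d7", "*")       # TIMES sign -> '*'


for ogsl_name in ["MU", "NI₂", "ŠU", "|GA₂×AN|"]:
    print(ogsl_name, "->", basesign(ogsl_name))
```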
The VARDATA is either an OGSL sign name or a descriptive label for the variant selected via the sequence as described further in the next section.
A table of Oracc_OGSL reference glyphs is maintained at XXX/ogsl/ivsglyphs.html.
Oracc_OGSL defines IVSs for two reasons: to record sign mergers and to record systematic variant forms.
The following IVS uses are initially defined:
E0100 MERGER0
E0101 MERGER1
E0102 MERGER2
E0103 MERGER3
E0110 VFORM0
E0111 VFORM1
E0112 VFORM2
E0113 VFORM3
The most common use of Oracc_OGSL IVS will be to handle mergers: a number of sign distinctions in Early Dynastic writing are lost over time. For example, IM and NI₂ are separate signs in ED Fara but are merged in almost every other script phase. This can be expressed as the following IVS entry:
# IM merges with NI2 except for ED Fara
1214E E0100;Oracc_OGSL;IM_NI2
Systematic sign variation is also handled using IVS entries. One variant of the MU sign, mostly associated with ED Adab, replaces the SHE-style component of the sign with a KASKAL-style component:
# MU sign with KASKAL-style replacement for normal SHE-style component
1222C E0110;Oracc_OGSL;MU_KASKAL
In the case of variant forms, the VARDATA component of the label MU_KASKAL may qualify the variation rather than defining the target form as a merged sign.
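Entries of the kind shown above are straightforward to read programmatically. The following is a minimal sketch, assuming one entry per line, '#' comment lines, and three semicolon-separated fields whose first field holds two space-separated hexadecimal code points. The names `IvsEntry` and `parse_entry` are hypothetical.

```python
from typing import NamedTuple, Optional


class IvsEntry(NamedTuple):
    base: int         # code point of the base character
    selector: int     # code point of the variation selector
    collection: str   # collection name, e.g. Oracc_OGSL
    name: str         # sequence identifier, e.g. IM_NI2


def parse_entry(line: str) -> Optional[IvsEntry]:
    """Parse one entry line; return None for blank and comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    sequence, collection, name = (field.strip() for field in line.split(";"))
    base_hex, selector_hex = sequence.split()
    return IvsEntry(int(base_hex, 16), int(selector_hex, 16), collection, name)


sample = """\
# IM merges with NI2 except for ED Fara
1214E E0100;Oracc_OGSL;IM_NI2
# MU sign with KASKAL-style replacement for normal SHE-style component
1222C E0110;Oracc_OGSL;MU_KASKAL
"""

entries = [e for e in map(parse_entry, sample.splitlines()) if e is not None]
for e in entries:
    print(f"{e.name}: U+{e.base:04X} U+{e.selector:04X}")
```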
The Oracc_OGSL Ideographic Variation Database, OIVD, is kept in 00etc/Oracc_OGSL.txt in the Oracc OGSL repo, and is available online at XXX.
The OIVD contains all possible applications of IVS in Oracc_OGSL; not all script phases will exhibit all IVS traits, however. Subsetting of the OIVD for individual script phases is handled using configuration files which give a simple list of IVS names. By convention these have the extension .oiv. A trivial example could be a file merger.oiv containing the line:
IM_NI2
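Subsetting by name list can be sketched as follows, assuming the OIVD has already been loaded into a mapping from IVS name to base+selector string and that the .oiv file holds one IVS name per line, as in the merger.oiv example. The function name `subset_oivd` and the in-memory representation are assumptions made for illustration.

```python
def subset_oivd(oivd: dict, oiv_text: str) -> dict:
    """Keep only the OIVD entries whose names are listed in oiv_text."""
    names = [line.strip() for line in oiv_text.splitlines() if line.strip()]
    unknown = [n for n in names if n not in oivd]
    if unknown:
        # Surface typos in the name list instead of silently ignoring them.
        raise KeyError(f"not in OIVD: {unknown}")
    return {n: oivd[n] for n in names}


# Hypothetical in-memory OIVD built from the two entries on this page.
full_oivd = {
    "IM_NI2": "\U0001214E\U000E0100",
    "MU_KASKAL": "\U0001222C\U000E0110",
}
merger_oiv = "IM_NI2\n"   # contents of the trivial merger.oiv example

active = subset_oivd(full_oivd, merger_oiv)
print(sorted(active))
```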
The Grapheme Validation Library (GVL) uses data derived from a sign list to validate Oracc transliteration; it also uses the sign list to add Unicode data to the processed transliteration. GVL can be configured to use OIVD entries via a .oiv file, which is named for a script-type. If there is a script-type GU there must also be a .oiv file GU.oiv, which is looked for along a path of .oiv file locations. This path may include project configuration options, standard locations in the project, and system locations in OGSL.
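The lookup along a path of locations can be illustrated with a first-match search. This is a sketch of the general technique only, not GVL's actual code: the function name `find_oiv`, the directory names, and the ordering of the path are all assumptions.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_oiv(script_type: str, search_path: Iterable[Path]) -> Optional[Path]:
    """Return the first <script_type>.oiv found along search_path, else None."""
    for directory in search_path:
        candidate = Path(directory) / f"{script_type}.oiv"
        if candidate.is_file():
            return candidate
    return None


# Demonstration with throwaway directories standing in for a project
# location and a system location (both hypothetical).
with tempfile.TemporaryDirectory() as root:
    project_dir = Path(root) / "project"
    system_dir = Path(root) / "system"
    project_dir.mkdir()
    system_dir.mkdir()
    (system_dir / "GU.oiv").write_text("IM_NI2\n", encoding="utf-8")

    found = find_oiv("GU", [project_dir, system_dir])
    missing = find_oiv("XX", [project_dir, system_dir])
    print(found.name if found else None, missing)
```

Putting project locations ahead of system locations in the list would let a project override a system-wide file of the same name; the source does not state which order GVL uses.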
A .oiv file is a three-column tab-separated file giving a base character, a base+selector pair, and optional rendering information. GVL itself does not use the rendering information, but cuneify may use it under certain circumstances if the sequence is not supported in the font being used for cuneification.
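Reading the three-column layout described above might look like the following sketch. It assumes tab-separated fields with an optional third column; the function name `parse_oiv_line` and the sample rendering value are invented for illustration, since the text does not specify what rendering information looks like.

```python
from typing import Optional, Tuple


def parse_oiv_line(line: str) -> Tuple[str, str, Optional[str]]:
    """Split one tab-separated line into (base, base+selector, rendering)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"expected at least two tab-separated fields: {line!r}")
    base, pair = fields[0], fields[1]
    if not pair.startswith(base):
        raise ValueError("second column should begin with the base character")
    # The third column is optional; treat an absent or empty one as None.
    rendering = fields[2] if len(fields) > 2 and fields[2] else None
    return base, pair, rendering


# "hypothetical-hint" is a placeholder, not a real rendering value.
with_hint = parse_oiv_line("\U0001214E\t\U0001214E\U000E0100\thypothetical-hint\n")
without_hint = parse_oiv_line("\U0001222C\t\U0001222C\U000E0110\n")
print(with_hint[2], without_hint[2])
```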
GVL callers are responsible for selecting the correct IVS sets based on project, text, or transliteration metadata. GVL guarantees that a requested .oiv file will be read only once, so switching between script-types is efficient. This feature can be used to account not only for general variations like mergers, but also for individual handwriting, such as the variable differentiation of GA and BI signs on certain Old Babylonian literary tablets, or for graphetic features as defined in the source corpus of SAAo.
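The read-once guarantee can be illustrated with a small cache keyed by file path. This sketch shows the general technique rather than GVL's implementation; the class name `OivCache`, the file name OB.oiv, and the one-name-per-line contents are assumptions.

```python
import tempfile
from pathlib import Path


class OivCache:
    """Load each .oiv file at most once and hand back the cached contents."""

    def __init__(self) -> None:
        self._tables = {}
        self.reads = 0   # number of actual file reads, for demonstration

    def load(self, path) -> tuple:
        key = Path(path).resolve()
        if key not in self._tables:
            self.reads += 1
            text = key.read_text(encoding="utf-8")
            self._tables[key] = tuple(
                line.strip() for line in text.splitlines() if line.strip()
            )
        return self._tables[key]


with tempfile.TemporaryDirectory() as root:
    gu = Path(root) / "GU.oiv"
    ob = Path(root) / "OB.oiv"   # hypothetical second script-type
    gu.write_text("IM_NI2\n", encoding="utf-8")
    ob.write_text("MU_KASKAL\n", encoding="utf-8")

    cache = OivCache()
    # Switch back and forth between script-types, as a caller might
    # when moving between texts.
    for path in [gu, ob, gu, ob, gu]:
        cache.load(path)
    gu_names = cache.load(gu)
    print(cache.reads, gu_names)
```

Because later requests are dictionary lookups, a caller can switch script-types per text, or even per line, without repeated file I/O.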