OGSL Ideographic Variation

This page describes an experimental implementation in OGSL/GVL of Unicode Ideographic Variation Sequences (IVSs) as defined in Unicode Technical Standard #37 [https://unicode.org/reports/tr37], referred to below as TR37. The original idea of using these with Unicode cuneiform is owed to Robin LeRoy.

Overview

Unicode IVSs provide a means for selecting glyph variants in a standardized way. They work as a character pair in which the second character is a selector for a glyph variant of the first. The selectors are in the range U+E0100-U+E01EF.
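The pairing described above can be sketched in a few lines of Python. This is an illustrative sketch only; the helper names (is_ivs_selector, make_ivs, split_ivs) are hypothetical and are not part of OGSL or GVL.

```python
# Selector range for IVSs as stated above: U+E0100..U+E01EF.
IVS_SELECTOR_FIRST = 0xE0100
IVS_SELECTOR_LAST = 0xE01EF


def is_ivs_selector(ch: str) -> bool:
    """Return True if ch is a single code point in the IVS selector range."""
    return len(ch) == 1 and IVS_SELECTOR_FIRST <= ord(ch) <= IVS_SELECTOR_LAST


def make_ivs(base: int, selector: int) -> str:
    """Build the two-character sequence: base character, then selector."""
    if not IVS_SELECTOR_FIRST <= selector <= IVS_SELECTOR_LAST:
        raise ValueError(f"U+{selector:04X} is not an IVS selector")
    return chr(base) + chr(selector)


def split_ivs(text: str):
    """Yield (base, selector) pairs; selector is None for a bare character."""
    chars = list(text)
    i = 0
    while i < len(chars):
        following = chars[i + 1] if i + 1 < len(chars) else None
        if following is not None and is_ivs_selector(following):
            yield chars[i], following
            i += 2
        else:
            yield chars[i], None
            i += 1
```

For example, make_ivs(0x1214E, 0xE0100) returns a two-character string that split_ivs reports as a single base and selector pair.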

The Unicode Ideographic Variation Database (IVD) defined in TR37 is very simple: a collection must be registered with basic metadata, and a complementary file defines the sequences in the collection. Each sequence entry consists of a pair of base character and variation selector, along with the collection name and a name for the sequence.

Oracc_OGSL Collection

The name of the Oracc OGSL collection is Oracc_OGSL. This implementation is presently unregistered, but the collection data would be of the form:

	Oracc_OGSL;[A-Z]+[0-9]*(?:[@%&*.][A-Z]+[0-9]*)*_[A-Z]+[0-9]*(?:[@%&*.][A-Z]+[0-9]*)*;http://oracc.org/ogsl/ivs/
      

Oracc_OGSL Identifier Structure

Oracc_OGSL identifiers consist of a BASESIGN, underscore character, '_', and VARDATA. Both BASESIGN, the sign subject to variation, and VARDATA, the variation, consist of uppercase letters A-Z followed by optional digits 0-9, followed by optional compound sign parts. A compound sign part consists of a delimiter from the set @%&*. plus a sign name, consisting of uppercase letters A-Z followed by optional digits 0-9.
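As a rough illustration, the identifier structure can be checked with the same pattern given in the collection data above. The sketch below assembles that pattern from its parts; the function name split_identifier is hypothetical.

```python
import re

# One sign name: uppercase letters followed by optional digits.
SIGN = r"[A-Z]+[0-9]*"
# A sign name followed by optional compound parts (delimiter plus sign name).
PART = rf"{SIGN}(?:[@%&*.]{SIGN})*"
# BASESIGN, underscore, VARDATA.
IDENTIFIER = re.compile(rf"({PART})_({PART})")


def split_identifier(identifier: str) -> tuple[str, str]:
    """Return (BASESIGN, VARDATA) or raise ValueError if malformed."""
    match = IDENTIFIER.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a valid Oracc_OGSL identifier: {identifier!r}")
    return match.group(1), match.group(2)
```

Because the underscore cannot occur inside either part, the split between BASESIGN and VARDATA is unambiguous.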

The BASESIGN is always derived from an OGSL sign name by translating subscript digits to regular digits and substituting SH for Š, e.g., MU, NI2, or SHU. For compound signs the sign name is simplified by omitting vertical bars and parentheses and mapping any TIMES sign (×) to '*', e.g., |GA₂×AN| is specified as GA2*AN.
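The derivation rules in the preceding paragraph can be sketched as follows. This covers only the substitutions named there; the function name basesign_from_ogsl is hypothetical, and the NFC normalization step is an added assumption so that a decomposed Š is also recognized.

```python
import unicodedata

# Map subscript digits to regular ASCII digits.
SUBSCRIPT_DIGITS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def basesign_from_ogsl(sign_name: str) -> str:
    """Derive a BASESIGN from an OGSL sign name using the rules stated above."""
    # Assumption: compose S + combining caron into the single character Š first.
    name = unicodedata.normalize("NFC", sign_name)
    name = name.translate(SUBSCRIPT_DIGITS)
    name = name.replace("Š", "SH")
    # Compound signs: drop vertical bars and parentheses.
    for ch in "|()":
        name = name.replace(ch, "")
    # Map the TIMES sign to an asterisk.
    return name.replace("×", "*")
```

With these rules, |GA₂×AN| yields GA2*AN and ŠU yields SHU, matching the examples in the text.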

The VARDATA is either an OGSL sign name or a descriptive label for the variant selected via the sequence as described further in the next section.

Oracc_OGSL Reference Glyphs

A table of Oracc_OGSL reference glyphs is maintained at XXX/ogsl/ivsglyphs.html.

Oracc_OGSL use of Variation Selectors

Oracc_OGSL defines IVSs for two reasons: to handle sign mergers and to handle systemic variant forms of a sign.

The following IVS uses are initially defined:

	E0100 MERGER0
	E0101 MERGER1
	E0102 MERGER2
	E0103 MERGER3

	E0110 VFORM0
	E0111 VFORM1
	E0112 VFORM2
	E0113 VFORM3
      

Sample Oracc_OGSL IVS entries

The most common use of Oracc_OGSL IVS will be to handle mergers: a number of sign distinctions in Early Dynastic writing are lost over time. For example, IM and NI₂ are separate signs in ED Fara but are merged in almost every other script phase. This can be expressed as the following IVS entry:

      
	# IM merges with NI2 except for ED Fara
	1214E E0100;Oracc_OGSL;IM_NI2
      

Systemic sign variation is also handled using IVS entries. A variation of the MU sign, mostly associated with ED Adab, is to form the SHE-style component of the sign with a KASKAL-style component instead:

	# MU sign with KASKAL-style replacement for normal SHE-style component
	1222C E0110;Oracc_OGSL;MU_KASKAL
      

In the case of variant forms, the VARDATA component of the label, such as KASKAL in MU_KASKAL, may qualify the variation rather than name the target form, as it does for a merged sign.
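Entries of the form shown in the samples above could be read with a small parser like the following. This is a sketch under assumptions: it treats '#' as starting a comment and ignores blank lines, as the samples suggest, and the names IvsEntry and parse_oivd are hypothetical rather than part of any Oracc tool.

```python
from typing import Iterable, Iterator, NamedTuple


class IvsEntry(NamedTuple):
    base: int         # code point of the base character
    selector: int     # code point of the variation selector
    collection: str   # e.g. "Oracc_OGSL"
    identifier: str   # e.g. "IM_NI2"


def parse_oivd(lines: Iterable[str]) -> Iterator[IvsEntry]:
    """Parse 'BASE SELECTOR;COLLECTION;IDENTIFIER' lines into IvsEntry tuples."""
    for raw in lines:
        # Strip comments and surrounding whitespace; skip empty lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        sequence, collection, identifier = (f.strip() for f in line.split(";"))
        base_hex, selector_hex = sequence.split()
        yield IvsEntry(int(base_hex, 16), int(selector_hex, 16),
                       collection, identifier)
```

A line with the wrong number of fields raises ValueError from the tuple unpacking, which is a deliberate choice here to surface malformed data early.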

Oracc_OGSL Ideographic Variation Database

The Oracc_OGSL Ideographic Variation Database, OIVD, is kept in 00etc/Oracc_OGSL.txt in the Oracc OGSL repo, and is available online at XXX.

The OIVD contains all possible applications of IVS in Oracc_OGSL; not all script phases will exhibit all IVS traits, however. Subsetting of the OIVD for individual script phases is handled using configuration files which give a simple list of IVS names. By convention these have the extension .oiv. A trivial example could be a file merger.oiv containing the line:

	IM_NI2
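
Subsetting by a list of names, as in the merger.oiv example, might look like the sketch below. It assumes the simple one-name-per-line format described in this section; the skipping of '#' comment lines is an added assumption, and the function names and the dictionary shape of the OIVD are hypothetical.

```python
def load_oiv_names(lines):
    """Collect IVS names from the lines of a .oiv file, one name per line."""
    names = []
    for raw in lines:
        name = raw.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if name and not name.startswith("#"):
            names.append(name)
    return names


def subset_oivd(oivd, names):
    """Return only the OIVD entries whose names are listed.

    oivd maps an IVS name to its (base, selector) code points.
    """
    missing = [n for n in names if n not in oivd]
    if missing:
        raise KeyError("not in OIVD: " + ", ".join(missing))
    return {n: oivd[n] for n in names}
```

Given an OIVD holding both IM_NI2 and MU_KASKAL, the merger.oiv list above would select only the IM_NI2 entry.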
      

GVL Implementation

The Grapheme Validation Library (GVL) uses data derived from a sign list to validate Oracc transliteration; it also uses the sign list to add Unicode data to the processed transliteration. GVL can be configured to use OIVD entries via a .oiv file named for a script-type. If there is a script-type GU there must also be a file GU.oiv, which is searched for along a path of .oiv file locations. This path may include project configuration options, standard locations in the project, and system locations in OGSL.
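The lookup along a path of locations can be pictured with the following sketch. It is not GVL's actual code: the function name find_oiv and the idea of passing the search path as a list of directories are assumptions made for illustration.

```python
from pathlib import Path


def find_oiv(script_type, search_path):
    """Return the first SCRIPT-TYPE.oiv found along search_path, else None.

    search_path is an ordered list of directories; earlier entries win.
    """
    filename = f"{script_type}.oiv"
    for directory in search_path:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None
```

Ordering the list from project-specific to system-wide locations would let a project override a system default, although the actual precedence used by GVL is not stated here.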

A .oiv file is a three-column tabbed file giving a base character, a base+selector pair, and optional rendering information. GVL itself does not use the rendering information, but cuneify may use it under certain circumstances if the IVS is not supported in the font being used for cuneification.

GVL callers are responsible for selecting the correct IVS sets based on project, text, or transliteration metadata. GVL guarantees that a requested .oiv file will be read only once, so switching between script-types is efficient. This feature can be used to account not only for general variations like mergers, but also for individual handwriting, such as the variable differentiation of GA and BI signs on certain Old Babylonian literary tablets, or for graphetic features as defined in the source corpus of SAAo.
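The read-once guarantee amounts to caching loaded data by script-type. A minimal sketch of that idea follows; the class name OivCache and the injected reader function are hypothetical and do not reflect how GVL is actually written.

```python
class OivCache:
    """Load the data for each script-type at most once."""

    def __init__(self, reader):
        # reader is any callable that takes a script-type and returns its data.
        self._reader = reader
        self._loaded = {}

    def get(self, script_type):
        """Return the data for script_type, reading it on first request only."""
        if script_type not in self._loaded:
            self._loaded[script_type] = self._reader(script_type)
        return self._loaded[script_type]
```

A caller can then switch back and forth between script-types, for instance line by line, without the file being read again.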

 
 
CC BY-SA The OGSL Project, 2014-
http://oracc.org/ogslideographicvariation/