Computer cuneiform: conventions for reading and writing on screen

It took computer designers a long time to realise that users might want to write in scripts other than the modern western alphabet. For many years, cuneiformists had to design their own transliteration fonts, which replaced characters they didn't need (such as å and †) with ones they did (such as ā and ṭ). While these fonts often worked very well for individuals or small groups, enabling a lot of really useful work, they also had big disadvantages. It was difficult to share text files with other users without also sharing fonts; the fonts were often expensive or difficult to get; and new operating systems gradually made old fonts unusable. Fortunately, these days we can all transliterate, normalise and even write in cuneiform script using international standards that all modern computers can read, with freely available fonts and without specialist software.

ASCII transliteration and normalisation

ASCII (the American Standard Code for Information Interchange) is the technical term for what most of us know as plain text. ASCII has been an international standard since the 1960s. It formally consists of the basic 95 alphabetic, numerical, and punctuation characters found on almost all western alphabetic keyboards plus a further 33 non-printing characters. Because ASCII is so well established (it is almost prehistoric in computer terms) and almost universally recognised, it is a very sound basis for a transliteration system.

There have been several ASCII transliteration schemes in use over the years, but the best documented, and most widely used, is ATF (which stands for ASCII Transliteration Format). It was designed by Steve Tinney for the Cuneiform Digital Library Initiative [http://cdli.ucla.edu] (CDLI) and Oracc [http://oracc.museum.upenn.edu]. CDLI publishes its enormous online corpus (mostly of Sumerian texts) in ATF.

The main features of ATF are as follows:

the consonants ṣ, š, ṭ are written as s, (s followed by a comma), sz, and t, (t followed by a comma); their capitals Ṣ, Š, Ṭ are written as S, (S followed by a comma), SZ, and T, (T followed by a comma);
subscript numerals are written as full-sized numerals;
superscript determinatives are written full-sized in curly brackets { };
sequences of missing signs are written inside square brackets [ ] and every damaged sign is written with # after it.

Compare the following Unicode and ATF transliterations of the final part of Hammurabi's Law 108:

ATF: {f}KURUN.NA szu-a-ti u2-ka-an-nu-szi-ma a-na me#-e i-na#-[ad-du]-u2-szi

Unicode: ^fKURUN.NA šu-a-ti u₂-ka-an-nu-ši-ma a-na ⸢me⸣-e i-⸢na⸣-[ad-du]-u₂-ši
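
The conversion from ATF to Unicode transliteration is mechanical enough to sketch in a few lines of Python. The function below is a hypothetical helper written for this page, not part of any Oracc or CDLI tool, and it covers only three of the features listed above: the special consonants, sign indices and damage marks. It assumes that a comma after s or t is always part of the consonant, and it leaves determinatives in curly brackets untouched.

```python
import re

# Hypothetical helper for illustration; not part of any Oracc or CDLI tool.
# ATF spellings of the special consonants described above.
CONSONANTS = [
    ("sz", "š"), ("SZ", "Š"),
    ("s,", "ṣ"), ("S,", "Ṣ"),
    ("t,", "ṭ"), ("T,", "Ṭ"),
]
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def atf_to_unicode(atf: str) -> str:
    """Convert a small subset of ATF transliteration to Unicode."""
    text = atf
    for ascii_form, unicode_form in CONSONANTS:
        text = text.replace(ascii_form, unicode_form)
    # Digits that directly follow a letter are sign indices: u2 -> u₂.
    text = re.sub(
        r"(?<=[^\W\d_])\d+",
        lambda match: match.group().translate(SUBSCRIPT_DIGITS),
        text,
    )
    # A sign followed by # is damaged: me# -> ⸢me⸣.
    text = re.sub(r"([^\s\-.\[\]{}#]+)#", r"⸢\1⸣", text)
    return text


line = "szu-a-ti u2-ka-an-nu-szi-ma a-na me#-e i-na#-[ad-du]-u2-szi"
print(atf_to_unicode(line))
# šu-a-ti u₂-ka-an-nu-ši-ma a-na ⸢me⸣-e i-⸢na⸣-[ad-du]-u₂-ši
```

Real ATF has many more conventions than this, so a production converter should follow the full ATF documentation rather than these three rules.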

It is also very easy to write normalisation in ATF:

long vowels with macrons are written with = before them;
long vowels with circumflexes are written with ^ before them.

Looking again at the end of Law 108, compare:

ATF: s=ab=itam szu=ati ukann=uszima ana m^e inadd^uszi

Unicode: sābītam šuāti ukannūšima ana mê inaddûši
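
The two normalisation rules can be sketched in the same way. The function name below is an assumption made for illustration; it handles only the vowels a, e, i and u, plus the sz spelling of š, and builds each accented vowel from the base letter and a Unicode combining mark.

```python
import re
import unicodedata

COMBINING_MACRON = "\u0304"      # placed over a vowel written with = in ATF
COMBINING_CIRCUMFLEX = "\u0302"  # placed over a vowel written with ^ in ATF


def atf_normalisation_to_unicode(atf: str) -> str:
    """Convert ATF normalisation (=a, ^e, sz) to Unicode. Illustrative only."""
    text = atf.replace("sz", "š").replace("SZ", "Š")

    def accent(pattern: str, combining: str, source: str) -> str:
        # Attach the combining mark to the vowel, then compose to one character.
        return re.sub(
            pattern,
            lambda m: unicodedata.normalize("NFC", m.group(1) + combining),
            source,
        )

    text = accent(r"=([aeiuAEIU])", COMBINING_MACRON, text)
    text = accent(r"\^([aeiuAEIU])", COMBINING_CIRCUMFLEX, text)
    return text


print(atf_normalisation_to_unicode("s=ab=itam szu=ati ukann=uszima ana m^e inadd^uszi"))
# sābītam šuāti ukannūšima ana mê inaddûši
```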

Although ATF isn't particularly pretty, it has several advantages: it is quick and easy to type, on any western alphabetic keyboard; it is readable on almost any computer; and it is easily convertible to alphabetic Unicode or Unicode Cuneiform.

ATF also handles all the more complex features of cuneiform. There is full documentation of ATF transliteration conventions on the ATF Inline Tutorial [http://oracc.museum.upenn.edu/ns/gdl/1.0/gdltut.html] page of the Oracc website.

Unicode transliteration and normalisation

As ASCII comprised just 95 printable characters, in the 1980s it became clear that international computer users needed a much, much bigger unified character set to cope with all the world's scripts together. Unicode [http://www.unicode.org/standard/WhatIsUnicode.html] was developed (and is still being developed) to meet the needs of scripts as diverse as Korean and ancient Greek. It assigns each letter or character of each script to a separate, consistent code point, as defined by an international standard. Now, no matter which computer you are using, wherever you are in the world, your script will always look the way you intended. All the major living scripts are defined in the Unicode standard now, and most of the ancient ones. This includes cuneiform and all the alphabetic characters needed for transliteration and normalisation.

Each Unicode code point is assigned a number, prefixed by U+, and a name in ASCII encoding. The letter š, for instance, has the Unicode number U+0161 and the name LATIN SMALL LETTER S WITH CARON. There is full documentation of the Unicode characters used for transliteration and normalisation at Oracc's Unicode Characters for Cuneiform Transliteration [http://oracc.museum.upenn.edu/doc/help/visitingoracc/unicode/index.html] page.
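
If you have Python to hand, you can look up the number and name of any character yourself with the standard unicodedata module. This is only a convenience sketch; the function name describe is our own.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the U+ number and official Unicode name of one character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


print(describe("š"))  # U+0161 LATIN SMALL LETTER S WITH CARON
```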

Almost every computer manufactured in the past few years uses Unicode by default. Much software and many websites use a form of Unicode character encoding called UTF-8, which is backwards-compatible with ASCII. This website is written in UTF-8, as are all the Oracc [http://oracc.museum.upenn.edu] online corpora. If your web browser's character encoding preferences aren't already set to Unicode (UTF-8), set them now. (Go to the Help with fonts page for instructions.) This is unlikely to affect your ability to view other web pages.
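
Backwards compatibility here means that a text containing only ASCII characters is stored as exactly the same bytes in UTF-8, while every other character takes two or more bytes. A short Python sketch shows the difference (utf8_hex is a name invented for this example):

```python
def utf8_hex(text: str) -> str:
    """Show the UTF-8 bytes of a string as hexadecimal pairs."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


# Plain ASCII text is byte-for-byte identical in ASCII and UTF-8.
assert "szu-a-ti".encode("utf-8") == "szu-a-ti".encode("ascii")

print(utf8_hex("sz"))  # 73 7A  (two ASCII characters, one byte each)
print(utf8_hex("š"))   # C5 A1  (one character, two bytes)
```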

Computer keyboards were originally designed with ASCII encoding in mind, and have hardly adapted to the Unicode era. So typing Unicode can be rather fiddly. Both Windows and Macs provide ways to enter Unicode characters, either using the keyboard (Windows RichEdit, Mac Unicode Hex Input) or selecting from a screen with the mouse (Windows Character Map, Mac Character Palette). But it is easier to use a custom keyboard layout, optimised for typing transliteration and normalisation. Steve Tinney's transliteration keyboards, for both Mac and Windows, are available from the Oracc Keyboards Download Page [http://oracc.museum.upenn.edu/doc/help/visitingoracc/keyboards/index.html], including full installation instructions. There is a complete list of key strokes on the Oracc Unicode Characters for Cuneiform Transliteration [http://oracc.museum.upenn.edu/doc/help/visitingoracc/unicode/index.html] page.

Because Unicode caters for over 100,000 characters, most fonts only encode a subset of them: the Western alphabetic characters, say, or Arabic and Persian. Your computer is almost certainly installed with several fonts that contain all the letters you will need for reading and writing alphabetic transliteration. But you may find that your fonts do not include Unicode subscript numerals (₀, ₁, ₂, etc.) and half-brackets (⸢ ⸣). If you cannot see the characters in parentheses in this last sentence, you will need to install a Unicode font that contains them. You will find a selection on the Oracc fonts [http://oracc.museum.upenn.edu/doc/help/visitingoracc/fonts/index.html] page.
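
When choosing a font, it can help to know exactly which characters outside ASCII a transliteration uses. The sketch below lists them with their code points and names; it does not inspect any font, and the function name is our own invention.

```python
import unicodedata


def non_ascii_inventory(text: str) -> list[tuple[str, str]]:
    """List each non-ASCII character in the text once, in code point order."""
    chars = sorted({char for char in text if ord(char) > 127})
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "UNNAMED")) for c in chars]


for number, name in non_ascii_inventory("u₂-ka-an-nu-ši-ma a-na ⸢me⸣-e"):
    print(number, name)
# U+0161 LATIN SMALL LETTER S WITH CARON
# U+2082 SUBSCRIPT TWO
# U+2E22 TOP LEFT HALF BRACKET
# U+2E23 TOP RIGHT HALF BRACKET
```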

Unicode Cuneiform

Finally, to read or write in cuneiform signs on screen, you will need one or more Unicode Cuneiform fonts. We describe how to get them on the Help with fonts page.

By far the simplest method of generating cuneiform text is to type ATF or Unicode transliteration into Cuneify and to copy and paste the resulting cuneiform text into your document.

There are other ways, but if typing Unicode transliteration is fiddly without custom keyboard layouts, entering Unicode cuneiform is even more difficult. Some of the standard methods (e.g., Mac Unicode Hex Input) are restricted to the first 65,500 or so (in fact 2¹⁶ = 65,536) code points of Unicode, which do not include the Cuneiform character set (it falls in the next 65,536). Others (e.g., Mac Character Palette) are extremely unwieldy to use with Cuneiform fonts.
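
The 2¹⁶ limit is easy to see in code. In the sketch below (function names are our own), a transliteration letter such as š fits in a single 16-bit unit, whereas the first cuneiform sign, at U+12000, lies beyond U+FFFF and needs a pair of 16-bit units in the UTF-16 encoding.

```python
def in_first_65536(char: str) -> bool:
    """True if the character is among the first 2**16 Unicode code points."""
    return ord(char) < 2 ** 16


def utf16_units(char: str) -> list[str]:
    """Return the 16-bit units used to encode the character in UTF-16."""
    data = char.encode("utf-16-be")
    return [f"{int.from_bytes(data[i:i + 2], 'big'):04X}" for i in range(0, len(data), 2)]


print(in_first_65536("š"), utf16_units("š"))                    # True ['0161']
print(in_first_65536("\U00012000"), utf16_units("\U00012000"))  # False ['D808', 'DC00']
```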

This unwieldiness is partly to do with the design of the software (Character Palette cannot cope with very long or very tall characters, for instance) and partly to do with the way the Unicode Cuneiform character set is organised. Cuneiformists usually organise signs according to the orientation and number of wedges they contain, starting with 𒀸 (AŠ). Unicode, by contrast, numbers the signs in alphabetical order of their ASCII name, starting with 𒀀 U+12000 CUNEIFORM SIGN A. It uses Sumerian sign names, rather than common Akkadian values, and breaks compound signs into their constituent parts wherever possible.

For instance, in Unicode 𒄠, the sign most cuneiformists call AM, has the number and name U+12120 CUNEIFORM SIGN GUD TIMES KUR and comes shortly after 𒄞 U+1211E CUNEIFORM SIGN GUD in the sequence. Other signs have to be put together from two Unicode code points, so 𒊩𒆳 GEME₂ is actually composed of U+122A9 CUNEIFORM SIGN SAL plus U+121B3 CUNEIFORM SIGN KUR.
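
These names can be checked with Python's standard unicodedata module, which includes the Cuneiform block. The sketch below writes the signs as code point escapes so that it can be read without a cuneiform font; the variable and function names are our own.

```python
import unicodedata

AM = "\U00012120"               # the sign most cuneiformists call AM
GEME2 = "\U000122A9\U000121B3"  # GEME₂, written with two code points


def sign_names(text: str) -> list[str]:
    """Return the Unicode name of every code point in a cuneiform string."""
    return [unicodedata.name(char) for char in text]


print(sign_names(AM))     # ['CUNEIFORM SIGN GUD TIMES KUR']
print(sign_names(GEME2))  # ['CUNEIFORM SIGN SAL', 'CUNEIFORM SIGN KUR']
print(len(GEME2))         # 2
```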

There is a complete list of the Unicode Cuneiform code points on the Unicode Consortium's Code charts [http://www.unicode.org/charts/] under the heading Ancient Scripts, subheading Cuneiform (in PDF format). We have also produced concordances of the Unicode Cuneiform character set with the sign lists in Labat's Manuel d'épigraphie akkadienne (1988) and Borger's Mesopotamisches Zeichenlexikon (2003), which we hope will make Unicode cuneiform fonts much easier to use.

Content last modified on 10 Jan 2017.

Eleanor Robson

Eleanor Robson, 'Computer cuneiform: conventions for reading and writing on screen', Knowledge and Power, Higher Education Academy, 2017 [http://oracc.museum.upenn.edu/saao/knpp/cuneiformrevealed/aboutcuneiform/computercuneiform/]