Introduction

The main objective of the Electronic Text Corpus of Sumerian Royal Inscriptions (ETCSRI) project is to create an annotated, grammatically and morphologically analyzed, transliterated, trilingual (Sumerian-English-Hungarian) parallel corpus of all Sumerian royal inscriptions.

Sumerian is an extinct language that was spoken in the southern part of ancient Mesopotamia (present-day Iraq). It is an isolate without known cognate languages. Its earliest written sources that can be analyzed linguistically date from around the middle of the third millennium BCE. The grammar of Sumerian has been the subject of intensive inquiry since the first half of the 20th century. Research on Sumerian grammar, however, has faced two main obstacles. First, scholars tried to describe Sumerian using the grammatical categories of the linguistic tradition based on Greek and Latin. Second, the linguistic data needed for research were not available in an easily accessible form: scholars had to rely mostly on their own personal collections of Sumerian texts, whose size and reliability depended on the interests and status of the individual scholar. Most of these collections were useful only to the collector, as they took the form of card files with idiosyncratic conventions, and the data on the cards could be processed only manually.

The theoretical framework used to describe Sumerian has changed thoroughly since the 1980s. A number of articles and grammatical descriptions have been published or made available whose authors were informed by the results of modern descriptive linguistics (see, e.g., Attinger 1993, the papers of Black and Zólyomi 2000, Black and Zólyomi 2007, Coghill and Deutscher 2002, Jagersma 2010, Michalowski 1980 and 2004, Woods 2008, Zólyomi 1996, 2005, 2007b, and 2014). These authors discarded the straitjacket of traditional linguistics and described Sumerian with reference to linguistic analyses of non-European languages. The result has been a much better understanding of Sumerian grammar, and the findings of grammatical research on Sumerian can now be related to those of modern descriptive linguistics based on cross-linguistic research.

There has also been great progress in the availability of linguistic data. The appearance of personal computers and the World Wide Web opened up new opportunities for grammatical research. The first electronic text corpora of Sumerian were simply replications of the card collections in a different form. The data from the cards (i.e. the transliterated Sumerian texts) were entered into electronic files, which had the advantage of allowing fast searches. The difficulty with this sort of text corpus lies in the nature of the writing system used to record Sumerian. It is a mixed logographic-phonographic system, so the same sequence of graphemes may represent a number of different word forms. The writing is also often defective: the final consonant of a closed syllable is, as a rule, left unwritten, except in the last period of reliable Sumerian texts in the first part of the second millennium BCE. The nature of the Sumerian writing system therefore requires that each sequence of graphemes be interpreted; simply transliterating the graphemes is insufficient, and the transliteration must be accompanied by linguistic annotation. Without linguistic annotation, even electronic text corpora are of only limited use for grammatical research.

The need for an annotated corpus of Sumerian was first recognized by the late Jeremy Black, lecturer at the Oriental Institute of the University of Oxford. In 1997 Black set up a project entitled Electronic Text Corpus of Sumerian Literature (ETCSL) [http://etcsl.orinst.ox.ac.uk/] (Black et al. 1998–2006). During the work on ETCSL, it was often felt that the corpus of literary texts would benefit from being complemented by a corpus of royal inscriptions, the type of text most similar to the literary texts in register and vocabulary. One of the main objectives of ETCSRI is to create this corpus.

The corpus of Sumerian monumental inscriptions commissioned by Mesopotamian kings, i.e. the corpus of royal inscriptions, consists of approximately 25,000 lines, corresponding to roughly 50,000 words. The earliest texts date from the 25th century BCE, while the latest texts to be included in the corpus date from the end of the Old Babylonian Period (16th century BCE). This historically and linguistically important group of Sumerian texts therefore spans almost one thousand years, making it an ideal subject for diachronic linguistic study.

The idea of ETCSRI thus stems from ETCSL. ETCSRI, however, aims to enhance the linguistic annotation of the texts substantially compared with the annotation applied by ETCSL. It will add a morphological analysis of every word form, accompanied by morphemic glossing that attempts to follow the guidelines of Lehmann 2004. Verbal morphology is one of the most controversial areas of Sumerian grammar. Finite verbal forms in Sumerian are distinguished by the large number of affixes attached to the verbal stem, and Sumerologists disagree both on the morphological analysis of verbal forms and on the functions assigned to verbal prefixes. Consequently, the grammatical and morphological annotation of the royal inscriptions is not a routine task but a serious challenge. The morphological and grammatical analysis of ETCSRI follows that of Zólyomi 2016. This approach describes Sumerian using the model of so-called template morphology (see, e.g., Stump 1998), which arranges morphemes into structural slots and is eminently suited to describing agglutinative languages such as Sumerian.

A systematic grammatical and morphological analysis of the evidence must inevitably lead to a description of the Sumerian verb that is linguistically more consistent than existing ones, as it is based on a strict distributional analysis of the verb's constituent morphemes. At the same time, a corpus annotated at the level of morphemes is an extremely powerful research tool. A statistical, corpus-based approach to language research is fundamental when a language no longer has native speakers, and even more so when that language is an isolate without known cognate languages.

ETCSRI is being developed at the Department of Assyriology and Hebrew Studies (Institute of Ancient Studies, Eötvös L. University, Budapest) [http://assziriologia.hu/site/] by a research team led by Gábor Zólyomi, as part of The Open Richly Annotated Cuneiform Corpus [http://oracc.museum.upenn.edu/index.html], with the continuous assistance of Steve Tinney.

Funding for ETCSRI was provided by the Hungarian Scientific Research Fund (OTKA) between 1 October 2008 and 30 March 2013 (project no. K75104) and between 1 October 2020 and 30 September 2023 (project no. NKFI-135325).

If you cite the corpus, please use the following citation:
Zólyomi, Gábor - Tanos, Bálint - Sövegjártó, Szilvia. The Electronic Text Corpus of Sumerian Royal Inscriptions. 2008-. <http://oracc.museum.upenn.edu/etcsri/index.html>

Last modified: 10 Sep 2020