Explanatory combinatorial dictionary
An explanatory combinatorial dictionary (ECD) is a type of monolingual dictionary designed to be part of a meaning-text linguistic model of a natural language.[1] It is intended to be a complete record of the lexicon of a given language. As such, it identifies and describes, in separate entries, each of the language's lexemes (roughly speaking, each word or set of inflected forms based on a single stem) and phrasemes (roughly speaking, idioms and other multi-word fixed expressions). Among other things, each entry contains (1) a definition that incorporates the lexeme's semantic actants (for example, the definiendum of give takes the form X gives Y to Z, which expresses its three actants: the giver X, the thing given Y, and the recipient Z); (2) complete information on lexical co-occurrence (e.g. the entry for attack records the collocation launch an attack, the entry for party provides throw a party, and the entry for lecture provides deliver a lecture, which enables the user to avoid an error like *deliver a party); and (3) an extensive set of examples. The ECD is a production dictionary: it aims to provide all the information needed for a foreign learner or an automaton to produce perfectly formed utterances of the language. Since the lexemes and phrasemes of a natural language number in the hundreds of thousands, a complete ECD in paper form would occupy the space of a large encyclopaedia. Such a work has yet to be achieved; while ECDs of Russian and French have been published, each describes less than one percent of the vocabulary of the respective language.
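To make the production use of co-occurrence data concrete, here is a minimal Python sketch, not drawn from any published ECD, of how a generator might consult the collocations mentioned above. The table layout and the helper names (`COLLOCATIONS`, `support_verb`, `produce`, `is_recorded`) are hypothetical; the key `Oper1` is borrowed from the lexical functions of Meaning-Text Theory, where it names a support verb whose subject expresses the keyword's first actant.

```python
# Hypothetical co-occurrence data, stored in the entry of the noun (the keyword).
# "Oper1": support verb whose subject is the noun's first actant
# (the attacker, the host, the lecturer).
COLLOCATIONS = {
    "attack": {"Oper1": "launch"},
    "party": {"Oper1": "throw"},
    "lecture": {"Oper1": "deliver"},
}


def support_verb(keyword: str) -> str:
    """Look up the support verb recorded in the keyword's own entry."""
    entry = COLLOCATIONS.get(keyword)
    if entry is None or "Oper1" not in entry:
        raise KeyError(f"no Oper1 value recorded for {keyword!r}")
    return entry["Oper1"]


def produce(keyword: str) -> str:
    """Build 'VERB a/an KEYWORD' (naive article choice by first letter)."""
    article = "an" if keyword[0] in "aeiou" else "a"
    return f"{support_verb(keyword)} {article} {keyword}"


def is_recorded(verb: str, keyword: str) -> bool:
    """True only if `verb` is the collocate recorded for `keyword`."""
    return COLLOCATIONS.get(keyword, {}).get("Oper1") == verb
```

With this data, `produce("party")` yields "throw a party", while `is_recorded("deliver", "party")` is False, which is the check that blocks *deliver a party. The verb is stored in the noun's entry because the noun determines which verb is idiomatic.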
The ECD was proposed in the late 1960s by Aleksandr Žolkovskij and Igor Mel'čuk[2][3][4] and was further developed by Jurij Apresjan.[5][6][7][8][9][10][11][12][13] Three ECDs are currently available in print: one for Russian[14] and two for French.[15][16] A dictionary of Spanish collocations, DICE (Diccionario de colocaciones del español), is under development.[17][18]
Characteristics of an ECD
An ECD presents lexicographic data from the point of view of synthesis, that is, organized so as to allow the user, beginning with the Semantic Representation of a desired meaning, to construct a grammatical and idiomatic utterance. Each entry is built around the semantic definition of a Lexical Unit (LU), and all the collocations of the LU are listed in its entry as well. The ECD treats idioms in the same way as lexemes, namely as LUs of the language.
Homophonous LUs that are semantically related, sharing non-trivial semantic components, are grouped into vocables, so that each vocable reflects a word's polysemy. The English vocable IMPROVE, for example, includes six LUs, each of which receives a separate lexical entry:
IMPROVE, verb
- IMPROVE I.1a: X improves ≡ ‘The value or the quality of X becomes higher’
  - [The weather suddenly improved; The system will improve over time]
- IMPROVE I.1b: X improves Y ≡ ‘X causes₁ that Y improves I.1a’
  - [The most recent changes drastically improved the system]
- IMPROVE I.2: X improves ≡ ‘The health of a sick person X improves I.1a’
  - [Jim is steadily improving]
- IMPROVE I.3: X improves at Y ≡ ‘X’s execution of Y improves I.1a, which is caused₁ by X’s having practiced or practicing Y’
  - [Jim is steadily improving at algebra]
- IMPROVE II: X improves Y by Z-ing ≡ ‘X voluntarily causes₂ that the market value of a piece of real estate Y becomes higher by doing Z-ing to Y’
  - [Jim improved his house by installing indoor plumbing]
- IMPROVE III: X improves upon Y ≡ ‘X creates a new Y′ by improving I.1b Y’
  - [Jim has drastically improved upon Patrick’s translation]
The lexicographic numbers reflect semantic distances (i.e., degrees of similarity) between LUs within a vocable: Roman numerals mark the largest distances, Arabic numerals mark smaller distances, and letters mark the smallest distances. The four lexemes grouped under IMPROVE I are considered closer to each other than to IMPROVE II or IMPROVE III, because they all include the meaning ‘improve I.1a’. IMPROVE I.1a and IMPROVE I.1b are especially close because English has many verbs whose uses are related by the semantic alternation ‘P’ ~ ‘cause₁ to P’, that is, labile (ambitransitive) verbs.
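The ordering of lexicographic numbers can be sketched as a small Python helper. It is hypothetical and assumes numbers are written as in the IMPROVE example (a Roman numeral, optionally a dot and an Arabic numeral, optionally a lowercase letter); it reports only the coarsest level at which two numbers differ and does not measure semantic distance in any finer sense.

```python
from __future__ import annotations

import re

# Assumed notation: Roman numeral, optional ".<Arabic numeral>", optional letter,
# e.g. "I.1a", "I.2", "II".
_LEX_NUM = re.compile(r"(?P<roman>[IVXLC]+)(?:\.(?P<arabic>\d+))?(?P<letter>[a-z])?")

# Components ordered from the coarsest distinction to the finest.
LEVELS = ("roman", "arabic", "letter")


def parse(number: str) -> tuple[str, str, str]:
    """Split 'I.1a' into ('I', '1', 'a'); absent components become ''."""
    match = _LEX_NUM.fullmatch(number)
    if match is None:
        raise ValueError(f"not a lexicographic number: {number!r}")
    return match["roman"], match["arabic"] or "", match["letter"] or ""


def distance_level(a: str, b: str) -> str | None:
    """Coarsest component on which a and b differ, or None if identical."""
    for level, x, y in zip(LEVELS, parse(a), parse(b)):
        if x != y:
            return level
    return None


def distance_rank(a: str, b: str) -> int:
    """3 = differ in Roman numeral (largest distance), 2 = Arabic, 1 = letter, 0 = same."""
    level = distance_level(a, b)
    return 0 if level is None else len(LEVELS) - LEVELS.index(level)
```

For instance, ranking the other IMPROVE senses by `distance_rank("I.1a", n)` gives 1 for I.1b, 2 for I.2 and I.3, and 3 for II and III, mirroring the grouping described above.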
The subscript and superscript numbers attached to words in the definitions refer to the subsenses (subscripts) and homophonous entries (superscripts) of a word as given in the Longman Dictionary of Contemporary English.[19] Thus, “device¹₁” refers to the first subsense of the first entry for device in that dictionary.
Structure of the ECD entry
An ECD entry is a full description of a Lexical Unit L divided into three major sections or "zones":
The semantic zone
The semantic zone describes the semantic properties of L. It consists of two sub-zones: 1) the definition of L, which fully specifies L’s meaning, and 2) L’s connotations (meanings that the language associates with L, but that are not part of its definition).[20][21]
The phonological/graphematic zone
The phonological/graphematic zone gives all of the data on L’s phonological properties. Here again there are two sub-zones: 1) L’s pronunciation, including its syllabification and any non-standard prosodic properties,[22] and 2) orthographic information about L’s spelling variants, etc.
The co-occurrence zone
The co-occurrence zone presents all of the data on L’s combinatorial properties. It is organized into five sub-zones: morphological, syntactic, lexical, stylistic, and pragmatic.
- The morphological sub-zone contains inflectional data including conjugation/declension class, irregular forms, missing forms, permitted alternations, etc.[23]
- The syntactic sub-zone has two parts:
  - a) Government pattern, which describes the elements that L can syntactically govern (arguments, complements, etc.);
  - b) Part of speech and syntactic features, which describes the constructions in which L can appear as a syntactic dependent.
- The lexical sub-zone specifies the lexical functions in which L participates, covering both semantic derivations of L and its collocations with other individual LUs or with very small, irregular groups of LUs.
- The stylistic sub-zone presents Usage Labels specifying, for the headword L, its speech register (informal, colloquial, vulgar, poetic, etc.), temporal (obsolescent, archaic) and geographical (British, Indian, Australian) variability, and the like.
- The pragmatic sub-zone describes the real-life situations in which a particular expression is appropriate or inappropriate.
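The zone structure described above maps naturally onto nested records. The following Python sketch is one hypothetical encoding (the class and field names are assumptions, not a published ECD format), filled in with the IMPROVE I.1b definition and a government pattern read off its example sentence; sub-zones for which this article gives no data are left empty rather than invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SemanticZone:
    definition: str                                        # fully specifies L's meaning
    connotations: list[str] = field(default_factory=list)  # associated, non-definitional meanings


@dataclass
class PhonoGraphematicZone:
    pronunciation: str = ""                                # syllabification, non-standard prosody
    spelling_variants: list[str] = field(default_factory=list)


@dataclass
class CooccurrenceZone:
    morphology: dict[str, str] = field(default_factory=dict)               # inflection class, irregular/missing forms
    government_pattern: dict[str, str] = field(default_factory=dict)       # actant -> how L expresses it
    syntactic_features: list[str] = field(default_factory=list)            # part of speech, dependent constructions
    lexical_functions: dict[str, list[str]] = field(default_factory=dict)  # derivations and collocates
    usage_labels: list[str] = field(default_factory=list)                  # register, temporal, geographical
    pragmatics: list[str] = field(default_factory=list)                    # situations of (in)appropriate use


@dataclass
class ECDEntry:
    lexical_unit: str
    semantic: SemanticZone
    phono_graphematic: PhonoGraphematicZone
    cooccurrence: CooccurrenceZone


# IMPROVE I.1b, using only the definition and example given in the article.
improve_i1b = ECDEntry(
    lexical_unit="IMPROVE I.1b",
    semantic=SemanticZone(definition="X improves Y ≡ ‘X causes₁ that Y improves I.1a’"),
    phono_graphematic=PhonoGraphematicZone(),
    cooccurrence=CooccurrenceZone(
        government_pattern={"X": "subject", "Y": "direct object"},
        syntactic_features=["verb"],
    ),
)
```

Keeping each zone as its own record mirrors the three-way division of the entry, so a consumer that only needs, say, collocations can read `improve_i1b.cooccurrence.lexical_functions` without touching the other zones.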
References
- ^ Mel’čuk, Igor A. (2006). "Explanatory Combinatorial Dictionary". In Giandomenico Sica (ed.), Open Problems in Linguistics and Lexicography, 225–355. Monza: Polimetrica.
- ^ Žolkovskij, Aleksandr (1965). "O vozmožnom metode i instrumentax semantičeskogo sinteza [On a Possible Method and Tools for Semantic Synthesis]". Naučno-texničeskaja informacija. 5: 23–28.
- ^ Žolkovskij, Aleksandr (1966). "O sisteme semantičeskogo sinteza. I. Stroenie slovarja [On a System for Semantic Synthesis. I. Structure of the Dictionary]". Naučno-texničeskaja informacija. 11: 48–55.
- ^ Žolkovskij, Aleksandr (1967). "O semantičeskom sinteze [On Semantic Synthesis]". Problemy kibernetiki. 19: 177–238.
- ^ Apresjan, Jurij (1969). "Tolkovanie leksičeskix značenij kak problema teoretičeskoj semantiki [Definition of Lexical Meanings as a Problem of Theoretical Semantics]". Izvestija Akademii Nauk SSSR, Serija lit. i jazyka. 28: 11–23.
- ^ Apresjan, Jurij (1969). "O jazyke dlja opisanija značenij slov [On a Language for the Description of Lexical Meanings]". Izvestija Akademii Nauk SSSR, Serija lit. i jazyka. 28: 415–428.
- ^ Apresjan, Jurij (1974). Leksičeskaja semantika. Sinonimičeskie sredstva jazyka [Lexical Semantics. Synonymic Means of the Language]. Moscow: Nauka.
- ^ Apresjan, Jurij (1980). Tipy informacii dlja poverxnostno-semantičeskogo komponenta modeli Smysl ⇔ Tekst [Types of Information for the Surface-Semantic Component of the Meaning-Text Model]. Vienna: Wiener Slawistischer Almanach.
- ^ Apresjan, Jurij (1988). "Morfologičeskaja informacija dlja tolkovogo slovarja [Morphological Information in a Monolingual Dictionary]". In Karaulov, Jurij (ed.), Slovarnye kategorii, 31–59. Moscow: Nauka.
- ^ Apresjan, Jurij (1988). "Tipy kommunikativnoj informacii dlja tolkovogo slovarja [Types of Communicative Information for a Monolingual Dictionary]". In Jazyk: sistema i funkcionirovanie, 10–22. Moscow: Nauka.
- ^ Apresjan, Jurij (1990). "Tipy leksikografičeskoj informacii ob označajuščem leksemy [Types of Lexicographic Information on a Lexeme's Signifier]". In Tipologija i grammatika, 91–108. Moscow: Nauka.
- ^ Apresjan, Jurij (1990). "Formal´naja model´ jazyka i predstavlenie leksikografičeskix znanij [A Formal Model of Language and Representation of Lexicographic Knowledge]". Voprosy jazykoznanija. 6: 91–108.
- ^ Apresjan, Jurij (1995). Izbrannye trudy. Tom II. Integral´noe opisanie jazyka i sistemnaja leksikografija [Selected Writings. Vol II. An Integral Linguistic Description and Systemic Lexicography]. Moscow: Škola «Jazyki russkoj kul´tury».
- ^ Mel’čuk, Igor A. (1984). Explanatory Combinatorial Dictionary of Modern Russian. Semantico-syntactic Studies of Russian Vocabulary. Vienna: Wiener Slawistischer Almanach.
- ^ Mel’čuk, Igor A. (1999). Dictionnaire explicatif et combinatoire du français contemporain. Recherches lexico-sémantiques IV. Montréal: Les Presses de l’Université de Montréal.
- ^ Mel’čuk, Igor A. (2007). Lexique actif du français : L'apprentissage du vocabulaire fondé sur 20000 dérivations sémantiques et collocations du français. Paris: Duculot.
- ^ Alonso Ramos, Margarita (2003). "Hacia un diccionario de colocaciones del español y su codificación". In Fernández Montraveta, A., A. Martí Antonin & G. Vásquez García (eds.), Lexicografía computacional y semántica, 11–34. Barcelona: Universidad de Barcelona.
- ^ Alonso Ramos, Margarita (2004). "Elaboración del Diccionario de colocaciones del español y sus aplicaciones". In Bataner, P. & J. DeCesaris García (eds.), De lexicografia: Actes del I Symposium internacional de lexicografia, 149–162. Barcelona: IULA.
- ^ Longman Dictionary of Contemporary English. London: Longman. 1978.
- ^ Iordanskaja, Lida (1984). "Connotation en sémantique et lexicographie". In Dictionnaire explicatif et combinatoire du français contemporain : Recherches lexico-sémantiques I, 33–40. Montréal: Presses de l'Université de Montréal.
- ^ Iordanskaja, Lida (2006). "Connotation". In Berger, T., K. Gutschmidt, S. Kempgen & P. Kosta (eds.), The Slavic Languages: An International Handbook of their History, their Structure and their Investigation. New York: Walter de Gruyter.
- ^ Apresjan, Jurij (1990). "Tipy leksikografičeskoj informacii ob označajuščem leksemy [Types of Lexicographic Information on a Lexeme's Signifier]". In Tipologija i grammatika, 91–108. Moscow: Nauka.
- ^ Apresjan, Jurij (1988). "Morfologičeskaja informacija dlja tolkovogo slovarja [Morphological Information in a Monolingual Dictionary]". In Karaulov, Jurij (ed.), Slovarnye kategorii, 31–59. Moscow: Nauka.