Benutzer:Jorges/Computerlinguistik
This page is intended as a working platform for my computational linguistics studies. The homework for the course Computational Lexicography can be developed here.
History
- Report on The Canoo online dictionary Final Version
Exercise 2
- Describe the differences between the so-called (sense) enumerative lexicon and the generative lexicon model.
- Which lexicon (model) is more adequate:
  - for the human user
  - for NLP?
- Which issues are raised by
  - double forms like
    - gut und gerne
    - hale and hearty;
  - idioms like
    - spill the beans or
    - jmdm reinen Wein einschenken (tell a bitter truth)
  - for the compilers of traditional print dictionaries
  - for researchers of the mental lexicon
  - for designers of lexical resources for NLP systems?
Please use the "Vorschau" (preview) button if you want to preview your changes!
Differences between the so-called Enumerative Lexicon and the Generative Lexicon Model
Enumerative Lexicon
An enumerative lexicon is a comparatively simple lexicon: for each word it lists the separate meanings the word can take in different contexts. Some meanings are additionally linked to synonyms or idioms. Through these links the lexicon tries to capture phenomena such as an adjective taking a different meaning in each of many contexts.
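As a rough illustration, an enumerative entry can be pictured as a plain list of senses stored under a headword. The sketch below is only a toy: the sense glosses for "good" and the names `ENUMERATIVE_LEXICON` and `lookup` are assumptions made for this example, not taken from any real dictionary.

```python
# Hypothetical toy data: every sense of "good" is listed separately.
ENUMERATIVE_LEXICON = {
    "good": [
        {"sense": "of high quality", "example": "a good knife"},
        {"sense": "morally right", "example": "a good deed"},
        {"sense": "pleasant", "example": "good weather"},
    ],
}


def lookup(word):
    """Return every stored sense of the word; choosing one is left to the user."""
    return [entry["sense"] for entry in ENUMERATIVE_LEXICON.get(word, [])]


print(lookup("good"))
```

The lookup returns all senses at once and makes no attempt to select the one that fits a given context.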
Generative Lexicon
In a generative lexicon the syntactic and semantic features of a lexical item are decomposed into parts, and these parts together provide the lexical meaning. Several levels of representation are needed to capture the different meanings of a word:
- Argument structure: specifies the number and type of arguments a lexical item requires.
- Event structure: describes the events an item denotes and how events and their subevents are related to each other.
- Qualia structure: carries the core of the word meaning. It describes, for example, what an object is made of or what its purpose or function is.
- Inheritance structure: shows how the word is related to other words in the lexicon.
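The four levels can be sketched as fields of one record. This is a minimal sketch under assumptions: the class names, the sample entry for "knife" and the helper `interpret_good` are invented for illustration, and the four qualia roles follow the usual formal/constitutive/telic/agentive division.

```python
from dataclasses import dataclass, field


@dataclass
class Qualia:
    formal: str = ""        # what kind of thing it is
    constitutive: str = ""  # what it is made of
    telic: str = ""         # its purpose or function
    agentive: str = ""      # how it comes into being


@dataclass
class LexicalEntry:
    lemma: str
    arguments: list = field(default_factory=list)   # argument structure
    event_type: str = ""                            # event structure
    qualia: Qualia = field(default_factory=Qualia)  # qualia structure
    parents: list = field(default_factory=list)     # inheritance structure


# Hypothetical sample entry.
knife = LexicalEntry(
    lemma="knife",
    arguments=["x: physical object"],
    qualia=Qualia(formal="tool", constitutive="blade and handle",
                  telic="cutting", agentive="manufacturing"),
    parents=["tool"],
)


def interpret_good(entry):
    """Derive a reading of 'good <noun>' from the noun's purpose instead of listing it."""
    if entry.qualia.telic:
        return f"a {entry.lemma} that is good for {entry.qualia.telic}"
    return f"a good {entry.lemma}"


print(interpret_good(knife))
```

The point of the sketch is that the reading of the adjective is computed from the noun's entry, so it does not have to be stored as a separate sense.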
Both lexicon models are therefore adequate, each for a particular purpose. For a human user the enumerative lexicon is the better choice: humans have common-sense knowledge and can pick the appropriate meaning of an item from the list of meanings given.
For Natural Language Processing a generative lexicon is the better choice, because the machine lacks the background knowledge needed to choose among the meanings listed in an enumerative lexicon. It has to derive the right meaning from the context or structure surrounding the item.
Issues raised by double forms and idioms
Looking at the examples given, we assume that
- idioms are phrases whose meaning cannot be derived by combining the meanings of their components. They must be interpreted metaphorically.
- double forms are a kind of idiom and can be defined as two words that are typically used in combination. There are two cases:
  - each component is an independent word that can also appear alone, but the double form may be used in contexts where a single component cannot. Example: gut und gerne: "Das ist gut und gerne 20 Jahre her." ("That was easily 20 years ago.") as compared to "Das ist gut 20 Jahre her." (different meaning) or "Das ist gerne 20 Jahre her." (unidiomatic)
  - one (or both) of the components appears only in the double form. Example: hale in hale and hearty
for the compilers of traditional print dictionaries
Double forms
You have to decide where to list the double form. You can:
- make a new entry for it (if the meaning is very special)
- add it to the entry of word 1 or word 2 (usually the word that can only appear in the double form, if there is one)
- add it to the entries of both words (This is probably not a good idea, as the redundancy wastes space in your lexicon. It might be useful, though, to provide at least a reference to the double form from every component.)
Then you have to provide a definition if the meaning differs from the usual meaning of the components, and/or give examples of the contexts in which it is used.
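The placement options above can be read as a small decision rule. The following sketch is one possible reading, not an established lexicographic procedure: the function `place_double_form` and the idea of passing in a set of bound components are assumptions made for illustration.

```python
def place_double_form(components, bound_components):
    """Pick the headword for the full entry and the headwords that only get a reference.

    components: content words of the double form, in order.
    bound_components: words that occur only inside the double form.
    """
    bound = [c for c in components if c in bound_components]
    # Prefer a component that cannot stand alone; otherwise use the first word.
    main = bound[0] if bound else components[0]
    cross_references = [c for c in components if c != main]
    return main, cross_references


print(place_double_form(["hale", "hearty"], {"hale"}))
print(place_double_form(["gut", "gerne"], set()))
```

The full definition is stored once under the main headword, and the other components carry only a pointer to it, which avoids the redundancy mentioned above.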
Idioms
To list an idiom, you have to decide which lexical entry to assign it to.
This means deciding which word in the phrase is the most prominent, i.e. where a human user would expect to find the idiom when looking it up. Only content words are candidates, and often the verb is chosen, e.g. spill for spill the beans. Then you have to paraphrase the metaphorical meaning.
for researchers of the mental lexicon
As above, the question is how the meaning of double forms or idioms is stored in the mental lexicon:
- attached to one of the words in the double form or idiom, or
- as a separate entry.
This could be investigated, for example, by comparing the reading times of idioms with those of other phrases of the same length and structure.
for designers of lexical resources for NLP systems
Double forms
First you have to decide whether it is necessary to list the double form at all, i.e. whether the combination is stable enough and whether its meaning really differs from the meanings of the single words. Depending on how you want to use your NLP system, you might find it less important to consider double forms that only have stylistic value, as long as you can parse them with your normal syntactic rules.
However, for words that appear only in a double form and never alone, a lexical entry is essential that links them to their partner and marks their use as ungrammatical when the partner is missing. The simplest solution is to store 'frozen' phrases in the lexicon, e.g. 'hale and hearty' as a single entry that can only appear in this form and may have its own semantic interpretation. The same is necessary for double forms whose meaning differs significantly from the meaning of their components.
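A frozen phrase can be handled before parsing by merging its tokens into one unit. The sketch below assumes whitespace-tokenized, lower-cased input; the tables `FROZEN_PHRASES` and `BOUND_WORDS` and the function `analyse` are hypothetical names introduced for this example.

```python
# Hypothetical toy lexicon: frozen phrases are stored as single multi-word entries.
FROZEN_PHRASES = {
    ("hale", "and", "hearty"): "healthy and strong",
}
# Words that may only occur inside a frozen phrase (an assumption of this sketch).
BOUND_WORDS = {"hale"}


def analyse(tokens):
    """Merge frozen phrases into one unit and report bound words used alone."""
    units, errors = [], []
    i = 0
    while i < len(tokens):
        for phrase in FROZEN_PHRASES:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                units.append(" ".join(phrase))
                i += len(phrase)
                break
        else:
            # No frozen phrase starts here: keep the token, flag it if it is bound.
            if tokens[i] in BOUND_WORDS:
                errors.append(tokens[i])
            units.append(tokens[i])
            i += 1
    return units, errors


print(analyse("he is hale and hearty".split()))
print(analyse("he is hale".split()))
```

In the first call the phrase becomes a single unit with no errors. In the second, "hale" occurs without its partner and is reported as an error.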
Idioms
Idioms need special treatment only if you want to provide a semantic interpretation, since syntactically they can be analyzed with the normal rules.
In that case, you have to bypass the normal strategy of semantic interpretation, which builds the phrase meaning from the meanings of the components. You have to recognize an idiom when it appears, e.g. by checking whether a specific verb appears with the specific arguments that form an idiom, and then assign a new meaning to the phrase.
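The bypass can be sketched as a table lookup that runs before compositional interpretation. This is a minimal sketch that assumes the parser already delivers verb and object lemmas; the table `IDIOMS`, the function `interpret_phrase` and the stand-in `compose` are hypothetical.

```python
# Hypothetical idiom table keyed by (verb lemma, object lemma).
IDIOMS = {
    ("spill", "bean"): "reveal a secret",
}


def compose(verb, obj):
    """Stand-in for a real compositional semantics component."""
    return f"{verb}({obj})"


def interpret_phrase(verb, obj):
    """Use the idiomatic meaning if verb and object form a known idiom,
    otherwise fall back to the normal compositional interpretation."""
    meaning = IDIOMS.get((verb, obj))
    if meaning is not None:
        return meaning
    return compose(verb, obj)


print(interpret_phrase("spill", "bean"))
print(interpret_phrase("spill", "milk"))
```

Only the first call hits the idiom table. The second falls through to the compositional stand-in.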
There are two additional problems here:
- You can never be sure whether the phrase is really used as an idiom. Depending on the context, it can always have its literal meaning (e.g. spill the beans: someone really drops vegetables). You have to rely more or less on probability here, though the system can take the semantic context into account to improve the results.
- As idioms are formed like normal phrases, they can also be modified, e.g. einen Bock schießen = to make a mistake; einen kapitalen Bock schießen = to make a big mistake. This kind of modification can become quite complex and has to be interpreted within the metaphorical meaning. (For the traditional dictionary compiler this is a minor problem, as human users are flexible enough to understand such modifications once they know the basic metaphor.)
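Both problems can be illustrated in one small sketch for einen Bock schießen. It is deliberately crude: a fixed set of cue words stands in for a real probabilistic model of context, and the names `MODIFIER_MAP`, `LITERAL_CUES` and `interpret_with_context` as well as the cue words themselves are assumptions made for illustration.

```python
# Hypothetical mapping from modifiers of the idiom's noun to modifiers of its meaning.
MODIFIER_MAP = {"kapital": "big"}
# Hypothetical context words that suggest the literal hunting reading.
LITERAL_CUES = {"Jäger", "Wald", "Gewehr"}


def interpret_with_context(verb, obj, modifiers, context):
    """Return (reading, paraphrase), or None if the phrase is not this idiom."""
    if (verb, obj) != ("schießen", "Bock"):
        return None
    # Crude stand-in for a probability estimate: any literal cue wins.
    if LITERAL_CUES & set(context):
        return ("literal", "shoot a buck")
    # Carry known modifiers over into the metaphorical meaning.
    mapped = [MODIFIER_MAP[m] for m in modifiers if m in MODIFIER_MAP]
    return ("idiomatic", " ".join(["make a", *mapped, "mistake"]))


print(interpret_with_context("schießen", "Bock", ["kapital"], ["Projekt"]))
print(interpret_with_context("schießen", "Bock", [], ["Jäger", "Wald"]))
```

A realistic system would replace the cue set with a score over the wider context and would need a far richer treatment of modifiers than a one-to-one mapping.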