Dictionary-based machine translation
Machine translation can be based on dictionary entries, meaning that words are translated as a dictionary would translate them: word by word, usually with little regard for how their meanings relate to one another. Dictionary lookups may be done with or without morphological analysis or lemmatisation. Although this is probably the least sophisticated approach to machine translation, it is well suited to translating long lists of phrases at the subsentential level (i.e., below the level of a full sentence), such as inventories or simple catalogs of products and services.[1]
It can also be used to expedite manual translation, provided the translator is fluent in both languages and can therefore correct syntax and grammar.
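The word-by-word lookup described above, with and without a lemmatisation step, can be illustrated with a minimal sketch. The dictionary entries, the crude plural-stripping lemmatiser, and the function names are all assumptions made for illustration; they do not come from any particular system.

```python
# Toy English-to-German dictionary; the entries are illustrative assumptions.
DICTIONARY = {
    "red": "rot",
    "car": "Auto",
    "house": "Haus",
    "small": "klein",
}


def lemmatise(word):
    """Very crude lemmatiser: strip a plural "s" if that yields a known entry."""
    if word not in DICTIONARY and word.endswith("s") and word[:-1] in DICTIONARY:
        return word[:-1]
    return word


def translate_phrase(phrase, use_lemmatiser=True):
    """Translate word by word; unknown words are passed through in brackets."""
    output = []
    for word in phrase.lower().split():
        key = lemmatise(word) if use_lemmatiser else word
        output.append(DICTIONARY.get(key, f"[{word}]"))
    return " ".join(output)


print(translate_phrase("small red cars"))                        # klein rot Auto
print(translate_phrase("small red cars", use_lemmatiser=False))  # klein rot [cars]
```

The output is a string of uninflected dictionary forms rather than grammatical German, which is why such output typically still needs correction by a fluent translator.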
LMT
LMT[2] is a Prolog-based machine translation system that works on specially prepared bilingual dictionaries, such as the Collins English-German (CEG) dictionary, which have been rewritten in an indexed form that computers can read easily. The method uses a structured lexical database (LDB) to identify the word categories of the source language correctly and, based on rudimentary morphological analysis, to construct a coherent sentence in the target language. The system uses “frames”[2] to identify the position a word should occupy in a sentence from a syntactic point of view. These “frames”[2] are mapped via language conventions, such as UDICT in the case of English.
In its early (prototype) form, LMT[2] uses three lexicons, accessed simultaneously: source, transfer and target, although all of this information can be encapsulated in a single lexicon. The program uses a lexical configuration consisting of two main elements. The first is a hand-coded lexicon addendum that contains possible incorrect translations. The second consists of various bilingual and monolingual dictionaries for the source and target languages.
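The idea of folding source, transfer and target lexicons into a single lexicon can be sketched as follows. This is only an illustration in Python, not LMT's actual Prolog representation: the entry layout, the example words, and the names `merge_lexicons` and `lookup` are hypothetical.

```python
# Hypothetical entries; LMT's real lexicon format is not shown in the text.
SOURCE_LEXICON = {"book": ["noun", "verb"], "table": ["noun"]}
TRANSFER_LEXICON = {
    ("book", "noun"): "Buch",
    ("book", "verb"): "buchen",
    ("table", "noun"): "Tisch",
}
TARGET_LEXICON = {
    "Buch": {"category": "noun", "gender": "neuter"},
    "buchen": {"category": "verb"},
    "Tisch": {"category": "noun", "gender": "masculine"},
}


def merge_lexicons(source, transfer, target):
    """Combine the three lexicons into one, keyed by source word."""
    merged = {}
    for word, categories in source.items():
        senses = []
        for category in categories:
            target_word = transfer.get((word, category))
            if target_word is None:
                continue  # no transfer entry for this reading
            senses.append({
                "category": category,
                "target": target_word,
                "target_info": target.get(target_word, {}),
            })
        merged[word] = senses
    return merged


def lookup(merged, word, category):
    """Return the target word for a source word in a given category, or None."""
    for sense in merged.get(word, []):
        if sense["category"] == category:
            return sense["target"]
    return None


LEXICON = merge_lexicons(SOURCE_LEXICON, TRANSFER_LEXICON, TARGET_LEXICON)
print(lookup(LEXICON, "book", "verb"))  # buchen
```

Keying the transfer entries on both word and category is one simple way to let the word category identified in the source language select the translation.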
Example-Based & Dictionary-Based Machine Translation
This method of dictionary-based machine translation explores a different paradigm from systems such as LMT. An example-based machine translation system is supplied with only a “sentence-aligned bilingual corpus”[3]. Using this data, the translating program generates a “word-for-word bilingual dictionary”[3], which is used for further translation.
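One simple way to derive a word-for-word dictionary from a sentence-aligned corpus is to count how often words co-occur in aligned sentence pairs and keep the best-scoring pairing. The sketch below uses the Dice coefficient for scoring; this is a swapped-in, generic technique chosen for brevity and is not claimed to be the procedure used in the cited work. The corpus and the function name `extract_dictionary` are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Tiny sentence-aligned English-French corpus (illustrative data only).
CORPUS = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("red house", "maison rouge"),
    ("red car", "voiture rouge"),
]


def extract_dictionary(corpus):
    """Map each source word to the target word with the highest Dice score."""
    source_counts = Counter()
    target_counts = Counter()
    pair_counts = Counter()
    for source_sentence, target_sentence in corpus:
        source_words = set(source_sentence.split())
        target_words = set(target_sentence.split())
        source_counts.update(source_words)
        target_counts.update(target_words)
        # Every source word co-occurs with every target word in the pair.
        pair_counts.update(product(source_words, target_words))

    dictionary = {}
    best_scores = {}
    for (s, t), together in pair_counts.items():
        # Dice coefficient: 2 * joint count / (sum of individual counts).
        score = 2 * together / (source_counts[s] + target_counts[t])
        if score > best_scores.get(s, 0.0):
            best_scores[s] = score
            dictionary[s] = t
    return dictionary


print(extract_dictionary(CORPUS))
```

On this toy corpus each source word has exactly one target word that always appears alongside it, so the mapping is unambiguous; real corpora would need thresholds and tie-breaking.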
Although this system would generally be regarded as an entirely different approach from dictionary-based machine translation, it is important to understand how the two paradigms complement each other. Dictionary-based machine translation works best with “word-for-word bilingual dictionary”[3] word lists, which is exactly what an example-based system generates. Coupling the two translation engines could therefore yield a powerful translation tool that, besides being semantically accurate, can enhance its own functionality through continual feedback loops.
A system that combines both paradigms in a way similar to that described in the previous paragraph is the Pangloss Example-Based Machine Translation engine (PanEBMT)[3]. PanEBMT uses a correspondence table between languages to create its corpus. Furthermore, PanEBMT supports multiple incremental operations on its corpus, which facilitates a biased translation used for filtering purposes.
Parallel Text Processing
Douglas Hofstadter, in “Le Ton beau de Marot: In Praise of the Music of Language”, shows what a complex task translation is. The author produced and analysed dozens of possible translations of an eighteen-line French poem, revealing the complex inner workings of syntax, morphology and meaning[4]. Unlike most translation engines, which choose a single translation based on a side-by-side comparison of the texts in the source and target languages, Hofstadter’s work shows the level of error inherent in any form of translation when the meaning of the source text is too detailed or complex. This brings the problem of text alignment and the “statistics of language”[4] to attention.
These discrepancies led to Martin Kay’s views on translation and translation engines as a whole. As Kay puts it, “More substantial successes in these enterprises will require a sharper image of the world than any that can be made out simply from the statistics of language use” (Parallel Text Processing: Alignment and Use of Translation Corpora, p. xvii)[4]. Kay thus brought back to light the question of meaning within language and the distortion of meaning through the process of translation.
See also
- Example-based machine translation
- Language industry
- Machine translation
- Statistical machine translation
Bibliography
- Uwe Muegge (2006), "An Excellent Application for Crummy Machine Translation: Automatic Translation of a Large Database", in Elisabeth Gräfe (ed.), Proceedings of the Annual Conference of the German Society of Technical Communicators, Stuttgart: tekom, 18-21.
- Mary S. Neff; Michael C. McCord. "Acquiring Lexical Data from Machine-Readable Dictionary Resources for Machine Translation". IBM T. J. Watson Research Center, P. O. Box 704, Yorktown Heights, New York 10598. Retrieved 2 November 2015.
- Ralf D. Brown. "Automated Dictionary Extraction for "Knowledge-Free" Example-Based Translation" (PDF). Language Technologies Institute (Center for Machine Translation), Carnegie Mellon University, Pittsburgh, PA 15213-3890 USA. Retrieved 2 November 2015.
- Jean Véronis. "Parallel Text Processing: Alignment and Use of Translation Corpora". Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology series, edited by Nancy Ide and Jean Véronis, volume 13), 2000, xxiii+402 pp; hardbound, ISBN 0-7923-6546-1. Retrieved 2 November 2015.