Morphological parsing

Morphological parsing, in natural language processing, is the process of determining the morphemes from which a given word is constructed. The process must be able to distinguish between orthographic rules and morphological rules. For example, the word foxes can be divided into fox (the stem) and -es (the plural suffix).

The usual approach to morphological parsing is the use of a finite state transducer (FST), which takes words as input and outputs their stem and modifiers. The FST is initially created by algorithmic parsing of some word source, such as a dictionary, complete with modifier markups.
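The following is a minimal sketch of such a transducer, not a production design. It assumes a toy word source in which each stem is marked up with its plural suffix. The class name `Transducer`, the helper `build_from_lexicon`, and the tags `+N+SG` and `+N+PL` are illustrative choices, not part of any particular toolkit.

```python
from collections import defaultdict

START = 0  # the single start state


class Transducer:
    def __init__(self):
        # state -> list of (input symbol, output string, target state)
        self.arcs = defaultdict(list)
        # accepting state -> string emitted when the input ends in that state
        self.final_output = {}
        self._state_count = 1

    def add_path(self, source, pairs):
        """Add a chain of arcs from `source`, reusing identical arcs; return the last state."""
        state = source
        for symbol, output in pairs:
            for arc_symbol, arc_output, target in self.arcs[state]:
                if (arc_symbol, arc_output) == (symbol, output):
                    state = target
                    break
            else:
                target = self._state_count
                self._state_count += 1
                self.arcs[state].append((symbol, output, target))
                state = target
        return state

    def transduce(self, word):
        """Return the output of every accepting path that consumes `word`."""
        results = []
        stack = [(START, 0, "")]
        while stack:
            state, position, emitted = stack.pop()
            if position == len(word):
                if state in self.final_output:
                    results.append(emitted + self.final_output[state])
                continue
            for symbol, output, target in self.arcs[state]:
                if symbol == word[position]:
                    stack.append((target, position + 1, emitted + output))
        return results


def build_from_lexicon(lexicon):
    """Build a transducer from (stem, plural suffix) pairs (an assumed markup format)."""
    fst = Transducer()
    for stem, plural_suffix in lexicon:
        # Stem letters are copied to the output unchanged.
        stem_end = fst.add_path(START, [(char, char) for char in stem])
        fst.final_output[stem_end] = "+N+SG"
        # Suffix letters are consumed silently; the tag is emitted at the end.
        plural_end = fst.add_path(stem_end, [(char, "") for char in plural_suffix])
        fst.final_output[plural_end] = "+N+PL"
    return fst


fst = build_from_lexicon([("fox", "es"), ("cat", "s")])
print(fst.transduce("foxes"))  # ['fox+N+PL']
print(fst.transduce("fox"))    # ['fox+N+SG']
print(fst.transduce("foxs"))   # []
```

Because the plural suffix is attached to each stem's own path, foxs is rejected while foxes is accepted. A fuller system would typically factor the spelling rules out of the lexicon rather than repeat them per entry.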

Another approach is the use of an indexed lookup method, which uses a constructed radix tree. That route is rarely taken because it breaks down for morphologically complex languages.
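As a rough illustration of indexed lookup, the sketch below stores complete word forms in a small radix tree (a trie whose edges carry whole substrings) and returns a stored analysis for each form. The names `RadixNode`, `insert`, and `lookup`, and the analysis strings, are assumptions made for this example.

```python
import os


class RadixNode:
    def __init__(self):
        self.edges = {}       # first character of label -> (edge label, child node)
        self.analysis = None  # stored analysis if a word form ends at this node


def insert(root, word, analysis):
    node = root
    while word:
        first = word[0]
        if first not in node.edges:
            leaf = RadixNode()
            leaf.analysis = analysis
            node.edges[first] = (word, leaf)
            return
        label, child = node.edges[first]
        common = len(os.path.commonprefix([label, word]))
        if common < len(label):
            # Split the edge at the point where the label and the word diverge.
            middle = RadixNode()
            middle.edges[label[common]] = (label[common:], child)
            node.edges[first] = (label[:common], middle)
            child = middle
        node = child
        word = word[common:]
    node.analysis = analysis


def lookup(root, word):
    node = root
    while word:
        entry = node.edges.get(word[0])
        if entry is None:
            return None
        label, child = entry
        if not word.startswith(label):
            return None
        word = word[len(label):]
        node = child
    return node.analysis


index = RadixNode()
insert(index, "fox", "fox+N+SG")
insert(index, "foxes", "fox+N+PL")
insert(index, "fish", "fish+N")

print(lookup(index, "foxes"))   # fox+N+PL
print(lookup(index, "fishes"))  # None
```

In this sketch only forms that were inserted explicitly can be found, so the index has to list every inflected form of every word.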

With the advancement of neural networks in natural language processing, it has become less common to use an FST for morphological analysis, especially for languages for which there is a lot of available training data. For such languages, it is possible to build character-level language models without explicit use of a morphological parser.^[1]
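One way to work at the character level, in the spirit of the subword approach of the cited paper, is to represent a word by its overlapping character n-grams with boundary markers. The sketch below only extracts those n-grams and does not train any model. The function name `char_ngrams` and its default length range are assumptions for illustration.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return all character n-grams of `word`, with < and > marking its boundaries."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(marked) - n + 1):
            grams.append(marked[start:start + n])
    return grams


print(char_ngrams("foxes", 3, 3))  # ['<fo', 'fox', 'oxe', 'xes', 'es>']
```

Units such as `es>` can recur across many plural forms, which is how a model of this kind may pick up on suffix-like regularities without ever being told where the morpheme boundary lies.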

Orthographic

Orthographic rules are general rules used to break a word into its stem and modifiers. For example, singular English words that end in a consonant followed by -y form their plurals by replacing -y with -ies.

Morphological rules, on the other hand, cover the exceptions to these general rules.
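General spelling rules of this kind can be sketched as an ordered list of suffix patterns that are tried in turn. The rule set below is deliberately small and will misanalyse many real words. The names `PLURAL_RULES` and `split_plural` are hypothetical.

```python
import re

# Each rule: (pattern over the whole word, text restored to the stem, suffix label).
# Rules are tried in order, so the more specific ones come first.
PLURAL_RULES = [
    (re.compile(r"(.+[^aeiou])ies"), "y", "-ies"),      # ponies -> pony
    (re.compile(r"(.+(?:s|x|z|ch|sh))es"), "", "-es"),  # foxes -> fox
    (re.compile(r"(.+[^s])s"), "", "-s"),               # cats -> cat
]


def split_plural(word):
    """Return (stem, suffix label), or (word, None) if no rule applies."""
    for pattern, restored, suffix in PLURAL_RULES:
        match = pattern.fullmatch(word)
        if match:
            return match.group(1) + restored, suffix
    return word, None


print(split_plural("ponies"))  # ('pony', '-ies')
print(split_plural("foxes"))   # ('fox', '-es')
print(split_plural("cats"))    # ('cat', '-s')
print(split_plural("glass"))   # ('glass', None)
```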

Both types of rules are used to construct systems that can do morphological parsing.

Morphological

Morphological rules are exceptions to the orthographic rules that break a word into its stem and modifiers. For example, an English noun normally forms its plural by adding -s, but fish does not change in its plural form.

Orthographic rules, on the other hand, contain general rules. Both types of rules are used to construct systems that can do morphological parsing, whose applications include machine translation, spell checking, and information retrieval.
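A simple way to combine the two kinds of rules is to consult a table of exceptions first and fall back to a general rule only when the word is not listed. The sketch below assumes a toy exception table and a single general rule. The names `EXCEPTIONS`, `general_rule`, and `analyse`, and the tags `SG` and `PL`, are illustrative.

```python
# Exceptions are listed explicitly; a form may have more than one analysis.
EXCEPTIONS = {
    "fish": [("fish", "SG"), ("fish", "PL")],  # plural identical to singular
    "mice": [("mouse", "PL")],                 # irregular plural
}


def general_rule(word):
    """Treat a final -s (but not -ss) as the plural suffix."""
    if word.endswith("s") and not word.endswith("ss"):
        return [(word[:-1], "PL")]
    return [(word, "SG")]


def analyse(word):
    """Return a list of (stem, number) analyses, checking exceptions first."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    return general_rule(word)


print(analyse("cats"))  # [('cat', 'PL')]
print(analyse("fish"))  # [('fish', 'SG'), ('fish', 'PL')]
print(analyse("mice"))  # [('mouse', 'PL')]
```

Checking the exceptions first keeps the general rule from being applied to words it does not cover.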

References

[1] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. "Enriching Word Vectors with Subword Information".
