Parikh's theorem
Parikh's theorem in theoretical computer science says that if we look only at the relative number of occurrences of terminal symbols in a context-free language, without regard to their order, then the language is indistinguishable from a regular language.[1] It is useful for deciding whether or not a string with given number of some terminals is accepted by a context-free grammar.[2] It was first proved by Rohit Parikh in 1961[3] and republished in 1966.[4]
Definitions
Let be an alphabet. The Parikh vector of a word is defined as the function , given by[1]
, where gives the number of occurrences of the letter in the word .
Further, for a language ,
A subset of is said to be linear if it is of the form
for some vectors .
A subset of is said to be semi-linear if it is a union of finitely many linear subsets.
Significance
Parikh's theorem proves that some context-free languages can only have ambiguous grammars. Such languages are called inherently ambiguous. From a formal grammar perspective this means that some ambiguous context-free grammars cannot be converted to an eqivalent unambiguous context-free grammar.
References
- ^ a b Kozen, Dexter (1997). Automata and Computability. New York: Springer-Verlag. ISBN 3-540-78105-6.
- ^ Håkan Lindqvist. "Parikh's theorem" (PDF). Umeå Universitet.
- ^ Parikh, Rohit (1961). "Language Generating Devices". Quartly Progress Report, Research Laboratory of Electronics, MIT.
- ^ Parikh, Rohit (1966). "On Context-Free Languages". Journal of the Association for Computing Machinery. 13 (4).