Kleene's algorithm

In theoretical computer science, in particular in formal language theory, Kleene's algorithm transforms a given deterministic finite automaton (DFA) into a regular expression. Together with other conversion algorithms, it establishes the equivalence of several description formats for regular languages.

Algorithm description

According to Gross and Yellen (2004),^[1] the algorithm can be traced back to Kleene (1956).^[2]

This description follows Hopcroft and Ullman (1979).^[3] Given a deterministic finite automaton $M = (Q, \Sigma, \delta, q_0, F)$, with $Q = \{ q_0, \ldots, q_n \}$ its set of states, the algorithm computes the sets $R^k_{ij}$ of all strings that take M from state $q_i$ to $q_j$ without going through any state numbered higher than k.

Here, "going through a state" means entering and leaving it, so both i and j may be higher than k, but no intermediate state may. Each set $R^k_{ij}$ is represented by a regular expression; the algorithm computes them step by step for k = −1, 0, ..., n. Since there is no state numbered higher than n, the regular expression $R^n_{0j}$ represents the set of all strings that take M from its start state $q_0$ to $q_j$. If $F = \{ q_1, \ldots, q_f \}$ is the set of accept states, the regular expression $R^n_{01} \mid \cdots \mid R^n_{0f}$ represents the language accepted by M.
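The definition of $R^k_{ij}$, and in particular what "going through a state" means, can be made concrete with a brute-force membership test. The following Python sketch is illustrative only: the dictionary encoding of δ, the integer state names 0, ..., n, and the function name `in_R` are assumptions made here, and the transition table is that of the automaton from the Example section.

```python
# Transition function of the example DFA (states q0, q1, q2 written as 0, 1, 2).
# The dictionary encoding is an illustrative assumption.
DELTA = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 1, (2, "b"): 1,
}


def in_R(delta, w, i, j, k):
    """Return True if the string w belongs to R^k_ij.

    w must lead from q_i to q_j, and every intermediate state (one that is
    entered and then left again) must be numbered at most k. The endpoints
    q_i and q_j themselves are not restricted.
    """
    state = i
    for pos, symbol in enumerate(w):
        state = delta[(state, symbol)]
        is_intermediate = pos < len(w) - 1
        if is_intermediate and state > k:
            return False
    return state == j


# "ba" goes q0 -> q1 -> q2; its only intermediate state is q1.
print(in_R(DELTA, "ba", 0, 2, 0))  # False: q1 is numbered higher than 0
print(in_R(DELTA, "ba", 0, 2, 1))  # True
```

Note that the empty string belongs to every $R^k_{ii}$, even for k = −1, since it passes through no state at all.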

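The initial regular expressions, for k = −1, are computed as

$$
R^{-1}_{ij} =
\begin{cases}
a_1 \mid \cdots \mid a_m & \text{if } i \neq j, \\
a_1 \mid \cdots \mid a_m \mid \varepsilon & \text{if } i = j,
\end{cases}
$$

where $a_1, \ldots, a_m$ are all the input symbols with $\delta(q_i, a_1) = \cdots = \delta(q_i, a_m) = q_j$. If there is no such symbol, $R^{-1}_{ij}$ is $\emptyset$ for $i \neq j$ and $\varepsilon$ for $i = j$.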
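The base case translates directly into code. The Python sketch below assumes that states are the integers 0, ..., n, that δ is a dictionary keyed by (state, symbol), and that regular expressions are plain strings in which "ε" and "∅" stand for the empty string and the empty set; the function name `initial_expressions` is hypothetical.

```python
def initial_expressions(n, alphabet, delta):
    """Return R^{-1}_ij for all 0 <= i, j <= n as regex strings.

    R^{-1}_ij lists every symbol a with delta(q_i, a) = q_j, plus ε when
    i == j; with no such symbol it is ∅ (i != j) or just ε (i == j).
    """
    R = {}
    for i in range(n + 1):
        for j in range(n + 1):
            atoms = [a for a in alphabet if delta[(i, a)] == j]
            if i == j:
                atoms.append("ε")
            R[(i, j)] = " | ".join(atoms) if atoms else "∅"
    return R


# Transition function of the example DFA (states q0, q1, q2 written as 0, 1, 2).
DELTA = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 1, (2, "b"): 1,
}
R = initial_expressions(2, "ab", DELTA)
print(R[(0, 0)], R[(0, 2)], R[(2, 2)])  # a | ε ∅ ε
```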

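After that, in each step the expressions $R^k_{ij}$ are computed from the previous ones by

$$
R^k_{ij} = R^{k-1}_{ik} \left(R^{k-1}_{kk}\right)^* R^{k-1}_{kj} \mid R^{k-1}_{ij}
$$

The first alternative describes the strings that go through $q_k$: they lead from $q_i$ to $q_k$, return to $q_k$ any number of times, and finally lead from $q_k$ to $q_j$, with every piece passing only through states numbered at most k − 1. The second alternative describes the strings that do not go through $q_k$ at all.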
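Combining the base case with the recurrence gives the following Python sketch. It is a direct transcription without any simplification, under assumed conventions (integer states, a dictionary for δ, "ε" and "∅" as placeholder symbols); the function name `kleene` is hypothetical. Each round builds the new table entirely from the previous one, as the recurrence requires.

```python
import re
from itertools import product


def kleene(n, alphabet, delta, start, accepting):
    """Regular expression (a string) for the language of a DFA with states 0..n.

    Direct transcription of the base case and the recurrence; nothing is
    simplified. "ε" and "∅" are placeholders for the empty string and the
    empty set.
    """
    states = range(n + 1)
    # Base case k = -1: single symbols, plus ε on the diagonal.
    R = {}
    for i in states:
        for j in states:
            atoms = [a for a in alphabet if delta[(i, a)] == j]
            if i == j:
                atoms.append("ε")
            R[(i, j)] = "|".join(atoms) if atoms else "∅"
    # Rounds k = 0, ..., n. The comprehension only reads the old table R,
    # which is rebound to the new table afterwards.
    for k in states:
        R = {
            (i, j): f"({R[(i, k)]})({R[(k, k)]})*({R[(k, j)]})|{R[(i, j)]}"
            for i in states
            for j in states
        }
    if not accepting:
        return "∅"
    # Union of R^n_{start, f} over all accept states f.
    return "|".join(f"({R[(start, f)]})" for f in accepting)


# The example DFA from the Example section (states q0, q1, q2 as 0, 1, 2).
DELTA = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 1, (2, "b"): 1,
}
regex = kleene(2, "ab", DELTA, start=0, accepting=[1])

# Translate the placeholders into Python re syntax: ε becomes the empty
# pattern and ∅ becomes "(?!)", a lookahead that can never succeed.
pattern = regex.replace("ε", "").replace("∅", "(?!)")


def accepts(w):
    """Simulate the example DFA (start q0, accept state q1)."""
    state = 0
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state == 1


# Compare regex and automaton on every string over {a, b} of length < 6.
agree = all(
    (re.fullmatch(pattern, w) is not None) == accepts(w)
    for length in range(6)
    for w in map("".join, product("ab", repeat=length))
)
print(agree)
```

Because nothing is simplified, the unsimplified output is long even for three states; in practice one applies Kleene algebra equalities between rounds, as the example below does by hand.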

By induction on k, it can be shown that the length^[4] of each expression $R^k_{ij}$ is at most $\tfrac{1}{3}\left(4^{k+1}(6s+7) - 4\right)$ symbols, where s denotes the number of characters in Σ. Therefore, the length of the regular expression representing the language accepted by M is at most $\tfrac{1}{3}\left(4^{n+1}(6s+7)f - f - 3\right)$ symbols, where f denotes the number of final states. This exponential blowup is inevitable, because there exist families of DFAs for which any equivalent regular expression must be of exponential size.^[5]
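The arithmetic behind these bounds can be replayed in a few lines. In the Python sketch below (function names hypothetical), the starting value 2s + 1 assumes the largest possible $R^{-1}_{ii}$, namely s letters, s alternation bars and one ε, and each round combines four expressions of the previous round with two concatenations, one star and one bar, giving the recurrence L(k) = 4 L(k − 1) + 4. The closed forms are the ones stated above.

```python
from fractions import Fraction


def length_by_recurrence(k, s):
    """Upper bound on the symbol count of R^k_ij, by unrolling the recurrence.

    R^k_ij consists of four expressions of round k - 1 plus two "·",
    one "*" and one "|", so L(k) = 4 * L(k - 1) + 4, with L(-1) = 2s + 1.
    """
    length = 2 * s + 1  # a_1 | ... | a_s | ε
    for _ in range(k + 1):
        length = 4 * length + 4
    return length


def closed_form(k, s):
    return Fraction(4 ** (k + 1) * (6 * s + 7) - 4, 3)


def language_bound(n, s, f):
    # f expressions R^n_0j joined by f - 1 bars.
    return Fraction(4 ** (n + 1) * (6 * s + 7) * f - f - 3, 3)


assert all(
    length_by_recurrence(k, s) == closed_form(k, s)
    for k in range(-1, 10) for s in range(1, 6)
)
assert all(
    f * closed_form(n, s) + (f - 1) == language_bound(n, s, f)
    for n in range(8) for s in range(1, 6) for f in range(1, 6)
)
# A three-state DFA over a two-letter alphabet with one accept state:
print(language_bound(2, 2, 1))  # 404
```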

In practice, the size of the regular expression obtained by running the algorithm can be very different depending on the order in which the states are considered by the procedure, i.e., the order in which they are numbered from 0 to n.

Example

The automaton shown in the picture can be described as M = (Q, Σ, δ, q₀, F) with

the set of states Q = { q₀, q₁, q₂ },
the input alphabet Σ = { a, b },
the transition function δ with δ(q₀,a)=q₀, δ(q₀,b)=q₁, δ(q₁,a)=q₂, δ(q₁,b)=q₁, δ(q₂,a)=q₁, and δ(q₂,b)=q₁,
the start state q₀, and
the set of accept states F = { q₁ }.
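To experiment with this automaton, it can be written down directly. The Python encoding below is an illustrative assumption (states q₀, q₁, q₂ become 0, 1, 2 and δ becomes a dictionary), and `trace` and `accepts` are hypothetical helper names.

```python
# The example automaton M; states q0, q1, q2 are written as 0, 1, 2.
STATES = (0, 1, 2)
ALPHABET = ("a", "b")
DELTA = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 1, (2, "b"): 1,
}
START = 0
ACCEPT = {1}

# A DFA's transition function is total: one successor for every pair.
assert set(DELTA) == {(q, c) for q in STATES for c in ALPHABET}


def trace(w):
    """Sequence of states M visits while reading w, starting in q0."""
    states = [START]
    for symbol in w:
        states.append(DELTA[(states[-1], symbol)])
    return states


def accepts(w):
    return trace(w)[-1] in ACCEPT


print(trace("bab"))    # [0, 1, 2, 1]
print(accepts("bab"))  # True
print(accepts("ba"))   # False
```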

Kleene's algorithm computes the initial regular expressions as

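$$
\begin{aligned}
R^{-1}_{00} &= a \mid \varepsilon \\
R^{-1}_{01} &= b \\
R^{-1}_{02} &= \emptyset \\
R^{-1}_{10} &= \emptyset \\
R^{-1}_{11} &= b \mid \varepsilon \\
R^{-1}_{12} &= a \\
R^{-1}_{20} &= \emptyset \\
R^{-1}_{21} &= a \mid b \\
R^{-1}_{22} &= \varepsilon
\end{aligned}
$$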
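Because no state may be passed through when k = −1, each $R^{-1}_{ij}$ can contain only ε and single letters. The Python sketch below (helper names hypothetical; the table is transcribed from above as sets of strings, with "" for ε and an empty set for ∅) enumerates all short strings and checks that the table lists exactly those leading from $q_i$ to $q_j$ without any intermediate state.

```python
from itertools import product

# Transition function of the example DFA (states q0, q1, q2 written as 0, 1, 2).
DELTA = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 1, (2, "b"): 1,
}

# R^{-1}_ij from the table above; "" stands for ε and set() for ∅.
TABLE = {
    (0, 0): {"a", ""}, (0, 1): {"b"}, (0, 2): set(),
    (1, 0): set(), (1, 1): {"b", ""}, (1, 2): {"a"},
    (2, 0): set(), (2, 1): {"a", "b"}, (2, 2): {""},
}


def visited_states(i, w):
    """States entered while reading w from q_i (one per symbol)."""
    states, state = [], i
    for symbol in w:
        state = DELTA[(state, symbol)]
        states.append(state)
    return states


def r_minus_one(i, j, max_len=3):
    """Strings of length <= max_len from q_i to q_j with no intermediate state."""
    result = set()
    for length in range(max_len + 1):
        for w in map("".join, product("ab", repeat=length)):
            visited = visited_states(i, w)
            end = visited[-1] if visited else i
            if end == j and not visited[:-1]:
                result.add(w)
    return result


print(all(r_minus_one(i, j) == TABLE[(i, j)] for (i, j) in TABLE))  # True
```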

After that, the $R^k_{ij}$ are computed from the $R^{k-1}_{ij}$ step by step for k = 0, 1, 2. Kleene algebra equalities are used to simplify the regular expressions as much as possible.
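Simplifications of this kind can be spot-checked mechanically. The Python sketch below is a heuristic, not a proof: it compares two expressions on every string up to a fixed length using the standard `re` module, so it can refute a claimed equality but never establish one. Writing ε as an empty alternative, for example `(a|)` for (a | ε), is a convention assumed here, and the function name is hypothetical.

```python
import re
from itertools import product


def equivalent_up_to(r1, r2, alphabet="ab", max_len=8):
    """Compare two patterns on every string of length <= max_len.

    A False result refutes the equality; a True result is only evidence.
    """
    for length in range(max_len + 1):
        for w in map("".join, product(alphabet, repeat=length)):
            if (re.fullmatch(r1, w) is None) != (re.fullmatch(r2, w) is None):
                return False
    return True


# Step 0, R^0_00:  (a | ε)(a | ε)*(a | ε) | a | ε  =  a*
print(equivalent_up_to("(a|)(a|)*(a|)|a|", "a*"))  # True
# Step 1, R^1_01:  a*b (b | ε)*(b | ε) | a*b  =  a*b*b
print(equivalent_up_to("a*b(b|)*(b|)|a*b", "a*b*b"))  # True
# Dropping a star is not a valid step, and the check detects it:
print(equivalent_up_to("(a|b)b*a|", "((a|b)b*a)*"))  # False
```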

Step 0

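$$
\begin{aligned}
R^0_{00} &= R^{-1}_{00} (R^{-1}_{00})^* R^{-1}_{00} \mid R^{-1}_{00} &&= (a \mid \varepsilon)(a \mid \varepsilon)^*(a \mid \varepsilon) \mid a \mid \varepsilon &&= a^* \\
R^0_{01} &= R^{-1}_{00} (R^{-1}_{00})^* R^{-1}_{01} \mid R^{-1}_{01} &&= (a \mid \varepsilon)(a \mid \varepsilon)^*\, b \mid b &&= a^* b \\
R^0_{02} &= R^{-1}_{00} (R^{-1}_{00})^* R^{-1}_{02} \mid R^{-1}_{02} &&= (a \mid \varepsilon)(a \mid \varepsilon)^*\, \emptyset \mid \emptyset &&= \emptyset \\
R^0_{10} &= R^{-1}_{10} (R^{-1}_{00})^* R^{-1}_{00} \mid R^{-1}_{10} &&= \emptyset\, (a \mid \varepsilon)^*(a \mid \varepsilon) \mid \emptyset &&= \emptyset \\
R^0_{11} &= R^{-1}_{10} (R^{-1}_{00})^* R^{-1}_{01} \mid R^{-1}_{11} &&= \emptyset\, (a \mid \varepsilon)^*\, b \mid b \mid \varepsilon &&= b \mid \varepsilon \\
R^0_{12} &= R^{-1}_{10} (R^{-1}_{00})^* R^{-1}_{02} \mid R^{-1}_{12} &&= \emptyset\, (a \mid \varepsilon)^*\, \emptyset \mid a &&= a \\
R^0_{20} &= R^{-1}_{20} (R^{-1}_{00})^* R^{-1}_{00} \mid R^{-1}_{20} &&= \emptyset\, (a \mid \varepsilon)^*(a \mid \varepsilon) \mid \emptyset &&= \emptyset \\
R^0_{21} &= R^{-1}_{20} (R^{-1}_{00})^* R^{-1}_{01} \mid R^{-1}_{21} &&= \emptyset\, (a \mid \varepsilon)^*\, b \mid a \mid b &&= a \mid b \\
R^0_{22} &= R^{-1}_{20} (R^{-1}_{00})^* R^{-1}_{02} \mid R^{-1}_{22} &&= \emptyset\, (a \mid \varepsilon)^*\, \emptyset \mid \varepsilon &&= \varepsilon
\end{aligned}
$$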
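The Step 0 table can be re-derived mechanically. This Python sketch (names hypothetical) substitutes the initial table into the recurrence with k = 0, without simplifying, and compares each result with the simplified right-hand column on all strings up to length 7. It assumes Python `re` syntax, in which an empty alternative stands for ε and the pattern `(?!)`, which never matches, stands for ∅; as with any bounded check, agreement is evidence rather than proof.

```python
import re
from itertools import product

# R^{-1}_ij from the initial table, in Python re syntax:
# an empty alternative stands for ε and "(?!)" (never matches) for ∅.
R_INIT = {
    (0, 0): "a|", (0, 1): "b", (0, 2): "(?!)",
    (1, 0): "(?!)", (1, 1): "b|", (1, 2): "a",
    (2, 0): "(?!)", (2, 1): "a|b", (2, 2): "",
}

# Simplified right-hand column of the Step 0 table.
R0_SIMPLIFIED = {
    (0, 0): "a*", (0, 1): "a*b", (0, 2): "(?!)",
    (1, 0): "(?!)", (1, 1): "b|", (1, 2): "a",
    (2, 0): "(?!)", (2, 1): "a|b", (2, 2): "",
}


def step(R, k):
    """One unsimplified application of the recurrence for round k."""
    return {
        (i, j): f"({R[(i, k)]})({R[(k, k)]})*({R[(k, j)]})|{R[(i, j)]}"
        for (i, j) in R
    }


def same_up_to(r1, r2, max_len=7):
    """Do both patterns match the same strings over {a, b} up to max_len?"""
    return all(
        (re.fullmatch(r1, w) is None) == (re.fullmatch(r2, w) is None)
        for length in range(max_len + 1)
        for w in map("".join, product("ab", repeat=length))
    )


R0 = step(R_INIT, 0)
print(R0[(0, 1)])  # (a|)(a|)*(b)|b
print(all(same_up_to(R0[ij], R0_SIMPLIFIED[ij]) for ij in R0))  # True
```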

Step 1

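$$
\begin{aligned}
R^1_{00} &= R^0_{01} (R^0_{11})^* R^0_{10} \mid R^0_{00} &&= a^* b\, (b \mid \varepsilon)^*\, \emptyset \mid a^* &&= a^* \\
R^1_{01} &= R^0_{01} (R^0_{11})^* R^0_{11} \mid R^0_{01} &&= a^* b\, (b \mid \varepsilon)^* (b \mid \varepsilon) \mid a^* b &&= a^* b^* b \\
R^1_{02} &= R^0_{01} (R^0_{11})^* R^0_{12} \mid R^0_{02} &&= a^* b\, (b \mid \varepsilon)^*\, a \mid \emptyset &&= a^* b^* b a \\
R^1_{10} &= R^0_{11} (R^0_{11})^* R^0_{10} \mid R^0_{10} &&= (b \mid \varepsilon)(b \mid \varepsilon)^*\, \emptyset \mid \emptyset &&= \emptyset \\
R^1_{11} &= R^0_{11} (R^0_{11})^* R^0_{11} \mid R^0_{11} &&= (b \mid \varepsilon)(b \mid \varepsilon)^*(b \mid \varepsilon) \mid b \mid \varepsilon &&= b^* \\
R^1_{12} &= R^0_{11} (R^0_{11})^* R^0_{12} \mid R^0_{12} &&= (b \mid \varepsilon)(b \mid \varepsilon)^*\, a \mid a &&= b^* a \\
R^1_{20} &= R^0_{21} (R^0_{11})^* R^0_{10} \mid R^0_{20} &&= (a \mid b)(b \mid \varepsilon)^*\, \emptyset \mid \emptyset &&= \emptyset \\
R^1_{21} &= R^0_{21} (R^0_{11})^* R^0_{11} \mid R^0_{21} &&= (a \mid b)(b \mid \varepsilon)^*(b \mid \varepsilon) \mid a \mid b &&= (a \mid b)\, b^* \\
R^1_{22} &= R^0_{21} (R^0_{11})^* R^0_{12} \mid R^0_{22} &&= (a \mid b)(b \mid \varepsilon)^*\, a \mid \varepsilon &&= (a \mid b)\, b^* a \mid \varepsilon
\end{aligned}
$$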
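Each entry of the Step 1 table should describe exactly the strings that lead from $q_i$ to $q_j$ while passing only through q₀ and q₁. The following Python sketch (helper names hypothetical) tests this on all strings up to length 8, writing ∅ as `(?!)`, a pattern that never matches, and ε as an empty alternative. It also shows that the same table does not describe the sets for k = 2.

```python
import re
from itertools import product

# Transition function of the example DFA (states q0, q1, q2 written as 0, 1, 2).
DELTA = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 1, (2, "b"): 1,
}

# Simplified right-hand column of the Step 1 table, in Python re syntax.
R1 = {
    (0, 0): "a*", (0, 1): "a*b*b", (0, 2): "a*b*ba",
    (1, 0): "(?!)", (1, 1): "b*", (1, 2): "b*a",
    (2, 0): "(?!)", (2, 1): "(a|b)b*", (2, 2): "(a|b)b*a|",
}


def in_R(w, i, j, k):
    """True if w leads from q_i to q_j with every intermediate state <= k."""
    state = i
    for pos, symbol in enumerate(w):
        state = DELTA[(state, symbol)]
        if pos < len(w) - 1 and state > k:
            return False
    return state == j


def table_matches(table, k, max_len=8):
    """Does every table entry agree with the definition of R^k_ij?"""
    return all(
        (re.fullmatch(table[(i, j)], w) is not None) == in_R(w, i, j, k)
        for (i, j) in table
        for length in range(max_len + 1)
        for w in map("".join, product("ab", repeat=length))
    )


print(table_matches(R1, 1))  # True
print(table_matches(R1, 2))  # False: e.g. "aa" is in R^2_11 but not in R^1_11
```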

Step 2

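$$
\begin{aligned}
R^2_{00} &= R^1_{02} (R^1_{22})^* R^1_{20} \mid R^1_{00} &&= a^* b^* b a\, ((a \mid b) b^* a \mid \varepsilon)^*\, \emptyset \mid a^* &&= a^* \\
R^2_{01} &= R^1_{02} (R^1_{22})^* R^1_{21} \mid R^1_{01} &&= a^* b^* b a\, ((a \mid b) b^* a \mid \varepsilon)^* (a \mid b)\, b^* \mid a^* b^* b &&= a^* b\, (a (a \mid b) \mid b)^* \\
R^2_{02} &= R^1_{02} (R^1_{22})^* R^1_{22} \mid R^1_{02} &&= a^* b^* b a\, ((a \mid b) b^* a \mid \varepsilon)^* ((a \mid b) b^* a \mid \varepsilon) \mid a^* b^* b a &&= a^* b^* b\, (a (a \mid b) b^*)^* a \\
R^2_{10} &= R^1_{12} (R^1_{22})^* R^1_{20} \mid R^1_{10} &&= b^* a\, ((a \mid b) b^* a \mid \varepsilon)^*\, \emptyset \mid \emptyset &&= \emptyset \\
R^2_{11} &= R^1_{12} (R^1_{22})^* R^1_{21} \mid R^1_{11} &&= b^* a\, ((a \mid b) b^* a \mid \varepsilon)^* (a \mid b)\, b^* \mid b^* &&= (a (a \mid b) \mid b)^* \\
R^2_{12} &= R^1_{12} (R^1_{22})^* R^1_{22} \mid R^1_{12} &&= b^* a\, ((a \mid b) b^* a \mid \varepsilon)^* ((a \mid b) b^* a \mid \varepsilon) \mid b^* a &&= (a (a \mid b) \mid b)^* a \\
R^2_{20} &= R^1_{22} (R^1_{22})^* R^1_{20} \mid R^1_{20} &&= ((a \mid b) b^* a \mid \varepsilon)((a \mid b) b^* a \mid \varepsilon)^*\, \emptyset \mid \emptyset &&= \emptyset \\
R^2_{21} &= R^1_{22} (R^1_{22})^* R^1_{21} \mid R^1_{21} &&= ((a \mid b) b^* a \mid \varepsilon)((a \mid b) b^* a \mid \varepsilon)^* (a \mid b)\, b^* \mid (a \mid b)\, b^* &&= (a \mid b)(a (a \mid b) \mid b)^* \\
R^2_{22} &= R^1_{22} (R^1_{22})^* R^1_{22} \mid R^1_{22} &&= ((a \mid b) b^* a \mid \varepsilon)((a \mid b) b^* a \mid \varepsilon)^* ((a \mid b) b^* a \mid \varepsilon) \mid (a \mid b) b^* a \mid \varepsilon &&= ((a \mid b) b^* a)^*
\end{aligned}
$$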
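The simplifications in Step 2 rely on two Kleene algebra identities: ε can be dropped inside a star, $(x \mid \varepsilon)^* = x^*$, and the shifting rule $x(yx)^* = (xy)^*x$. The Python sketch below (helper name hypothetical; in `re` syntax an empty alternative plays the role of ε) spot-checks both on the expressions used for $R^2_{02}$, with x = a and $y = (a \mid b)\, b^*$, by comparing on all strings up to length 9. This is a sanity check, not a proof.

```python
import re
from itertools import product


def same_up_to(r1, r2, max_len=9):
    """Do both patterns match the same strings over {a, b} up to max_len?"""
    return all(
        (re.fullmatch(r1, w) is None) == (re.fullmatch(r2, w) is None)
        for length in range(max_len + 1)
        for w in map("".join, product("ab", repeat=length))
    )


# ε inside a star can be dropped: ((a|b)b*a | ε)* = ((a|b)b*a)*
print(same_up_to("((a|b)b*a|)*", "((a|b)b*a)*"))  # True

# Shifting rule x(yx)* = (xy)*x with x = a, y = (a|b)b*
print(same_up_to("a((a|b)b*a)*", "(a(a|b)b*)*a"))  # True

# R^2_02: middle column of the table versus its simplified form
print(same_up_to(
    "a*b*ba((a|b)b*a|)*((a|b)b*a|)|a*b*ba",
    "a*b*b(a(a|b)b*)*a",
))  # True
```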

Since q₀ is the start state and q₁ is the only accept state, the regular expression $R^2_{01} = a^* b\, (a (a \mid b) \mid b)^*$ denotes the set of all strings accepted by the automaton.
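As a final sanity check, the hand-simplified $R^2_{01}$ can be compared with the automaton itself. In this Python sketch (helper names hypothetical), `accepts` simulates M, and the regular expression, written in Python `re` syntax, is tested with `re.fullmatch` on every string over {a, b} up to length 10.

```python
import re
from itertools import product

# The example DFA (states q0, q1, q2 written as 0, 1, 2).
DELTA = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 1, (2, "b"): 1,
}


def accepts(w):
    """Run M from q0 and report whether it ends in the accept state q1."""
    state = 0
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state == 1


R2_01 = "a*b(a(a|b)|b)*"  # R^2_01 from the Step 2 table, in re syntax

# Strings on which the regular expression and the automaton disagree.
mismatches = [
    w
    for length in range(11)
    for w in map("".join, product("ab", repeat=length))
    if (re.fullmatch(R2_01, w) is not None) != accepts(w)
]
print(mismatches)  # []
```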

References

[1] Jonathan L. Gross and Jay Yellen, eds. (2004). Handbook of Graph Theory. Discrete Mathematics and Its Applications. CRC Press. ISBN 1-58488-090-2. Here: sect. 2.1, remark R13 on p. 65.
[2] Kleene, Stephen C. (1956). "Representation of Events in Nerve Nets and Finite Automata". Automata Studies, Annals of Math. Studies. 34. Princeton Univ. Press. Here: sect. 9, pp. 37–40.
[3] John E. Hopcroft, Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 0-201-02988-X. Here: Theorem 2.4, pp. 33–34.
[4] More precisely, the number of regular-expression symbols "aᵢ", "ε", "|", "*", and "·", not counting parentheses.
[5] Gruber, Hermann; Holzer, Markus (2008). Aceto, Luca; Damgård, Ivan; Goldberg, Leslie Ann; Halldórsson, Magnús M.; Ingólfsdóttir, Anna; Walukiewicz, Igor (eds.). "Finite Automata, Digraph Connectivity, and Regular Expression Size". Automata, Languages and Programming. Lecture Notes in Computer Science. Springer Berlin Heidelberg: 39–50. doi:10.1007/978-3-540-70583-3_4. ISBN 9783540705833. Here: Theorem 16.
