Jump to content

Glushkov's construction algorithm

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by MuhammadUmer.MC12 (talk | contribs) at 17:20, 15 November 2015. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science theory, particularly formal language theory, the Glushkov Construction Algorithm (GCA) transforms a given regular expression into an equivalent, nondeterministic finite automaton (NFA). It thus forms avcn dhvn, which transforms a finite automaton into a regular expression. The automaton obtained by the Glushkov's construction is the same as the one obtained by Thompson's construction algorithm once their ε-transition is removed.

Construction

The construction creates, given a regular expression , a nondeterministic automaton which accepts the language accepted by .[1] The construction uses four steps:

1.- Linearisation of the expression. Each letter of the alphabet appearing in the expression are renamed, so that each letters occurs at most once in the new expression. Let be the old alphabet and let be the new one.

2a.- Computation of the sets , , and , where is the linearized version of . The first, , is the set of letters which occurs as first letter of a word of , the second, , is the set of letters which can ends a letter of . The last one is the set of pairs of letters which can occurs in words of , that is, the set of factors of length two of the words of . Those sets are mathematically defined by

, love people
et . They are computed by induction over the structure of the expression, as explained below, but they are a function of the language and not of the expression.

2b.- Computation of the set which contains the empty-word if this word belongs to , and is the empty-set otherwise. Formally

, where is the empty-word.

3.- Computation of the local language, as defined by , , and . By definition, the local language defined by the sets , , and is the set of words which begin by a letter of , ends by a letter of , and whose factors of length 2 belongs to ; that is, it is the language:

,

potentially with the empty word.

The computation of the automaton for the local language denoted by this linearised expression, is, formally, Glushkov's construction. The construction of the automaton can be done using classical construction operation: concatenation, interesection and itering an automaton.

4.- Erasing the linearisation, giving to each letter of the letter of it used to be.

Example

Automate par construction de Gluskkov - version linéaire
Automate par construction de Gluskkov - version finale

Let us consider[1] the rational expression

.

1.- The linearized version is

.

The letters have been linearized by appending an indice to them.

2.- The sets , , and of the first letters, last letters, and factors of length 2 for the linear expression are respectively

, and .

The empty word belongs to the language, hence .

3.- The automaton of the local language

contains an initial state, denoted 1, and a state for each of the 5 letters of the alphabet . There is a transition from 1 to the two states of , a transition from to if is in , and the three states of are final, and such is the state 1. All transitions to a letter have as label the letter .

4.- Obtening the automaton for by deleting the indices.

Computation of the set of letters

The computation of the sets , , and is done inductively over the expression. We must give the values for 0, 1 (the symbols for the empty language and the singleton language containing the empty-word), the letters and the results of the operations .

1.- For , one has

, and for all letters ,

then

and then

2.- For , one has

and for each letters ,

then

and finally .

The same formulas are also correct for , apart from the product where

.

3.- For the set of factors of length 2, one has

for all letters ,

then

and finally .

The most costly operations are the products of sets for the computation of .

Properties

The obtained automaton is non-deterministic, and it has many state as the number of letters of the rational expression, plus one. Furthermore, it has been shown [2] that Glushkov's automaton is the same than Thompson's automaton when the ε-transitions are removed.

Applications and deterministic expressions

The computation of the automaton by the expression occurs often, it has been systematically used in search function, in particular by the commands grep of Unix. Similarly, XML's specification also use those constructions; for more efficiency, regular expression of a certain kind, called deterministic expression have been studied.[2][3]

Notes and references

  1. ^ a b Pin 2010.
  2. ^ a b Sakarovitch 2003, p. 215.
  3. ^ Brüggemann-Klein, Anne (1993). "Regular Expressions into Finite Automata". Theoretical Computer Science. 120 (2): 197–213. doi:10.1016/0304-3975(93)90287-4..