Jump to content

Pattern language (formal languages)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Jochen Burghardt (talk | contribs) at 10:43, 1 November 2013 (Definition: fleshed out). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In theoretical computer science, a pattern language is a formal language that can be defined as the set of all particular instances of a string of constants and variables. Pattern Languages were introduced by Dana Angluin in the context of machine learning. [1]

Definition

Given a finite set Σ of constant symbols and a countable set X of variable symbols disjoint from Σ, a pattern is a finite non-empty string of symbols from Σ∪X. The length of a pattern p, denoted by |p|, is just the number of its symbols. The set of all patterns of length n is denoted by Pn, the set of all patterns at all by P*. A substitution is a mapping f: P*P* such that

  • f is a homomorphism with respect to string concatenation (⋅), formally: ∀p,qP*. f(pq) = f(p)⋅f(q);
  • f is non-erasing, formally: ∀pP*. f(p) ≠ ε, where ε denotes the empty string; and
  • f respects constants, formally: ∀s∈Σ. f(s) = s.

If p = f(q) for some patterns p, qP* and some substitution f, then p is said to be less general than q, written pq; in that case, necessarily |p| ≥ |q| holds. For a pattern p, its language is defined as the set of all less general patterns that are built from constants only, formally: L(p) = { s ∈ Σ+ : sp }, where Σ+ denotes the set of all finite non-empty string of symbols from Σ.

For example, using the constants Σ = { 0, 1 } and the variables X = { x, y, z, ... }, the pattern 0x10xx1 and xxy has length 7 and 3, respectively. An instance of the former pattern is 00z100z0z1 and 01z101z1z1, it is obtained by the substitution that maps x to 0z and to 1z, respectively, and each other symbol to itself. The language of the pattern xx is the set of all strings obtainable by concatenating a bit string with itself, e.g. 00, 11, 0101, 1010, 11101110 ∈ L(xx).

Properties

Learning patterns

References

  1. ^ Dana Angluin (1980). "Finding Patterns Common to a Set of Strings" (PDF). Journal of Computer and System Sciences. 21: 46–62.