Jump to content

Pattern language (formal languages)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by PamD (talk | contribs) at 08:15, 2 November 2013 (not a stub). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In theoretical computer science, a pattern language is a formal language that can be defined as the set of all particular instances of a string of constants and variables. Pattern Languages were introduced by Dana Angluin in the context of machine learning. [1]

Definition

Given a finite set Σ of constant symbols and a countable set X of variable symbols disjoint from Σ, a pattern is a finite non-empty string of symbols from Σ∪X. The length of a pattern p, denoted by |p|, is just the number of its symbols. The set of all patterns containing exactly n distinct variables (each of which may occur several times) is denoted by Pn, the set of all patterns at all by P*. A substitution is a mapping f: P*P* such that

  • f is a homomorphism with respect to string concatenation (⋅), formally: ∀p,qP*. f(pq) = f(p)⋅f(q);
  • f is non-erasing, formally: ∀pP*. f(p) ≠ ε, where ε denotes the empty string; and
  • f respects constants, formally: ∀s∈Σ. f(s) = s.

If p = f(q) for some patterns p, qP* and some substitution f, then p is said to be less general than q, written pq; in that case, necessarily |p| ≥ |q| holds. For a pattern p, its language is defined as the set of all less general patterns that are built from constants only, formally: L(p) = { s ∈ Σ+ : sp }, where Σ+ denotes the set of all finite non-empty string of symbols from Σ.

For example, using the constants Σ = { 0, 1 } and the variables X = { x, y, z, ... }, the pattern 0x10xx1 and xxy has length 7 and 3, respectively. An instance of the former pattern is 00z100z0z1 and 01z101z1z1, it is obtained by the substitution that maps x to 0z and to 1z, respectively, and each other symbol to itself. Both 00z100z0z1 and 01z101z1z1 are also instances of xxy. The language of the pattern x0 is the set of all bit strings which denote an even number. The language of xx is the set of all strings obtainable by concatenating a bit string with itself, e.g. 00, 11, 0101, 1010, 11101110 ∈ L(xx).

Properties

The problem of deciding whether sL(p) for an arbitrary string s ∈ Σ+ and pattern p is NP-complete, and so is hence the problem of deciding pq for arbitrary patterns p, q.[2] The class of pattern languages is incomparable with the class of regular languages and with the class of context-free languages. The class of pattern languages is not closed under any of these operations: union, complement, intersection, Kleene plus, homomorphism, or inverse homomorphism. It is closed under concatenation and reversal: L(p)⋅L(q) = L(pq) and L(p)rev = L(prev).[3]

Learning patterns

The class of pattern languages can be identified in the limit from positive examples.[4]

References

  1. ^ Dana Angluin (1980). "Finding Patterns Common to a Set of Strings" (PDF). Journal of Computer and System Sciences. 21: 46–62.
  2. ^ Theorem 3.6, p.50; Corollary 3.7, p.52
  3. ^ Theorem 3.10, p.53
  4. ^ Dana Angluin (1980). "Inductive Inference of Formal Languages from Positive Data" (PDF). Information and Control. 45: 117–135.; here: Example 1, p.125