Pattern language (formal languages)
In theoretical computer science, a pattern language is a formal language that can be defined as the set of all particular instances of a string of constants and variables. Pattern Languages were introduced by Dana Angluin in the context of machine learning. [1]
Definition
Given a finite set Σ of constant symbols and a countable set X of variable symbols disjoint from Σ, a pattern is a finite non-empty string of symbols from Σ∪X. The length of a pattern p, denoted by |p|, is just the number of its symbols. The set of all patterns of length n is denoted by Pn, the set of all patterns at all by P*. A substitution is a mapping f: P* → P* such that
- f is a homomorphism with respect to string concatenation (⋅), formally: ∀p,q∈P*. f(p⋅q) = f(p)⋅f(q);
- f is non-erasing, formally: ∀p∈P*. f(p) ≠ ε, where ε denotes the empty string; and
- f respects constants, formally: ∀s∈Σ. f(s) = s.
If p = f(q) for some patterns p, q ∈ P* and some substitution f, then p is said to be less general than q, written p≤q; in that case, necessarily |p| ≥ |q| holds. For a pattern p, its language is defined as the set of all less general patterns that are built from constants only, formally: L(p) = { s ∈ Σ+ : s ≤ p }, where Σ+ denotes the set of all finite non-empty string of symbols from Σ.
For example, using the constants Σ = { 0, 1 } and the variables X = { x, y, z, ... }, the pattern 0x10xx1 and xxy has length 7 and 3, respectively. An instance of the former pattern is 00z100z0z1 and 01z101z1z1, it is obtained by the substitution that maps x to 0z and to 1z, respectively, and each other symbol to itself. Both 00z100z0z1 and 01z101z1z1 are also instances of xxy. The language of the pattern x0 is the set of all bit strings which denote an even number. The language of xx is the set of all strings obtainable by concatenating a bit string with itself, e.g. 00, 11, 0101, 1010, 11101110 ∈ L(xx).
Properties
Learning patterns
References
- ^ Dana Angluin (1980). "Finding Patterns Common to a Set of Strings" (PDF). Journal of Computer and System Sciences. 21: 46–62.