Pattern language (formal languages)
In theoretical computer science, a pattern language is a formal language that can be defined as the set of all particular instances of a string of constants and variables. Pattern Languages were introduced by Dana Angluin in the context of machine learning. [1]
Definition
Given a finite set Σ of constant symbols and a countable set X of variable symbols disjoint from Σ, a pattern is a finite non-empty string of symbols from Σ∪X. The length of a pattern p, denoted by |p|, is just the number of its symbols. The set of all patterns of length n is denoted by Pn, the set of all patterns at all by P*. A substitution is a mapping f: P* → P* such that
- f is a homomorphism with respect to string concatenation (⋅), formally: ∀p,q∈P*. f(p⋅q) = f(p)⋅f(q);
- f is non-erasing, formally: ∀p∈P*. f(p) ≠ ε, where ε denotes the empty string; and
- f respects constants, formally: ∀s∈Σ. f(s) = s.
If p = f(q) for some patterns p, q ∈ P* and some substitution f, then p is said to be less general than q, written p≤q; in that case, necessarily |p| ≥ |q| holds. For a pattern p, its language is defined as the set of all less general patterns that are built from constants only, formally: L(p) = { s ∈ Σ+ : s ≤ p }, where Σ+ denotes the set of all finite non-empty string of symbols from Σ.
For example, using the constants Σ = { 0, 1 } and the variables X = { x, y, z, ... }, the pattern 0x10xx1 and xxy has length 7 and 3, respectively. An instance of the former pattern is 00z100z0z1 and 01z101z1z1, it is obtained by the substitution that maps x to 0z and to 1z, respectively, and each other symbol to itself. The language of the pattern xx is the set of all strings obtainable by concatenating a bit string with itself, e.g. 00, 11, 0101, 1010, 11101110 ∈ L(xx).
Properties
Learning patterns
References
- ^ Dana Angluin (1980). "Finding Patterns Common to a Set of Strings" (PDF). Journal of Computer and System Sciences. 21: 46–62.