Jump to content

Complexity function

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Deltahedron (talk | contribs) at 21:22, 14 November 2012 (disjunctive sequence, cite Bugeaud (2012)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science, the complexity function of a string, a finite or infinite sequence u of letters from some alphabet, is the function pu(n) of a positive integer n that counts the number of different factors (consecutive substrings) of length n from the string u.[1][2][3][4]

For a string u of length at least n over an alphabet of size k we clearly have

the bounds being achieved by the constant word and a disjunctive word,[5] for example, the Champernowne word respectively.[6] For infinite words u, we have pu(n) bounded if u is ultimately periodic (a finite, possibly empty, sequence followed by a finite cycle). Conversely, if pu(n) ≤ n for some n, then u is ultimately periodic.[3]

An aperiodic sequence is one which is not ultimately periodic. An aperiodic sequence has strictly increasing complexity function (this is the Morse–Hedlund theorem),[7] so p(n) is at least n+1.[8]

A set S of finite binary words is balanced if for each n the subset Sn of words of length n has the property that the Hamming weight of the words in Sn takes at most two distinct values. A balanced sequence is one for which the set of factors is balanced. A balanced sequence has complexity sequence at most n+1.[9]

A Sturmian word over a binary alphabet is one with complexity function n + 1.[10] A sequence is Sturmian if and only if it is balanced and aperiodic.[2] An example is the Fibonacci word.[10][11] More generally, a Sturmian word over an alphaber of size k is one with complexity n+k−1. An Arnoux-Rauzy word over a ternary alphabet has complexity 2n + 1:[10] an example is the Tribonacci word.[12]

For recurrent words, those in which each factor appears infinitely often, the complexity function almosr characterises the set of factors: if s is a recurrent word with the same complexity function as t are then s has the same set of factors as t or δt where δ denotes the letter doubling morphism aaa.[13]

The topological entropy of an infinite sequence u is defined by

The limit exists as the logarithm of the complexity function is subadditive.[14] Every real number between 0 and 1 occurs as the topological entry of some sequence.[15]

References

  1. ^ Lothaire (2011) p.7
  2. ^ a b Lothaire (2011) p.46
  3. ^ a b Pytheas Fogg (2002) p.3
  4. ^ Berstel et al (2009) p.82
  5. ^ Bugeaud (2012) p.91
  6. ^ Cassaigne & Nicolas (2010) p.165
  7. ^ Cassaigne & Nicolas (2010) p.166
  8. ^ Lothaire (2011) p.22
  9. ^ Lothaire (2011) p.48
  10. ^ a b c Pytheas Fogg (2002) p.6
  11. ^ de Luca, Aldo (1995). "A division property of the Fibonacci word". Information Processing Letters. 54 (6): 307–312. doi:10.1016/0020-0190(95)00067-M.
  12. ^ Pytheas Fogg (2002) p.368
  13. ^ Berstel et al (2009) p.84
  14. ^ Pytheas Fogg (2002) p.4
  15. ^ Cassaigne & Nicolas (2010) p.169