
Substring

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Jochen Burghardt (talk | contribs) at 08:03, 8 October 2019 (Substring: fix typo; suggest to explain new definition along the example). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
"string" is a substring of "substring"

A substring is a contiguous sequence of characters within a string. For instance, "the best of" is a substring of "It was the best of times". This is not to be confused with subsequence, which is a generalization of substring. For example, "Itwastimes" is a subsequence of "It was the best of times", but not a substring.
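The difference between the two notions can be checked mechanically. The following is a minimal sketch, not a reference implementation; the function names `is_substring` and `is_subsequence` are chosen here for illustration and do not come from the article.

```python
def is_substring(u: str, t: str) -> bool:
    """True if u occurs contiguously somewhere in t."""
    return u in t


def is_subsequence(u: str, t: str) -> bool:
    """True if the characters of u appear in t in order, gaps allowed."""
    remaining = iter(t)
    # `ch in remaining` consumes the iterator until ch is found, so each
    # search resumes where the previous match stopped.
    return all(ch in remaining for ch in u)


text = "It was the best of times"
print(is_substring("the best of", text))    # True
print(is_subsequence("Itwastimes", text))   # True
print(is_substring("Itwastimes", text))     # False
```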

Prefix and suffix are special cases of substring. A prefix of a string $S$ is a substring of $S$ that occurs at the beginning of $S$. A suffix of a string $S$ is a substring that occurs at the end of $S$.

The distinct substrings of the string "apple" are "apple", "appl", "pple", "app", "ppl", "ple", "ap", "pp", "pl", "le", "a", "p", "l", "e", and the empty string "".
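Such a list can be reproduced by slicing the string at every pair of positions. This is a brute-force sketch under the assumption that duplicates should be reported once; the name `all_substrings` is illustrative.

```python
def all_substrings(t: str) -> set[str]:
    """Return every distinct substring of t, including the empty string."""
    n = len(t)
    # t[i:j] with i == j yields the empty string.
    return {t[i:j] for i in range(n + 1) for j in range(i, n + 1)}


subs = all_substrings("apple")
# Longest first, ties broken alphabetically.
print(sorted(subs, key=lambda s: (-len(s), s)))
print(len(subs))  # 15
```

The count is 15 rather than 16 because "p" occurs twice in "apple" but is listed only once.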

Substring

A string $u$ is a substring (or factor)[1] of a string $t$ if there exist two strings $p$ and $s$ such that $t = pus$. In particular, the empty string is a substring of every string. A substring of a string is a prefix of a suffix of the string, and equivalently a suffix of a prefix. If $u$ is a substring of $t$, it is also a subsequence of $t$, which is a more general concept. Given a pattern, its occurrences in a string can be found with a string searching algorithm. Finding the longest string that is equal to a substring of each of two or more strings is known as the longest common substring problem.

Example: The string ana is equal to substrings (and subsequences) of banana at two different offsets:

banana
 |||||
 ana||
   |||
   ana

Writing banana as the concatenation of $p$, ana, and $s$, the first occurrence is obtained with $p$ = b and $s$ = na, while the second occurrence is obtained with $p$ = ban and $s$ being the empty string.
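The two occurrences can also be found programmatically. The sketch below writes `t` for the containing string, `u` for the substring, and `p` and `s` for the pieces before and after it; these names, and the function name `decompositions`, are assumptions made for this illustration. It relies on the standard `str.find` method and is not an efficient string searching algorithm.

```python
def decompositions(u: str, t: str) -> list[tuple[str, str]]:
    """Return every pair (p, s) such that t == p + u + s."""
    pairs = []
    start = t.find(u)
    while start != -1:
        pairs.append((t[:start], t[start + len(u):]))
        # Advance by one position so overlapping occurrences are found too.
        start = t.find(u, start + 1)
    return pairs


print(decompositions("ana", "banana"))  # [('b', 'na'), ('ban', '')]
```

Because the search restarts one position after each match, the overlapping occurrences at offsets 1 and 3 are both reported.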

In the mathematical literature, substrings are also called subwords (in America) or factors (in Europe).

Prefix

A string $p$ is a prefix[1] of a string $t$ if there exists a string $s$ such that $t = ps$. A proper prefix of a string is not equal to the string itself;[2] some sources[3] additionally require a proper prefix to be non-empty. A prefix can be seen as a special case of a substring.

Example: The string ban is equal to a prefix (and substring and subsequence) of the string banana:

banana
|||
ban

The square subset symbol is sometimes used to indicate a prefix, so that $p \sqsubseteq t$ denotes that $p$ is a prefix of $t$. This defines a binary relation on strings, called the prefix relation, which is a particular kind of prefix order.
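The prefix relation and the two conventions for "proper prefix" can be illustrated as follows. This is a sketch only; the names `prefixes` and `is_proper_prefix` and the `allow_empty` flag are hypothetical and serve to switch between the two conventions described above.

```python
def prefixes(t: str) -> list[str]:
    """All prefixes of t, from the empty string up to t itself."""
    return [t[:i] for i in range(len(t) + 1)]


def is_proper_prefix(p: str, t: str, allow_empty: bool = True) -> bool:
    """True if p is a prefix of t other than t itself.

    With allow_empty=False the stricter convention applies, under which
    the empty string does not count as a proper prefix.
    """
    if p == t or not t.startswith(p):
        return False
    return allow_empty or p != ""


print(prefixes("banana"))
# ['', 'b', 'ba', 'ban', 'bana', 'banan', 'banana']
print(is_proper_prefix("ban", "banana"))                  # True
print(is_proper_prefix("banana", "banana"))               # False
print(is_proper_prefix("", "banana", allow_empty=False))  # False
```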

Suffix

A string $s$ is a suffix[1] of a string $t$ if there exists a string $p$ such that $t = ps$. A proper suffix of a string is not equal to the string itself. A more restricted interpretation is that it is also not empty.[1] A suffix can be seen as a special case of a substring.

Example: The string nana is equal to a suffix (and substring and subsequence) of the string banana:

banana
  ||||
  nana

A suffix tree for a string is a compressed trie data structure that represents all of its suffixes. Suffix trees have many applications in string algorithms. The suffix array is a simplified version of this data structure that lists the start positions of the suffixes in lexicographically sorted order; it has many of the same applications.
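A suffix array can be built naively by sorting the start positions by the suffix that begins at each one. The sketch below does exactly that; it is meant to show what the array contains, not how it is built efficiently, since dedicated construction algorithms avoid comparing whole suffixes. The function name `suffix_array` is illustrative, and the ordering is Python's default string comparison by code point.

```python
def suffix_array(t: str) -> list[int]:
    """Start positions of the non-empty suffixes of t, in sorted suffix order."""
    return sorted(range(len(t)), key=lambda i: t[i:])


sa = suffix_array("banana")
print(sa)  # [5, 3, 1, 0, 4, 2]
for i in sa:
    print(i, "banana"[i:])
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, which start at positions 5, 3, 1, 0, 4 and 2 respectively.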

Border

A border is a string that is both a suffix and a prefix of the same string, e.g. "bab" is a border of "babab" (and also of "babooneatingakebab").
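The borders of a string can be listed by comparing each prefix with the suffix of the same length. This brute-force sketch assumes that the empty string and the whole string are not counted as borders, a convention that varies between sources; the name `borders` is illustrative.

```python
def borders(t: str) -> list[str]:
    """Non-empty borders of t that are shorter than t, shortest first."""
    # For each length k, compare the first k and the last k characters.
    return [t[:k] for k in range(1, len(t)) if t[:k] == t[-k:]]


print(borders("babab"))  # ['b', 'bab']
print("bab" in borders("babooneatingakebab"))  # True
```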

Superstring

A superstring of a finite set $P$ of strings is a single string that contains every string in $P$ as a substring. For example, $\text{bcclabccefab}$ is a superstring of $P = \{\text{abcc}, \text{efab}, \text{bccla}\}$, and $\text{efabccla}$ is a shorter one. Generally, one is interested in finding superstrings whose length is as small as possible; a concatenation of all strings of $P$ in any order gives a trivial superstring of $P$.
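Checking whether a candidate is a superstring, and building the trivial one by concatenation, takes only a few lines. The sketch below uses illustrative input strings and hypothetical function names (`is_superstring`, `trivial_superstring`); it verifies superstrings but does not search for a shortest one.

```python
def is_superstring(candidate: str, strings: list[str]) -> bool:
    """True if every string in `strings` occurs in `candidate` as a substring."""
    return all(w in candidate for w in strings)


def trivial_superstring(strings: list[str]) -> str:
    """Concatenate the strings in the given order; always a superstring."""
    return "".join(strings)


# Illustrative input set.
P = ["abcc", "efab", "bccla"]
print(is_superstring("bcclabccefab", P))  # True
print(is_superstring("efabccla", P))      # True
print(trivial_superstring(P))             # abccefabbccla
```

The trivial superstring has 13 characters, while "efabccla" covers the same set with 8 by letting the strings overlap.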

References

  1. Lothaire, M. (1997). Combinatorics on words. Cambridge: Cambridge University Press. ISBN 0-521-59924-5.
  2. Kelley, Dean (1995). Automata and Formal Languages: An Introduction. London: Prentice-Hall International. ISBN 0-13-497777-7.
  3. Gusfield, Dan (1999) [1997]. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. USA: Cambridge University Press. ISBN 0-521-58519-8.