Jump to content

Generalized suffix array

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by BryanRamos0199 (talk | contribs) at 21:09, 11 May 2021. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science, a generalized suffix array is a suffix array containing all suffixes for a set of strings. Given the set of strings of total length , it is a lexicographically sorted array of all suffixes of each string in .

Definition

The algorithm for constructing the generalized suffix array is as follows: [1]

  • Let be the set of strings for which to construct the suffix array for.
  • Let denote the sum of the lengths of all strings in and the length of the longest string in .
  • A straightforward sorting approach would concatenate all strings in S and utilize Manber & Myers (1990) algorithm to sort the concatenated string.
  • Let represent the suffix of that starts at position of the -string . is the initial position of suffix .
  • Let indicate the sorted array of suffixes of the strings in the set . So, indicates the starting position of the smallest suffix of the set of strings. The array P satisfies , where denotes lexicographical order.

This approach would cost time.

Application

A generalized suffix array can be used to compete the longest common subsequence of the strings in the set in linear time, whereas a naïve implementation would compute this information in quadratic time.

Notes

  1. ^ Shi, Fei (1996), "Suffix arrays for multiple strings: A method for on-line multiple string searches.", Lecture Notes in Computer Science, vol. 1179, Springer Berlin Heidelberg, doi:10.1007/BFb0027775, ISBN 978-3-540-62031-0

References