Generalized suffix array
Appearance
This article, Generalized suffix array, has recently been created via the Articles for creation process. Please check to see if the reviewer has accidentally left this template after accepting the draft and take appropriate action as necessary.
Reviewer tools: Inform author |
In computer science, a generalized suffix array is a suffix array containing all suffixes for a set of strings. Given the set of strings of total length , it is a lexicographically sorted array of all suffixes of each string in .
Definition
The algorithm for constructing the generalized suffix array is as follows: [1]
- Let be the set of strings for which to construct the suffix array for.
- Let denote the sum of the lengths of all strings in and the length of the longest string in .
- A straightforward sorting approach would concatenate all strings in S and utilize Manber & Myers (1990) algorithm to sort the concatenated string.
- Let represent the suffix of that starts at position of the -string ∈ . is the initial position of suffix .
- Let indicate the sorted array of suffixes of the strings in the set . So, indicates the starting position of the smallest suffix of the set of strings. The array P satisfies , where denotes lexicographical order.
This approach would cost time.
Application
A generalized suffix array can be used to compete the longest common subsequence of the strings in the set in linear time, whereas a naïve implementation would compute this information in quadratic time.
Notes
- ^ Shi, Fei (1996), "Suffix arrays for multiple strings: A method for on-line multiple string searches.", Lecture Notes in Computer Science, vol. 1179, Springer Berlin Heidelberg, doi:10.1007/BFb0027775, ISBN 978-3-540-62031-0
References
- Manber, Udi; Myers, Gene (1990). Suffix arrays: a new method for on-line string searches. First Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 319–327.
- Shi, Fei (1996). "Suffix arrays for multiple strings: A method for on-line multiple string searches". Lecture Notes in Computer Science. 1179. Springer Berlin Heidelberg. doi:10.1007/BFb0027775. ISBN 978-3-540-62031-0.