LCP array
In computer science, the longest common prefix array (LCP array) is an auxiliary data structure to the suffix array, which stores the length of the longest common prefix between every pair of lexicographically consecutive suffixes in the suffix array.
Augmenting the suffix array with the LCP array allows to efficiently simulate top-down and bottom-up traversals of the suffix tree (Kasai et al. 2001)(Abouelhoda, Kurtz, and Ohlebusch 2004), speeds up pattern matching on the suffix array (Manber and Myers 1993) and is a prerequisite for compressed suffix trees(Ohlebusch, Fischer, and Gog 2010).
History
The LCP array was introduced by Manber and Gene Myers alongside the suffix array in order to improve the running time of their string search algorithm.(Manber and Myers 1993)
Definition
Let SA be the suffix array of the text T= and let lcp(v,w) denote the length of the longest common prefix between two strings v and w. Let further denote T[i,j] the substring of T ranging from i to j.
Then the LCP array H[1,n] is an integer array of size n such that H[1] is undefined and H[i]=lcp(T[SA[i-1],n],T[SA[i],n]) for every . Thus H[i] stores the length of longest common prefix of the lexicographic (i-1)'st and ith smallest suffix of T.
Example
Efficient Construction Algorithms
LCP array construction algorithms can be divided into two different categories:
- algorithms that compute the LCP array as a byproduct to the suffix array (Manber and Myers 1993)(Kärkkäinen and Sanders 2003)(Fischer 2011)
- algorithms that compute the LCP array from the already constructed suffix array (Kasai et al. 2001)(Manzini 2004)(Kärkkäinen, Manzini, and Puglisi 2009)(Puglisi and Turpin 2008)(Gog and Ohlebusch 2011)
Currently, the fastest linear-time LCP array construction algorithm is due to (Fischer 2011), which in turn is based on one of the fastest suffix array construction algorithms (Nong, Zhang, and Chan 2009)