Jump to content

Longest palindromic substring

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 106.215.17.78 (talk) at 21:50, 19 October 2021 (Slow algorithm). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science, the longest palindromic substring or longest symmetric factor problem is the problem of finding a maximum-length contiguous substring of a given string that is also a palindrome. For example, the longest palindromic substring of "bananas" is "anana". The longest palindromic substring is not guaranteed to be unique; for example, in the string "abracadabra", there is no palindromic substring with length greater than three, but there are two palindromic substrings with length three, namely, "aca" and "ada". In some applications it may be necessary to return all maximal palindromic substrings (that is, all substrings that are themselves palindromes and cannot be extended to larger palindromic substrings) rather than returning only one substring or returning the maximum length of a palindromic substring.

Manacher (1975) invented a linear time algorithm for listing all the palindromes that appear at the start of a given string. However, as observed e.g., by Apostolico, Breslauer & Galil (1995), the same algorithm can also be used to find all maximal palindromic substrings anywhere within the input string, again in linear time. Therefore, it provides a linear time solution to the longest palindromic substring problem. Alternative linear time solutions were provided by Jeuring (1994), and by Gusfield (1997), who described a solution based on suffix trees. Efficient parallel algorithms are also known for the problem.[1]

The longest palindromic substring problem should not be confused with the different problem of finding the longest palindromic subsequence.

// yes

Manacher's algorithm

Below is the pseudocode for Manacher's algorithm. The algorithm is faster than the previous algorithm because it exploits when a palindrome happens inside another palindrome.

For example, consider the input string "abacaba". By the time it gets to the "c", Manacher's algorithm will have identified the length of every palindrome centered on the letters before the "c". At the "c", it runs a loop to identify the largest palindrome centered on the "c": "abacaba". With that knowledge, everything after the "c" looks like the reflection of everything before the "c". The "a" after the "c" has the same longest palindrome as the "a" before the "c". Similarly, the "b" after the "c" has a longest palindrome that is at least the length of the longest palindrome centered on the "b" before the "c". There are some special cases to consider, but that trick speeds up the computation dramatically.

    Longest_Palindrome(string S) {
        string S' = S with a bogus character (eg. '|') inserted between each character (including outer boundaries)
        array PalindromeRadii = [0,...,0] // The radius of the longest palindrome centered on each place in S'
        // note: length(S') = length(PalindromeRadii) = 2 × length(S) + 1
        
        Center = 0
        Radius = 0
        
        while Center < length(S') {
            
                else { // PalindromeRadii[MirroredCenter] = MaxMirroredRadius
                    Radius = MaxMirroredRadius
                    break  // exit while loop early
                }   
            }      
        }
        
        longest_palindrome_in_S' = 2*max(PalindromeRadii)+1
        longest_palindrome_in_S = (longest_palindrome_in_S'-1)/2
        return longest_palindrome_in_S 
    }

Special Cases

Manacher's algorithm is faster because it reuses precomputed data when a palindrome exists inside another palindrome. There are 3 cases of this. They are represented by the "if / else if / else" statement in the pseudocode.

The first case is when the palindrome at MirroredCenter lies completely inside the "Old" palindrome. In this situation, the palindrome at Center will have the same length as the one at MirroredCenter. For example, if the "Old" palindrome is "abcbpbcba", we can see that the palindrome centered on "c" after the "p" must have the same length as the palindrome centered on the "c" before the "p".

The second case is when the palindrome at MirroredCenter extends outside the "Old" palindrome. That is, it extends "to the left" (or, contains characters with a lower index than any inside the "Old" palindrome). Because the "Old" palindrome is the largest possible palindrome centered on OldCenter, we know the characters before and after it are different. Thus, the palindrome at Center will run exactly up to the border of the "Old" palindrome, because the next character will be different than the one inside the palindrome at MirroredCenter. For example, if the string was "ababc", the "Old" palindrome could be "bab" with the Center being the second "b" and the MirroredCenter being the first "b". Since the palindrome at the MirroredCenter is "aba" and extends beyond the boundaries of the "Old" palindrome, we know the longest palindrome at the second "b" can only extend up to the border of the "Old" palindrome. We know this because if the character after the "Old" palindrome had been an "a" instead of a "c", the "Old" palindrome would have been longer.

The third and last case is when the palindrome at MirroredCenter extends exactly up to the border of the "Old" palindrome. In this case, we don't know if the character after the "Old" palindrome might make the palindrome at Center longer than the one at MirroredCenter. But we do know that the palindrome at Center is at least as long as the one at MirroredCenter. In this case, Radius is initialized to the radius of the palindrome at MirroredCenter and the search starts from there. An example string would be "abcbpbcbp" where the "Old" palindrome is "bcbpbcb" and the Center is on the second "c". The MirroredCenter is the first "c" and it has a longest palindrome of "bcb". The longest palindrome at the Center on the second "c" has to be at least that long and, in this case, is longer.

Runtime

The algorithm runs in linear time. This can be seen by putting bounds on how many iterations are run of each loop. The outer loop and second inner loop increment Center by for every iteration. Since Center is bounded by the length of the string, we know these loops run times. The first inner loop increments Radius by for every iteration and the second inner loop, when it stops, decrements Radius by at most for every iteration. Since the second inner loop can run at most times and the value for Radius cannot exceed the first inner loop can run at most times. The overall runtime is

Notes

References

  • Apostolico, Alberto; Breslauer, Dany; Galil, Zvi (1995), "Parallel detection of all palindromes in a string", Theoretical Computer Science, 141 (1–2): 163–173, doi:10.1016/0304-3975(94)00083-U.
  • Crochemore, Maxime; Rytter, Wojciech (1991), "Usefulness of the Karp–Miller–Rosenberg algorithm in parallel computations on strings and arrays", Theoretical Computer Science, 88 (1): 59–82, doi:10.1016/0304-3975(91)90073-B, MR 1130372.
  • Crochemore, Maxime; Rytter, Wojciech (2003), "8.1 Searching for symmetric words", Jewels of Stringology: Text Algorithms, World Scientific, pp. 111–114, ISBN 978-981-02-4897-0.
  • Gusfield, Dan (1997), "9.2 Finding all maximal palindromes in linear time", Algorithms on Strings, Trees, and Sequences, Cambridge: Cambridge University Press, pp. 197–199, doi:10.1017/CBO9780511574931, ISBN 0-521-58519-8, MR 1460730.
  • Jeuring, Johan (1994), "The derivation of on-line algorithms, with an application to finding palindromes", Algorithmica, 11 (2): 146–184, doi:10.1007/BF01182773, hdl:1874/20926, MR 1272521, S2CID 7032332.
  • Manacher, Glenn (1975), "A new linear-time "on-line" algorithm for finding the smallest initial palindrome of a string", Journal of the ACM, 22 (3): 346–351, doi:10.1145/321892.321896, S2CID 10615419.
This article incorporates text from Longest palindromic substring on PEGWiki under a Creative Commons Attribution (CC-BY-3.0) license.