Pages

Monday, August 4, 2014

Manacher's algorithm: longest palindromic substring problem

In this post I'll try to explain Manacher's algorithm in a simple way. Manacher's algorithm is a solution for finding the longest palindromic substring of a given string.

A palindrome is a sequence of symbols or elements that reads the same forward or reversed.
Examples of a few palindromes:

radar    level    rotor    noon
abba    aba    b    xxx1y1xxx

Manacher's algorithm is a really good/fast solution for this problem since its time and space complexity is O(N).

Pre-process string S

The first step of the algorithm is to pre-process a given string S, into string T, by inserting a special character between every character of S and also at both ends. For the algorithm future convenience, a distinct special character should then be placed at both ends. This pre-processing is done just to handle both even and odd sized S strings gracefully.

If say, '#' is used as the main special character, these are some examples of the original strings and their resulting transformations:
S: abba     T: ^#a#b#b#a#$
S: a        T: ^#a#$
Now would be a good time to code this first step. Try to code it for yourself before seeing the snippet below, it should be pretty straight forward.
When you are done, compare your solution with the pseudo code below.
preprocessManacherAlgorithm(s) {
    if (s.length() == 0)
        return "^$";

    t = "^";
    for (i = 0; i < s.length(); i++)
        t += "#" + s[i];
    t += "#$";

    return t;
}

Process string T

After having pre-processed a given string S into string T, the next step is to process string T itself. To do that, an auxiliary array P with size equal to the length of T is required.

The content of the i-th element of P corresponds to the size of the maximum expansion to both sides of the i-th element of T in order to form a palindrome.

Was that too confusing? Look at the example below to better comprehend the purpose of P.



What is the longest palindromic substring contained in T? The answer is # a # b # a #, which has a length of 7 characters, right? Also notice that the character 'b', i = 4, is the center of the palindrome.

So, now the question is: how many characters of the palindrome are to the right or to the left of the palindrome's center? Or in other words: what is the size of the expansion to both left or right of the center character which makes up the palindrome?
The answer is 3. The image below illustrates this.



The goal is to populate P, and that task should now be easy.
The result is as follows:



Important note:
It is a good idea to have an auxiliary variable maxPalCenterID with the index of the highest value in P and update it while we populate P.

After having populated P, the longest palindromic substring is very easy to fetch from the original string S.

Since there is a special character at the start of T due to the pre-processing, and also another special character between every character of S, the palindrome start index is given by the formula below.
palStartID = (maxPalCenterID - 1 - P[maxPalCenterID]) / 2;
At this time, the longest palindromic substring has been found successfully, and though this is already a pretty good solution, it does not have a time and space complexity of O(N). In order to reach that complexity, there is a trick which Manacher's algorithm makes use of.

You should try to code by yourself the algorithm so far. Do not peek the snippet below. It is important that you try for yourself and only after compare your solution with the pseudo code below.
almostManacherAlgorithm(s) {
    t = preprocessManacherAlgorithm(s);

    n = t.size();
    P[n, 0]; // array of size n filled with 0

    maxPalCenterID = 1;
    for (i = 1; i < n - 1; i++) {
        while (t[i + 1 + P[i]] == t[i - 1 - P[i]])
            P[i]++;

        if (P[i] > P[maxPalCenterID])
            maxPalCenterID = i;
    }

    maxPalSize = P[maxPalCenterID];
    palStartID = (maxPalCenterID - 1 - maxPalSize) / 2;

    return s.substr(palStartID, maxPalSize);
}

Make it O(N)

Now that we have a functional algorithm, can it run faster? If yes, what can be done to make it run faster?

The trick resides in the symmetrical property of palindromes. Consider the following example, where C is the center of a palindrome and L and R are the left and right bounds, respectively.



Now populate P until R, the right bound of the palindrome with center in C.

Notice the symmetry? If there was a mirror at C, the content of P between L and R would be completely symmetric, just like it actually is! Check it for yourself below.



And now you might be thinking: we can save a lot of computations just by copying the half of a palindrome to the left of the center, mirror it, and pasting to the right! And then keep populating P after R with the same principle!

Go ahead and try that for yourself. Populate P based on this concept for C = 8. You should get something like this, which is wrong if you look carefully:



The right answer would be:



Well, that is still a great conclusion! Although it is not completely correct, you are very close to finding the trick!

As you may already be thinking, this concept works, but it fails at certain point. What is the limit?

If you examine a couple of more examples, this limit will be very easy to detect:
When P[i'] is less or equal to R-i, then P[i] is always equal to the minimum of these values: R-i or P[i']; Otherwise, the only valid information we have is that P[i'] is greater or equal to P[i], which is already a very valuable information! If this is the case, we just have to try and expand this palindrome to find P[i].

The final part that completes the trick is answered by the following question: when should C be updated? Again, if a few examples are examined by hand, it is not hard to find that:

Whenever a palindrome centered at i expands past the right bound of the palindrome centered at C (in other words, if it expands past R), C is assigned the value of i and R is updated according to the content of P[i] and i itself. Does that make sense? I really hope so! :)

And that is it! This concludes the beautiful Manacher's algorithm: a tool for finding the longest palindromic substring in O(N).

Now you just have to add these little extras to the previous code. Do not peek the snippet below. Once again, it is important that you try for yourself. Then you may compare your solution with the pseudo code below.
manacherAlgorithm(s) {
    t = preprocessManacherAlgorithm(s);

    n = t.size();
    P[n, 0]; // array of size n filled with 0
    C = R = 0;

    maxPalCenterID = 1;
    for (i = 1; i < n - 1; i++) {
        // i' = C - (i-C)
        ii = 2 * C - i;

        // save several computations
        P[i] = (R > i) ? min(R - i, P[ii]) : 0;

        // expand palindrome
        while (t[i + 1 + P[i]] == t[i - 1 - P[i]])
            P[i]++;

        // update index of the center of the biggest palindrome
        if (P[i] > P[maxPalCenterID])
            maxPalCenterID = i;

        // adjust center if palindrome
        // centered at i expands past R
        if (i + P[i] > R) {
            C = i;
            R = i + P[i];
        }
    }

    maxPalSize = P[maxPalCenterID];
    palStartID = (maxPalCenterID - 1 - maxPalSize) / 2;

    return s.substr(palStartID, maxPalSize);
}
You might be interested in checking this tutorial if you did not understand my explanation. I learnt this algorithm from there.

3 comments: