Menu Home

String Problems: KMP & Manacher Algorithms

KMP & Manacher’s Algorithm

KMP Algorithm

Introduction to KMP Algorithm

KMP algorithm is invented by D.E.Kruth, J.H.Morris and V.R.Pratt. It accelerates the string pattern-matching process to $\mathcal{O}(n + m)$, where $n$ is the length of the matched string and $m$ is the length of the matching string, comparing to a naive running time of $\mathcal{O}(nm)$.

Implementation

The way that KMP algorithm works is that it moves the pattern/matching string when a mismatch in the original string occurs. The following image explains how the process works(Image comes from internet).

img

Instead of go to the first character of the pattern string, the KMP algorithms moves the pattern string to $fail[j]$ where $j$ is the index of the pattern string when the mismatch occurs. For example, when the first mismatch occurs, $fail[2] = 0$. Now, the problem is transformed into how to get the fail array. The following code provides the way we can get fail array.

int fail[maxn];
void build(string s){
    int i = 0,j = -1;                   
    fail[0] = j;
    while(i < (int) s.size()){
        if(j == -1 || s[i] == s[j]){         // If current character is equal to previous one
            fail[++i] = ++j;                 // we set current fail equals to j + 1.
        }
        else j = fail[j];                    // If mismatch occurs, we jump in fail array.
    }
}

This code may seem to be confusing. But the main idea is: If a suffix of a string equals to the prefix of the same string, then if mismatch occurs at the suffix, we can jump to the prefix since the parts that are indentical are skipped. With this idea, we can write the code for matching a pattern string in a template string.

int fail[maxn];
int KMP(string s1,string s2){               // Finds the first occurence of the string
    int i = 0,j = 0;
    while(i < (int) s1.size() && j < (int) s2.size()){
        if(j == -1 || s[i] == s[j]){
            i++,j++;                        // If current two equals we move to the next
        }
        else j = fail[j];                   // If mismatch occurs at current suffix, we then
    }                                       // jump to the longest prefix that is identical.
    if(j == s2.size()) return i - j;        // Find the string, return the position.
    else return -1;                         // Cannot find the string.
}

Problems for KMP Algorithm

BZOJ 1355

Problem Description: Given a string $s$, it is a substring of a string that can be expressed in form as $s_1s_1s_1\cdots s_1$. Find the minimum length of $s_1$.

Input Format: The input is comprised of two lines. The first line is the length of the string, the second line is the string, which only contains lower-case letters.

Output Format: One line with integer represents the minimum length of $s_1$.

Sample Input:

8
cabcabca

Sample Output:

3

Solution: This problem requires to understand how KMP works. We can eventually conclude that the minimum length of the cyclic repetend is $i – fail[i]$ for every prefix of a string.

Code:

#pragma comment(linker, "/stack:200000000")
#pragma GCC optimize("Ofast")
#pragma GCC target("sse,sse2,sse3,ssse3,sse4")
#pragma GCC optimize("unroll-loops")
#include <iostream>
#include <cstring>
#include <string>
#include <vector>
#include <queue>
#include <map>
#include <set>
#include <stdio.h>
#include <fstream>
#include <cmath>
#include <stdlib.h>
#include <iomanip>
#include <algorithm>
#include <limits.h>
#include <stack>
#include <cmath>
#include <math.h>
#include <array>
#include <bitset>
#include <functional>
using namespace std;
#define FAST_IO ios::sync_with_stdio(false);
#define DEBUG_VIS(x) cerr<<"vis "<<x<<endl;
#define DEBUG_REACH cerr<<"reach here"<<endl;
#define DEBUG_SEQ(x) cerr<<x<<" ";
#define SET_ZERO(x) memset(x,0,sizeof(x));
#define SET_NEGONE(x) memset(x,-1,sizeof(x));
#define SET_INF(x) memset(x,127,sizeof(x));
#define SET_NEGINF(x) memset(x,128,sizeof(x));
#define IS_INF(x) x < 2100000000
typedef pair<int,int> pii;
typedef pair<long long,long long> pll;
typedef pair<double,double> pdd;
typedef long long ll;
typedef unsigned int uint;
typedef unsigned long long ull;
const int maxn = 1e6 + 10;
template <class T> T mymin(const T a,const T b){return a < b ? a : b;}
template <class T> T mymax(const T a,const T b){return a > b ? a : b;}
int N;
string s;
int fail[maxn];

void build(string str){
    int i = 0,j = -1;
    fail[0] = j;
    while(i < (int) str.size()){
        if(j == -1 || str[i] == str[j]){
            fail[++i] = ++j;
        }
        else j = fail[j];
    }
}

int main(){
    #ifndef ONLINE_JUDGE
        freopen("data.in","r",stdin);
    #endif
    FAST_IO
    cin>>N>>s;
    build(s);
    cout<<s.size() - fail[s.size()]<<endl;
    return 0;
}

NOI2014: The Zoo

Problem Description: Given a string $s$, for every prefix $s_{p_i}$of the string, find the number $num_i$ of non-overlapping prefix and suffix of $s_{p_i}$ such that they are identical. Then, output the value of $\Pi_{i = 0}^{n – 1}(num_i + 1)\space mod\space 1000000007$, where $n$ is the length of the string and $n \leq 10^6$.

Input Format: One line with integer T $\leq5$, represents the number of test cases. Following T lines, each contains a string.

Output Format: T lines, each with one number represents the anwser of the corresponding test case.

Sample Input:

3
aaaaa
ab
abcababc

Sample Output:

36
1
32

Solution: First we need to prove that the number of times we need to jump on fail array from index $i$ to $0$ is the number of prefixes and suffixes that are equal without consideration of overlapping in current prefix $s_{p_i}$. Assume the longest suffix calculated by KMP of a prefix of the original string is $suf_i$ and a prefix with similar property is $pre_i$. Assume we jump on $pre_i$ again, we know that $pre_{pre_i} = suf_{pre_i}$. Since we have $pre_i = suf_i$, so we have $pre_{pre_i} = suf_{pre_i} = pre_{suf_i} = suf_{suf_i}\rightarrow pre_{pre_i} = suf_{suf_i}$. Therefore, we can see that $pre_{pre_i}$ is just another prefix that has similar property as those prefix calculated by KMP except the fact that it is not the longest. However, it might be the ones we want to calculate. Then, we only have to judge if the the prefix and the suffix will overlap. Since they are identical, they are also equal in length. So if $2 * len(pre_i) > i$, then it is obvious that they will overlap, and the problem is solved.

Code:

#pragma comment(linker, "/stack:200000000")
#pragma GCC optimize("Ofast")
#pragma GCC target("sse,sse2,sse3,ssse3,sse4")
#pragma GCC optimize("unroll-loops")
#include <iostream>
#include <cstring>
#include <string>
#include <vector>
#include <queue>
#include <map>
#include <set>
#include <stdio.h>
#include <fstream>
#include <cmath>
#include <stdlib.h>
#include <iomanip>
#include <algorithm>
#include <limits.h>
#include <stack>
#include <cmath>
#include <math.h>
#include <array>
#include <bitset>
#include <functional>
using namespace std;
#define FAST_IO ios::sync_with_stdio(false);
#define DEBUG_VIS(x) cerr<<"vis "<<x<<endl;
#define DEBUG_REACH cerr<<"reach here"<<endl;
#define DEBUG_SEQ(x) cerr<<x<<" ";
#define SET_ZERO(x) memset(x,0,sizeof(x));
#define SET_NEGONE(x) memset(x,-1,sizeof(x));
#define SET_INF(x) memset(x,127,sizeof(x));
#define SET_NEGINF(x) memset(x,128,sizeof(x));
#define IS_INF(x) x < 2100000000
typedef pair<int,int> pii;
typedef pair<long long,long long> pll;
typedef pair<double,double> pdd;
typedef long long ll;
typedef unsigned int uint;
typedef unsigned long long ull;
const int maxn = 1e6 + 10;
template <class T> T mymin(const T a,const T b){return a < b ? a : b;}
template <class T> T mymax(const T a,const T b){return a > b ? a : b;}
string s;
int fail[maxn];
ll num[maxn];
ll mod = 1e9 + 7;

void build(string s){
    int i = 0,j = -1;
    fail[0] = -1;
    while(i < (int) s.size()){
        if(j == -1 || s[i] == s[j]){
            fail[++i] = ++j;
            num[i] = num[j] + 1;
        }
        else j = fail[j];
    }
}

int main(){
    #ifndef ONLINE_JUDGE
        freopen("data.in","r",stdin);
    #endif
    FAST_IO
    int T;
    cin>>T;
    while(T--){
        SET_ZERO(fail)
        cin>>s;
        num[0] = 0,num[1] = 1;
        build(s);
        long long ans = 1;
        int i = 0,j = 0;
        while(i < (int) s.size()){
            if(j == -1 || s[i] == s[j]){
                i++,j++;
                while(j * 2 > i) j = fail[j];
                ans = ans * (num[j] + 1) % mod;
            }
            else j = fail[j];
        }
        cout<<ans<<endl;
    }
    return 0;
}

Manacher’s Algorithm

Introduction to Manacher’s Algorithm

It is used to solve problems such as longest palindrome in a string in $\mathcal{O}(n)$ where $n$ is the length of the string.

Implementation

Since it is not a super hard algorithm, and there are some pretty good explanations online, I will just post a code here.

Manacher’s Algorithm:

string s;
char str[maxn<<1];
int tot,p[maxn<<1];
void build(){
    str[0] = '


Problem for Manacher's Algorithm

BZOJ2160

Problem Description: Given a string $s$, find the all substrings of $s$ such that they are palindromes and the length of each is odd. Sort the length in decreasing order and output the product of first $K$ odd palindromes $mod \space19930726$. If there are less than $K$ odd palindroms in total, then output $-1$. Input Format: First line with two number $N \leq 10^6$ and $K\leq10^9$, where $N$ is the length of the string in the second line. Output Format: Print the answer. Sample Input:
5 3

ababa
Sample Output:
45
Solution: This is an intriguing problem(~~means finished by myself. Finally..~~). It is not so hard after all. After calculating all odd palindromes, just use a binary index tree to count the number of palindromes for each length. And calculate the product by using a fast power trick. Code:
#pragma comment(linker, "/stack:200000000")
#pragma GCC optimize("Ofast")
#pragma GCC target("sse,sse2,sse3,ssse3,sse4")
#pragma GCC optimize("unroll-loops")
#include <iostream>
#include <cstring>
#include <string>
#include <vector>
#include <queue>
#include <map>
#include <set>
#include <stdio.h>
#include <fstream>
#include <cmath>
#include <stdlib.h>
#include <iomanip>
#include <algorithm>
#include <limits.h>
#include <stack>
#include <cmath>
#include <math.h>
#include <array>
#include <bitset>
#include <functional>
using namespace std;
#define FAST_IO ios::sync_with_stdio(false);
#define DEBUG_VIS(x) cerr<<"vis "<<x<<endl;
#define DEBUG_REACH cerr<<"reach here"<<endl;
#define DEBUG_SEQ(x) cerr<<x<<" ";
#define SET_ZERO(x) memset(x,0,sizeof(x));
#define SET_NEGONE(x) memset(x,-1,sizeof(x));
#define SET_INF(x) memset(x,127,sizeof(x));
#define SET_NEGINF(x) memset(x,128,sizeof(x));
#define IS_INF(x) x < 2100000000
typedef pair<int,int> pii;
typedef pair<long long,long long> pll;
typedef pair<double,double> pdd;
typedef long long ll;
typedef unsigned int uint;
typedef unsigned long long ull;
const int maxn = 1e6 + 10;
template <class T> T mymin(const T a,const T b){return a < b ? a : b;}
template <class T> T mymax(const T a,const T b){return a > b ? a : b;}
ll N,K,tot,p[maxn << 1];
string s;
char str[maxn<<1];
ll BIT[maxn << 1];
void add(int x,int delta){
    while(x <= N){
        BIT[x] += delta;
        x += x & -x;
    }
}
ll query(int x){
    ll res = 0;
    while(x){
        res += BIT[x];
        x -= x & -x;
    }
    return res;
}
void manacher(){
    ll id = 1,mx = 0;
    for(int i = 1; i <= tot; i++){
        p[i] = mx > i ? min(p[2 * id - i],mx - i) : 1;
        while(str[i + p[i]] == str[i - p[i]]) ++p[i];
        if(mx < i + p[i]){
            mx = i + p[i];
            id = i;
        }
    }
}
ll mod = 19930726;
ll fast_pow(int val,int tim){
    ll base = val,ans = 1;
    while(tim){
        if(tim & 1) ans = ans * base % mod;
        base = base * base % mod;
        tim >>= 1;
    }
    return ans % mod;
}
int main(){
    #ifndef ONLINE_JUDGE
        freopen("data.in","r",stdin);
    #endif
    FAST_IO
    cin>>N>>K;
    cin>>s;
    for(int j = 0; j < s.size(); j++){
        str[++tot] = '#';
        str[++tot] = s[j];
    }
    str[++tot] = '#';
    str[0] = '


Conclusion

For KMP algorithm and also other string problems, the concept of prefix and suffix should be used frequently in order to solve the problem. There are also further research in the prefixes and suffixes, which has property similar to those calculated by KMP, that named those as borders. It will be included later in the post. Also, the use of fail array should also be understood fully in order to understand further complex string algorithms, such as AC automaton and suffix automatons, etc. For Manacher's algorithm, it is important to keep in mind that the proper use of $p[i]$, the maximum length of palindrome centered at $i$, is important. Further techniques such as palindrome-tree and palindrome automaton will be included later in the post as well. Wish the dream will come true...... ; for(int i = 0; i < s.size(); i++){ str[++tot] = '#'; str[++tot] = s[i]; } str[++tot] = '#'; } void manacher(){ build(); int id = 1,mx = 0; for(int i = 1; i <= tot; i++){ p[i] = mx > i : min(mx - i,p[2 * id - 1]) : 1; while(str[i + p[i]] == str[i - p[i]]) p[i]++; if(mx < i + p[i]){ id = i; mx = i + p[i]; } } }

Problem for Manacher's Algorithm

BZOJ2160

Problem Description: Given a string $s$, find the all substrings of $s$ such that they are palindromes and the length of each is odd. Sort the length in decreasing order and output the product of first $K$ odd palindromes $mod \space19930726$. If there are less than $K$ odd palindroms in total, then output $-1$.

Input Format: First line with two number $N \leq 10^6$ and $K\leq10^9$, where $N$ is the length of the string in the second line.

Output Format: Print the answer.

Sample Input:


Sample Output:


Solution: This is an intriguing problem(~~means finished by myself. Finally..~~). It is not so hard after all. After calculating all odd palindromes, just use a binary index tree to count the number of palindromes for each length. And calculate the product by using a fast power trick.

Code:


Conclusion

For KMP algorithm and also other string problems, the concept of prefix and suffix should be used frequently in order to solve the problem. There are also further research in the prefixes and suffixes, which has property similar to those calculated by KMP, that named those as borders. It will be included later in the post. Also, the use of fail array should also be understood fully in order to understand further complex string algorithms, such as AC automaton and suffix automatons, etc.

For Manacher's algorithm, it is important to keep in mind that the proper use of $p[i]$, the maximum length of palindrome centered at $i$, is important. Further techniques such as palindrome-tree and palindrome automaton will be included later in the post as well.

Wish the dream will come true......
;
manacher();
for(int i = 2; i <= tot; i += 2){
int num = p[i] / 2;
add(1,1);
add(num + 1,-1);
}
ll ans = 1;
for(int i = (N + 1) / 2; i >= 1; i--){
ll num = query(i);
if(!num) continue;
if(K < num){
ans = ans * fast_pow(i * 2 - 1,K) % mod;
K = 0;
break;
}
else {
K -= num;
ans = ans * fast_pow(i * 2 - 1,num) % mod;
}
}
if(K) cout<<-1<<endl;
else cout<<ans<<endl;
return 0;
}

Conclusion

For KMP algorithm and also other string problems, the concept of prefix and suffix should be used frequently in order to solve the problem. There are also further research in the prefixes and suffixes, which has property similar to those calculated by KMP, that named those as borders. It will be included later in the post. Also, the use of fail array should also be understood fully in order to understand further complex string algorithms, such as AC automaton and suffix automatons, etc.

For Manacher's algorithm, it is important to keep in mind that the proper use of $p[i]$, the maximum length of palindrome centered at $i$, is important. Further techniques such as palindrome-tree and palindrome automaton will be included later in the post as well.

Wish the dream will come true......

;
for(int i = 0; i < s.size(); i++){
str[++tot] = '#';
str[++tot] = s[i];
}
str[++tot] = '#';
}
void manacher(){
build();
int id = 1,mx = 0;
for(int i = 1; i <= tot; i++){
p[i] = mx > i : min(mx - i,p[2 * id - 1]) : 1;
while(str[i + p[i]] == str[i - p[i]]) p[i]++;
if(mx < i + p[i]){
id = i;
mx = i + p[i];
}
}
}

Problem for Manacher's Algorithm

BZOJ2160

Problem Description: Given a string $s$, find the all substrings of $s$ such that they are palindromes and the length of each is odd. Sort the length in decreasing order and output the product of first $K$ odd palindromes $mod \space19930726$. If there are less than $K$ odd palindroms in total, then output $-1$.

Input Format: First line with two number $N \leq 10^6$ and $K\leq10^9$, where $N$ is the length of the string in the second line.

Output Format: Print the answer.

Sample Input:


Sample Output:


Solution: This is an intriguing problem(~~means finished by myself. Finally..~~). It is not so hard after all. After calculating all odd palindromes, just use a binary index tree to count the number of palindromes for each length. And calculate the product by using a fast power trick.

Code:


Conclusion

For KMP algorithm and also other string problems, the concept of prefix and suffix should be used frequently in order to solve the problem. There are also further research in the prefixes and suffixes, which has property similar to those calculated by KMP, that named those as borders. It will be included later in the post. Also, the use of fail array should also be understood fully in order to understand further complex string algorithms, such as AC automaton and suffix automatons, etc.

For Manacher's algorithm, it is important to keep in mind that the proper use of $p[i]$, the maximum length of palindrome centered at $i$, is important. Further techniques such as palindrome-tree and palindrome automaton will be included later in the post as well.

Wish the dream will come true......

Categories: OI

appledorem

Leave a Reply

Your email address will not be published. Required fields are marked *