Borders of Strings
A border of a string $s$ of length $n$ is a string that is simultaneously a proper prefix and a proper suffix of $s$. “Proper” means the border cannot be the entire string itself.
For example, consider $s = \texttt{abacaba}$.
| Border | Length |
|---|---|
| $\texttt{a}$ | 1 |
| $\texttt{aba}$ | 3 |
| $\texttt{abacaba}$ | 7 (not proper, so not a border) |
The longest border of $\texttt{abacaba}$ is $\texttt{aba}$ with length 3.
Another example: $s = \texttt{aabaaab}$.
| Prefix | Suffix | Match? |
|---|---|---|
| $\texttt{a}$ | $\texttt{b}$ | No |
| $\texttt{aa}$ | $\texttt{ab}$ | No |
| $\texttt{aab}$ | $\texttt{aab}$ | Yes |
| $\texttt{aaba}$ | $\texttt{aaab}$ | No |
| $\texttt{aabaa}$ | $\texttt{baaab}$ | No |
| $\texttt{aabaaa}$ | $\texttt{abaaab}$ | No |
So the only border of $\texttt{aabaaab}$ is $\texttt{aab}$, with length 3.
And for $s = \texttt{abcde}$, there are no borders at all.
The failure function $\pi$ is the core of the KMP algorithm. For a string $s$ of length $n$:
$$\pi[i] = \text{length of the longest proper border of } s[0 \ldots i]$$In other words, $\pi[i]$ is the largest $k < i + 1$ such that $s[0 \ldots k-1] = s[i-k+1 \ldots i]$.
We set $\pi[0] = 0$ since a single character has no proper border.
Example: $s = \texttt{abacabab}$
| $i$ | $s[0 \ldots i]$ | Longest border | $\pi[i]$ |
|---|---|---|---|
| 0 | $\texttt{a}$ | (none) | 0 |
| 1 | $\texttt{ab}$ | (none) | 0 |
| 2 | $\texttt{aba}$ | $\texttt{a}$ | 1 |
| 3 | $\texttt{abac}$ | (none) | 0 |
| 4 | $\texttt{abaca}$ | $\texttt{a}$ | 1 |
| 5 | $\texttt{abacab}$ | $\texttt{ab}$ | 2 |
| 6 | $\texttt{abacaba}$ | $\texttt{aba}$ | 3 |
| 7 | $\texttt{abacabab}$ | $\texttt{ab}$ | 2 |
The key idea: if $s[i] = s[\pi[i-1]]$, then $\pi[i] = \pi[i-1] + 1$. Otherwise, we “fall back” along the border chain: try matching $s[i]$ against $s[\pi[\pi[i-1]-1]]$, and so on, until we find a match or exhaust all borders.
vector<int> compute_prefix_function(const string& s) {
int n = s.size();
vector<int> pi(n, 0);
for (int i = 1; i < n; i++) {
int j = pi[i - 1];
while (j > 0 && s[i] != s[j])
j = pi[j - 1];
if (s[i] == s[j])
j++;
pi[i] = j;
}
return pi;
}
Time complexity: $O(n)$. Even though there is a while loop inside the for loop, $j$ increases at most once per iteration and each decrease in the while loop is strictly reducing $j$, so the total number of decreases across all iterations is at most $n$.
All borders of $s[0 \ldots n-1]$ can be found by following the failure chain starting from $\pi[n-1]$:
$$\pi[n-1], \quad \pi[\pi[n-1] - 1], \quad \pi[\pi[\pi[n-1] - 1] - 1], \quad \ldots$$until we reach 0. Each value in this chain is the length of a border.
For $s = \texttt{abacaba}$ with $\pi = [0, 0, 1, 0, 1, 2, 3]$:
- Start: $\pi[6] = 3$ $\implies$ border $\texttt{aba}$
- Next: $\pi[2] = 1$ $\implies$ border $\texttt{a}$
- Next: $\pi[0] = 0$ $\implies$ stop
So the borders are $\texttt{aba}$ and $\texttt{a}$.
Key property: every border of a border of $s$ is also a border of $s$. The set of all border lengths of $s$ is exactly the chain $\pi[n-1] \to \pi[\pi[n-1]-1] \to \ldots$
A period of $s$ is a positive integer $p$ such that $s[i] = s[i + p]$ for all valid $i$. Equivalently, $s$ can be built by repeating a string of length $p$ (with a possibly incomplete final copy).
There is a beautiful duality:
$$p \text{ is a period of } s \iff n - p \text{ is a border length of } s$$Why? If $s$ has a border of length $b$, then $s[0 \ldots b-1] = s[n-b \ldots n-1]$. This means every character at position $i < b$ equals the character at position $i + (n-b)$, giving period $p = n - b$.
Example: $s = \texttt{abcabcab}$ (length 8) has border $\texttt{abcab}$ of length 5. So $p = 8 - 5 = 3$ is a period, and indeed the string is built from repeating $\texttt{abc}$ with a partial last copy.
This means:
- The shortest period corresponds to the longest border: $p_{\min} = n - \pi[n-1]$.
- $s$ is a complete repetition of a string of length $p$ (i.e., $p$ divides $n$) if and only if $\pi[n-1] > 0$ and $n \bmod (n - \pi[n-1]) = 0$.
If a string $s$ of length $n$ has periods $p$ and $q$, and $p + q \leq n + \gcd(p, q)$, then $s$ also has period $\gcd(p, q)$.
This theorem is powerful because it lets you combine information from two different periodic structures. A direct consequence: if $p + q \leq n$, then $\gcd(p, q)$ is also a period (the weaker, more commonly used form).
Before jumping into problems, here are the recurring ways borders show up in competitive programming.
Pattern 1: Finding special prefixes/suffixes. Given a string, find a prefix that is also a suffix (and possibly appears elsewhere). Follow the $\pi$-chain.
Pattern 2: String periodicity. “Is the string a repetition of a shorter string?” or “What is the shortest period?” Use $n - \pi[n-1]$.
Pattern 3: Pattern matching. Find all occurrences of pattern $p$ in text $t$. Build $\pi$ on the concatenated string $p + \texttt{\#} + t$ (where $\texttt{\#}$ is a separator not in the alphabet). Position $i$ in $t$ is a match endpoint whenever $\pi[\text{offset}] = |p|$.
Pattern 4: KMP automaton DP. Build the KMP automaton (transition function) from the failure function. Then do DP over states of this automaton. Used when you need to count/construct strings that contain (or avoid) a given pattern.
Pattern 5: Overlapping concatenation. When merging two strings with maximal overlap, the overlap length is the longest border of a carefully constructed string.
These problems directly test your understanding of borders and the prefix function.
CSES - Finding Borders Given a string, find all border lengths. Follow the $\pi$-chain from $\pi[n-1]$.
CSES - Finding Periods Given a string, find all periods. Use the border-period duality: $p$ is a period iff $n - p$ is a border length.
CSES - String Matching Count occurrences of a pattern in a text. The bread-and-butter KMP application: build $\pi$ on $p + \texttt{\#} + t$.
Codeforces 126B - Password Find the longest string that is a prefix, a suffix, and appears somewhere in the middle of $s$. Compute $\pi$, then check which borders also appear as a border of some interior prefix. A border of length $b$ appears in the interior if there exists some $i$ with $1 \leq i < n - 1$ such that $\pi[i] \geq b$.
Codeforces 432D - Prefixes and Suffixes For every border of $s$ (i.e., every prefix that equals a suffix), count how many times that prefix occurs in $s$. Use the prefix function and count occurrences by propagating counts up the $\pi$-chain.
Codeforces 1721E - Prefix Function Queries You are given a string $s$ and $q$ queries. Each query appends a short string $t$ to $s$ and asks for the prefix function values of the extended string at the new positions. Requires careful simulation of the failure function with the base string’s $\pi$ precomputed.
Codeforces 182D - Common Divisors Given a string, find how many strings $t$ exist such that $s = t + t + \ldots + t$. The answer is the number of periods $p$ that divide $n$, which corresponds to certain borders in the $\pi$-chain.
Codeforces 1295B - Infinite Prefixes Given a string $s$ that repeats infinitely, count prefixes with a given character balance. The periodic structure (derived from borders/periods of $s$) is key.
POI - Periods of Words (also on Kattis) For each prefix of $s$, find its longest period that is shorter than the prefix itself. This is equivalent to finding the shortest non-trivial border length of each prefix. Use the $\pi$-chain with amortized path compression.
Codeforces 1200E - Compress Words Given a sequence of words, merge them left to right, where the overlap between consecutive words is the longest string that is a suffix of the current result and a prefix of the next word. Compute this overlap using KMP on the junction.
In these problems, you don’t just compute $\pi$ — you build the full KMP automaton (the transition table $\delta(j, c)$ = “if we are in state $j$ and read character $c$, which state do we go to?”). Then you do DP over the automaton states.
Codeforces 1015F - Bracket Substring Count bracket sequences of length $2n$ that contain a given pattern as a substring. Build the KMP automaton on the pattern, then DP over (position, balance, KMP state, pattern seen).
Codeforces 346B - Lucky Common Subsequence Find the longest common subsequence of two strings that does not contain a forbidden pattern. LCS DP augmented with a KMP automaton state to track progress toward the forbidden pattern.
Codeforces 808G - Anthem of Berland Given a string with wildcards and a pattern, check if the wildcards can be filled so the pattern appears as a substring. DP on the KMP automaton states with transitions for wildcards trying all characters.
Codeforces 1163D - Mysterious Code Maximize a score by filling wildcards in a string, where the score depends on occurrences of two given patterns. Uses Aho-Corasick (multi-pattern KMP).
Codeforces 526D - Om Nom and Necklaces Determine, for each substring length, whether the string can be decomposed into a necklace-like repetition structure. The period/border structure is fundamental here.
Codeforces 471D - MUH and Cube Walls Match a “shape” pattern against a wall. Convert heights to difference arrays, then run KMP matching on the differences. A clean example of reducing a non-obvious problem to pattern matching.
Codeforces 1137C - Museums Tour While not a pure string problem, it uses the periodic structure of schedules (repeating strings of open/closed days) combined with graph traversal. Recognizing the periodicity is a border-like insight.
Codeforces 1968G2 - Division + LCP (Hard Version) Split a string into $k$ equal parts maximizing LCP. Involves suffix arrays and understanding periodic structure deeply. The connection to borders: the LCP of a string with its own shifted copy is directly related to the prefix function.
Codeforces 25E - Test Given three strings, find the shortest string that contains all three as substrings. For each pair, compute the maximum overlap (longest suffix of one that is a prefix of the other) using KMP, then try all orderings.
The Z-function $z[i]$ gives the length of the longest substring starting at position $i$ that matches a prefix of $s$. It solves many of the same problems as the prefix function but from a different angle.
| Prefix function $\pi$ | Z-function $z$ |
|---|---|
| $\pi[i]$ = longest border of $s[0 \ldots i]$ | $z[i]$ = longest match of $s[i \ldots]$ with $s[0 \ldots]$ |
| Borders = prefix = suffix | Z-values = prefix match at each position |
| Follow chain to get all borders | $z[i] + i = n$ iff $s[i \ldots n-1]$ is a border |
You can convert between them in $O(n)$:
- From $z$ to $\pi$: for each $i$ where $z[i] > 0$, the positions $i, i+1, \ldots, i+z[i]-1$ have prefix matches of lengths $z[i], z[i]-1, \ldots, 1$ respectively. Take the maximum that reaches each position.
- From $\pi$ to $z$: follow all $\pi$-chains.
Many competitive programmers prefer whichever they find more intuitive. Problems solvable with one are almost always solvable with the other.
The border of a string is a deceptively simple concept — a prefix that is also a suffix — but it underpins an enormous amount of string algorithm theory. The prefix function computes longest borders, gives you all periods, enables pattern matching via KMP, and powers automaton-based DP for constrained string counting.
When you see a problem mentioning any of these:
- “prefix equals suffix”
- “repeating structure” or “periodic string”
- “find a pattern in a text”
- “count strings containing/avoiding a pattern”
- “overlap between strings”
there is a good chance that borders and the KMP failure function are the right tool.