Borders of Strings

What is a Border?

A border of a string $s$ of length $n$ is a string that is simultaneously a proper prefix and a proper suffix of $s$. “Proper” means the border cannot be the entire string itself.

For example, consider $s = \texttt{abacaba}$.

Border	Length
$\texttt{a}$	1
$\texttt{aba}$	3
$\texttt{abacaba}$	7 (not proper, so not a border)

The longest border of $\texttt{abacaba}$ is $\texttt{aba}$ with length 3.

Another example: $s = \texttt{aabaaab}$.

Prefix	Suffix	Match?
$\texttt{a}$	$\texttt{b}$	No
$\texttt{aa}$	$\texttt{ab}$	No
$\texttt{aab}$	$\texttt{aab}$	Yes
$\texttt{aaba}$	$\texttt{aaab}$	No
$\texttt{aabaa}$	$\texttt{baaab}$	No
$\texttt{aabaaa}$	$\texttt{abaaab}$	No

So the only border of $\texttt{aabaaab}$ is $\texttt{aab}$, with length 3.

And for $s = \texttt{abcde}$, there are no borders at all.

The Failure Function (KMP Prefix Function)

The failure function $\pi$ is the core of the KMP algorithm. For a string $s$ of length $n$:

$$\pi[i] = \text{length of the longest proper border of } s[0 \ldots i]$$

In other words, $\pi[i]$ is the largest $k < i + 1$ such that $s[0 \ldots k-1] = s[i-k+1 \ldots i]$.

We set $\pi[0] = 0$ since a single character has no proper border.

Example: $s = \texttt{abacabab}$

$i$	$s[0 \ldots i]$	Longest border	$\pi[i]$
0	$\texttt{a}$	(none)	0
1	$\texttt{ab}$	(none)	0
2	$\texttt{aba}$	$\texttt{a}$	1
3	$\texttt{abac}$	(none)	0
4	$\texttt{abaca}$	$\texttt{a}$	1
5	$\texttt{abacab}$	$\texttt{ab}$	2
6	$\texttt{abacaba}$	$\texttt{aba}$	3
7	$\texttt{abacabab}$	$\texttt{ab}$	2

Computing the Failure Function

The key idea: if $s[i] = s[\pi[i-1]]$, then $\pi[i] = \pi[i-1] + 1$. Otherwise, we “fall back” along the border chain: try matching $s[i]$ against $s[\pi[\pi[i-1]-1]]$, and so on, until we find a match or exhaust all borders.

vector<int> compute_prefix_function(const string& s) {
    int n = s.size();
    vector<int> pi(n, 0);
    for (int i = 1; i < n; i++) {
        int j = pi[i - 1];
        while (j > 0 && s[i] != s[j])
            j = pi[j - 1];
        if (s[i] == s[j])
            j++;
        pi[i] = j;
    }
    return pi;
}

Time complexity: $O(n)$. Even though there is a while loop inside the for loop, $j$ increases at most once per iteration and each decrease in the while loop is strictly reducing $j$, so the total number of decreases across all iterations is at most $n$.

Enumerating All Borders

All borders of $s[0 \ldots n-1]$ can be found by following the failure chain starting from $\pi[n-1]$:

$$\pi[n-1], \quad \pi[\pi[n-1] - 1], \quad \pi[\pi[\pi[n-1] - 1] - 1], \quad \ldots$$

until we reach 0. Each value in this chain is the length of a border.

For $s = \texttt{abacaba}$ with $\pi = [0, 0, 1, 0, 1, 2, 3]$:

Start: $\pi[6] = 3$ $\implies$ border $\texttt{aba}$
Next: $\pi[2] = 1$ $\implies$ border $\texttt{a}$
Next: $\pi[0] = 0$ $\implies$ stop

So the borders are $\texttt{aba}$ and $\texttt{a}$.

Key property: every border of a border of $s$ is also a border of $s$. The set of all border lengths of $s$ is exactly the chain $\pi[n-1] \to \pi[\pi[n-1]-1] \to \ldots$

The Border-Period Duality

A period of $s$ is a positive integer $p$ such that $s[i] = s[i + p]$ for all valid $i$. Equivalently, $s$ can be built by repeating a string of length $p$ (with a possibly incomplete final copy).

There is a beautiful duality:

$$p \text{ is a period of } s \iff n - p \text{ is a border length of } s$$

Why? If $s$ has a border of length $b$, then $s[0 \ldots b-1] = s[n-b \ldots n-1]$. This means every character at position $i < b$ equals the character at position $i + (n-b)$, giving period $p = n - b$.

Example: $s = \texttt{abcabcab}$ (length 8) has border $\texttt{abcab}$ of length 5. So $p = 8 - 5 = 3$ is a period, and indeed the string is built from repeating $\texttt{abc}$ with a partial last copy.

This means:

The shortest period corresponds to the longest border: $p_{\min} = n - \pi[n-1]$.
$s$ is a complete repetition of a string of length $p$ (i.e., $p$ divides $n$) if and only if $\pi[n-1] > 0$ and $n \bmod (n - \pi[n-1]) = 0$.

Fine and Wilf’s Theorem (Weak Periodicity Lemma)

If a string $s$ of length $n$ has periods $p$ and $q$, and $p + q \leq n + \gcd(p, q)$, then $s$ also has period $\gcd(p, q)$.

This theorem is powerful because it lets you combine information from two different periodic structures. A direct consequence: if $p + q \leq n$, then $\gcd(p, q)$ is also a period (the weaker, more commonly used form).

Common Patterns in Problems

Before jumping into problems, here are the recurring ways borders show up in competitive programming.

Pattern 1: Finding special prefixes/suffixes. Given a string, find a prefix that is also a suffix (and possibly appears elsewhere). Follow the $\pi$-chain.

Pattern 2: String periodicity. “Is the string a repetition of a shorter string?” or “What is the shortest period?” Use $n - \pi[n-1]$.

Pattern 3: Pattern matching. Find all occurrences of pattern $p$ in text $t$. Build $\pi$ on the concatenated string $p + \texttt{\#} + t$ (where $\texttt{\#}$ is a separator not in the alphabet). Position $i$ in $t$ is a match endpoint whenever $\pi[\text{offset}] = |p|$.

Pattern 4: KMP automaton DP. Build the KMP automaton (transition function) from the failure function. Then do DP over states of this automaton. Used when you need to count/construct strings that contain (or avoid) a given pattern.

Pattern 5: Overlapping concatenation. When merging two strings with maximal overlap, the overlap length is the longest border of a carefully constructed string.

Practice Problems

Foundational

These problems directly test your understanding of borders and the prefix function.

CSES - Finding Borders Given a string, find all border lengths. Follow the $\pi$-chain from $\pi[n-1]$.

CSES - Finding Periods Given a string, find all periods. Use the border-period duality: $p$ is a period iff $n - p$ is a border length.

CSES - String Matching Count occurrences of a pattern in a text. The bread-and-butter KMP application: build $\pi$ on $p + \texttt{\#} + t$.

Borders with Extra Constraints

Codeforces 126B - Password Find the longest string that is a prefix, a suffix, and appears somewhere in the middle of $s$. Compute $\pi$, then check which borders also appear as a border of some interior prefix. A border of length $b$ appears in the interior if there exists some $i$ with $1 \leq i < n - 1$ such that $\pi[i] \geq b$.

Codeforces 432D - Prefixes and Suffixes For every border of $s$ (i.e., every prefix that equals a suffix), count how many times that prefix occurs in $s$. Use the prefix function and count occurrences by propagating counts up the $\pi$-chain.

Codeforces 1721E - Prefix Function Queries You are given a string $s$ and $q$ queries. Each query appends a short string $t$ to $s$ and asks for the prefix function values of the extended string at the new positions. Requires careful simulation of the failure function with the base string’s $\pi$ precomputed.

String Periods and Repetitions

Codeforces 182D - Common Divisors Given a string, find how many strings $t$ exist such that $s = t + t + \ldots + t$. The answer is the number of periods $p$ that divide $n$, which corresponds to certain borders in the $\pi$-chain.

Codeforces 1295B - Infinite Prefixes Given a string $s$ that repeats infinitely, count prefixes with a given character balance. The periodic structure (derived from borders/periods of $s$) is key.

POI - Periods of Words (also on Kattis) For each prefix of $s$, find its longest period that is shorter than the prefix itself. This is equivalent to finding the shortest non-trivial border length of each prefix. Use the $\pi$-chain with amortized path compression.

Overlapping Concatenation

Codeforces 1200E - Compress Words Given a sequence of words, merge them left to right, where the overlap between consecutive words is the longest string that is a suffix of the current result and a prefix of the next word. Compute this overlap using KMP on the junction.

KMP Automaton and DP

In these problems, you don’t just compute $\pi$ — you build the full KMP automaton (the transition table $\delta(j, c)$ = “if we are in state $j$ and read character $c$, which state do we go to?”). Then you do DP over the automaton states.

Codeforces 1015F - Bracket Substring Count bracket sequences of length $2n$ that contain a given pattern as a substring. Build the KMP automaton on the pattern, then DP over (position, balance, KMP state, pattern seen).

Codeforces 346B - Lucky Common Subsequence Find the longest common subsequence of two strings that does not contain a forbidden pattern. LCS DP augmented with a KMP automaton state to track progress toward the forbidden pattern.

Codeforces 808G - Anthem of Berland Given a string with wildcards and a pattern, check if the wildcards can be filled so the pattern appears as a substring. DP on the KMP automaton states with transitions for wildcards trying all characters.

Codeforces 1163D - Mysterious Code Maximize a score by filling wildcards in a string, where the score depends on occurrences of two given patterns. Uses Aho-Corasick (multi-pattern KMP).

Harder / Deeper Problems

Codeforces 526D - Om Nom and Necklaces Determine, for each substring length, whether the string can be decomposed into a necklace-like repetition structure. The period/border structure is fundamental here.

Codeforces 471D - MUH and Cube Walls Match a “shape” pattern against a wall. Convert heights to difference arrays, then run KMP matching on the differences. A clean example of reducing a non-obvious problem to pattern matching.

Codeforces 1137C - Museums Tour While not a pure string problem, it uses the periodic structure of schedules (repeating strings of open/closed days) combined with graph traversal. Recognizing the periodicity is a border-like insight.

Codeforces 1968G2 - Division + LCP (Hard Version) Split a string into $k$ equal parts maximizing LCP. Involves suffix arrays and understanding periodic structure deeply. The connection to borders: the LCP of a string with its own shifted copy is directly related to the prefix function.

Codeforces 25E - Test Given three strings, find the shortest string that contains all three as substrings. For each pair, compute the maximum overlap (longest suffix of one that is a prefix of the other) using KMP, then try all orderings.

Bonus: Z-Function as an Alternative

The Z-function $z[i]$ gives the length of the longest substring starting at position $i$ that matches a prefix of $s$. It solves many of the same problems as the prefix function but from a different angle.

Prefix function $\pi$	Z-function $z$
$\pi[i]$ = longest border of $s[0 \ldots i]$	$z[i]$ = longest match of $s[i \ldots]$ with $s[0 \ldots]$
Borders = prefix = suffix	Z-values = prefix match at each position
Follow chain to get all borders	$z[i] + i = n$ iff $s[i \ldots n-1]$ is a border

You can convert between them in $O(n)$:

From $z$ to $\pi$: for each $i$ where $z[i] > 0$, the positions $i, i+1, \ldots, i+z[i]-1$ have prefix matches of lengths $z[i], z[i]-1, \ldots, 1$ respectively. Take the maximum that reaches each position.
From $\pi$ to $z$: follow all $\pi$-chains.

Many competitive programmers prefer whichever they find more intuitive. Problems solvable with one are almost always solvable with the other.

Summary

The border of a string is a deceptively simple concept — a prefix that is also a suffix — but it underpins an enormous amount of string algorithm theory. The prefix function computes longest borders, gives you all periods, enables pattern matching via KMP, and powers automaton-based DP for constrained string counting.

When you see a problem mentioning any of these:

“prefix equals suffix”
“repeating structure” or “periodic string”
“find a pattern in a text”
“count strings containing/avoiding a pattern”
“overlap between strings”

there is a good chance that borders and the KMP failure function are the right tool.