Codeforces
CF Step
Youtube Linkedin Discord Toggle Dark/Light/Auto mode Toggle Dark/Light/Auto mode Toggle Dark/Light/Auto mode Back to homepage

Hints: Colorful Points

Think about the naive approach first. Each operation can be implemented in $O(n)$ time: scan the string, mark every point whose left or right neighbor has a different color, then compact. How many operations can there be in the worst case, and why is that too slow?

A concrete bad input: $n/2$ copies of a followed by $n/2$ copies of b, i.e.

$$s = \underbrace{aa\cdots a}_{n/2}\underbrace{bb\cdots b}_{n/2}.$$

Only the two points at the a/b boundary get deleted in any given step, so we need $n/2$ operations, and the $t$-th scan still has $\Theta(n - 2t)$ characters. The total work is

$$\sum_{t=0}^{n/2-1} \Theta(n - 2t) = \Theta(n^2) \approx 10^{12}$$

for $n = 10^6$. (An alternating string like abab… is not a bad case — every point has a different-colored neighbor, so one operation wipes it out entirely.) We need a method that avoids rescanning long homogeneous stretches over and over.

Answer to Hint 1: We should exploit repetition. The input is a string of colors, but typically many consecutive points share the same color. Stare at a long same-colored stretch, say ...XXXXX..., and ask: among these five X’s, which ones can ever be deleted in the current step?

Only the leftmost X and the rightmost X have a chance of having a different-colored neighbor; every X strictly in the interior has X’s on both sides and is therefore safe this step.

Answer to Hint 2: Collapse the input into its run-length encoding. Represent the points as a list of runs

$$(c_1, \ell_1), (c_2, \ell_2), \ldots, (c_B, \ell_B)$$

where each $c_i$ is a color, $\ell_i$ is the length of the run, and $c_i \ne c_{i+1}$ by construction. One operation then affects only the endpoints of each run.

Question: exactly how many points does a single operation delete from run $i$?

Answer to Hint 3: Because adjacent runs have different colors:

  • An interior run ($1 \le i \le B$ with both neighbors present, i.e. $2 \le i \le B-1$) has its leftmost point bordering $c_{i-1} \ne c_i$ and its rightmost point bordering $c_{i+1} \ne c_i$. Both endpoints are deleted, so the run loses exactly $2$ points.
  • The first run ($i = 1$) has no left neighbor, so its leftmost point is safe; only its rightmost point is deleted. It loses $1$ point.
  • Symmetrically, the last run ($i = B$) loses $1$ point.

So in one step the update rule, at the run level, is just:

$$\ell_1 \mathrel{-}= 1, \quad \ell_B \mathrel{-}= 1, \quad \ell_i \mathrel{-}= 2 \text{ for } 2 \le i \le B-1.$$

This is independent of how long the runs are.

Answer to Hint 4 (cleanup): After subtracting, some runs may have $\ell_i \le 0$: those runs are entirely gone. Remove them from the list.

Now something subtle happens. Two runs that used to be separated by a deleted run are suddenly adjacent. By construction they had different colors from the run between them — but they might have the same color as each other. In that case we must merge them into a single run. (If you forget to merge, later steps will incorrectly treat their join as a boundary of different colors.)

The operation ends when there are $\le 1$ runs left, because then no point has a neighbor of a different color.

Answer to Hint 5 (termination and complexity): The process terminates when at most one run remains. Two things need a bound: the number of operations, and the total work across all operations.

Bounding the number of operations. At step $t$, let $B_t \ge 2$ be the number of runs. The step deletes exactly

$$2 \cdot (B_t - 2) + 2 \;=\; 2B_t - 2$$

characters (the two boundary runs contribute $1$ each, the $B_t - 2$ interior runs contribute $2$ each). In particular at least $2$ characters are deleted per step, and since the total number of characters starts at $n$ and is always non-negative, the total number of operations is at most $n / 2$.

Bounding the total work. The expensive part of step $t$ is the sweep over the $B_t$ current runs (subtraction, filtering zeros, merging). The work of step $t$ is $\Theta(B_t)$. We want

$$\sum_{t} B_t \;=\; O(n).$$

The trick is that the characters deleted at step $t$ already pay for $B_t$:

$$\text{chars deleted at step } t \;=\; 2B_t - 2 \;\ge\; B_t \quad (\text{since } B_t \ge 2).$$

Summing over all steps and using the fact that the total number of characters ever deleted is at most $n$:

$$\sum_{t} B_t \;\le\; \sum_{t} (2 B_t - 2) \;=\; \bigl(\text{total chars deleted}\bigr) \;\le\; n.$$

So the total work is $\sum_{t} \Theta(B_t) = O(n)$. Intuitively: every unit of work in a sweep can be charged to a character that is about to die, and each original character only dies once, so work is linear in $n$.

Combined with the $O(n)$ initial run-length encoding and the $O(n)$ amortized cleanup (each new next list has size $\le$ the old one, and the cost is already counted in $B_t$), the whole algorithm runs in $O(n)$ time and $O(n)$ memory.

Answer to Hint 6 (implementation): Keep the runs in a vector<pair<char,int>>.

build runs from s
ops = 0
while runs.size() > 1:
    ops++
    for i in 0 .. B-1:
        runs[i].second -= (i == 0 || i == B-1) ? 1 : 2
    next = []
    for r in runs:
        if r.second <= 0: continue
        if next non-empty and next.back().first == r.first:
            next.back().second += r.second
        else:
            next.push_back(r)
    runs = next
print ops

Two details that are easy to get wrong:

  1. Don’t test B > 1 and then forget that boundary runs lose only $1$ (not $2$); the first/last indices are special.
  2. Always merge same-colored adjacent runs in the cleanup pass, even if your input has only two distinct colors — merges really do happen (e.g. aabba becomes aa after one step).

Example: s = "aabcbbccbbaa".

Initial runs: $[(a,2),(b,1),(c,1),(b,2),(c,2),(b,2),(a,2)]$, so $B = 7$.

Step 1. Subtract $1,2,2,2,2,2,1$:

$$[(a,1),(b,-1),(c,-1),(b,0),(c,0),(b,0),(a,1)]$$

Drop non-positive, merge same-colored neighbors:

$$[(a,1),(a,1)] \ \longrightarrow\ [(a,2)].$$

Only one run left, so we stop. Answer: $1$ operation; the final string would be aa, matching the intuition that aa on the ends survives while every middle run peeled off both sides and vanished.

Why the boundary adjustment is correct: the problem only deletes a point when it has a neighbor of a different color. The very leftmost point of the string has no left neighbor at all, so it cannot be deleted via a left neighbor; if its right neighbor is the same color (same run), it cannot be deleted at all. The same argument applies to the rightmost point. Inside the string, every run-boundary point has a different-colored run on one side, so it must be deleted.

Complexity: $O(n)$ total time and $O(n)$ memory for the run list. For $n \le 10^6$ this is comfortably fast.

See model_sol_2.cpp for the reference implementation.