Editorial: Statistics on Tree

Problem Recap

For every pair $(u,v)$ with $u \le v$, remove all edges on the simple path from $u$ to $v$. This splits the tree into several connected components. The value of the pair is the size of the largest such component.

We need to count, for every $i$, how many pairs have value exactly $i$.

Subtask 1: Simulate One Pair

The most direct way to understand the problem is to fix one pair $(u,v)$.

After removing the path edges, every remaining component is attached to some vertex on the path. For example, if a path vertex has a side subtree that is not used by the path, that side subtree becomes part of one component. The endpoints also keep the parts of their subtrees that do not use a removed edge.

So a very slow solution is:

For every pair $(u,v)$.
Find all edges on the path from $u$ to $v$.
Remove those edges.
Run DFS/BFS to find the largest remaining component.

This is clearly too slow, but it gives the right mental model: for each pair, we only care about the largest piece left after cutting the path.

The code page includes a complete implementation of this idea as bruteforce_sol.cpp. It is intended for small tests, but it is useful because it directly follows the statement.

Complexity of this subtask: the educational implementation uses a simple blocked-edge matrix for every pair, so it is $O(n^4)$ time and $O(n^2)$ memory. With edge ids instead of a matrix, the same brute force idea can be written in $O(n^3)$ time and $O(n)$ extra memory.

Subtask 2: Root the Tree

Root the tree at some vertex. Suppose we look at a pair whose lowest common ancestor is $x$.

There are two basic cases.

First, one endpoint is $x$, and the other endpoint lies in a child subtree of $x$ with size $a$. The edge from $x$ into that child subtree is removed, so the component outside that child subtree has size:

$$ n-a $$

Second, the two endpoints lie in two different child subtrees of $x$, with sizes $a$ and $b$. The two first edges from $x$ into these subtrees are removed, so the component outside both chosen child subtrees has size:

$$ n-a-b $$

If this outside component is always the largest remaining component, then counting becomes easy.

The code page includes this step first as rooted_pair_sol.cpp. For each pair, it lists all vertices on the path. For each path vertex, it adds the sizes of all neighbor-side components that are not cut off by the path edges, using subtree sizes to evaluate each side in $O(1)$.

There is also a cleaner $O(n^2)$ version, quadratic_root_sol.cpp. Fix one endpoint root, root the tree there, and compute all values for pairs (root, v) in one DFS. When the DFS moves from v to its child to, the newly separated component at v has size:

$n-\mathrm{sz}(\texttt{to})$ if v is the fixed root;
$\mathrm{sz}(\texttt{v})-\mathrm{sz}(\texttt{to})$ otherwise.

Maintaining the maximum of these already-created components gives the answer for every endpoint reached by this DFS.

Complexity of this subtask: rooted_pair_sol.cpp is $O(n^3)$ time and $O(n)$ memory. The rerooted DFS version quadratic_root_sol.cpp does $O(n)$ work for each fixed root, so it is $O(n^2)$ time and $O(n)$ memory.

Subtask 3: Choose a Centroid Root

Root the tree at a centroid $r$.

By definition, every component obtained after removing the centroid has size at most $n/2$. Therefore every rooted subtree away from the centroid also has size at most $n/2$.

Now consider a pair whose LCA is some vertex $x\ne r$.

The whole rooted subtree of $x$ has size at most $n/2$. If the path uses one child subtree of size $a$, the outside component has size $n-a$, which is more than $n/2$. Every component inside the used child subtree has size at most $a$, so the outside component is the largest.

If the path uses two child subtrees of sizes $a$ and $b$, then $a+b$ is strictly smaller than the subtree of $x$, hence at most $n/2-1$. The outside component has size:

$$ n-a-b > n/2 $$

Again, every component inside the two chosen branches has size at most $a$ or $b$, so the outside component is largest.

Thus, after rooting at a centroid, the simple formulas are correct for every non-centroid LCA.

Complexity of this subtask: finding a centroid and re-rooting the tree is $O(n)$. This step does not yet finish the problem; it is the structural observation that makes the later counting possible.

Subtask 4: Count All Naive Contributions

First, handle pairs $(v,v)$. No edge is removed, so the value is $n$. There are $n$ such pairs, so:

$$ \mathrm{ans}_n \mathrel{+}= n $$

Now root the tree at the centroid and compute all subtree sizes.

For every vertex $x$, collect the sizes of its child subtrees:

$$ s_1,s_2,\ldots,s_k $$

For pairs where one endpoint is $x$ and the other endpoint is inside child subtree $s_i$, add:

$$ \mathrm{ans}_{n-s_i} \mathrel{+}= s_i $$

For pairs whose endpoints lie in two different child subtrees $s_i$ and $s_j$, add:

$$ \mathrm{ans}_{n-s_i-s_j} \mathrel{+}= s_i s_j $$

This counts each non-diagonal pair exactly once, according to its LCA in the centroid-rooted tree.

To do the second part efficiently, the model solution groups equal subtree sizes. If a vertex has many children with the same size $s$, then pairs between two such children contribute:

$$ \binom{c_s}{2}s^2 $$

to $\mathrm{ans}_{n-2s}$, where $c_s$ is the number of child subtrees of size $s$.

For two different sizes $s$ and $t$, the contribution is:

$$ c_s c_t st $$

to $\mathrm{ans}_{n-s-t}$.

For one vertex, many equal child sizes collapse to one frequency entry. Also, if a vertex has $k$ distinct child-subtree sizes, then its children contain at least $1+2+\cdots+k$ vertices in total. This bounds the nested loop over distinct sizes well enough for the constraints.

Complexity of this subtask: computing all naive contributions costs $O(n\sqrt n)$ with the grouped-size loop used here, and $O(n)$ memory. This is already fast enough, but it still overcounts some centroid-crossing pairs.

Subtask 5: Find the Only Bad Case

The counting above assumes the outside component is the largest. We proved this for every LCA except the centroid.

So only pairs whose LCA is the centroid can be wrong.

Let $L$ be the largest child subtree of the centroid, and let:

$$ A=\mathrm{sz}(L) $$

For two centroid child subtrees with sizes $a$ and $b$, if neither is the largest subtree, then the outside component includes $L$, so it has size at least $A$. Since $A$ is at least as large as every other child subtree size, the outside component is still largest.

Therefore the only problematic pairs are:

one endpoint in $L$;
the other endpoint in a different child subtree of the centroid.

For such a second subtree of size $b$, the naive counting added $A b$ pairs to value $n-A-b$. We remove these contributions first:

$$ \mathrm{ans}_{n-A-b} \mathrel{-}= A b $$

Then we recount these pairs exactly.

Complexity of this subtask: identifying the largest centroid branch and subtracting the wrong naive contributions costs $O(\deg(r))$, hence $O(n)$.

Subtask 6: Recount the Bad Pairs

Fix a pair $(v,w)$ where $v$ is inside the largest centroid branch $L$, and $w$ is inside some other centroid branch.

The removed path goes from $v$ up to the centroid and then down to $w$.

The largest component after removing this path is the maximum of two independent quantities:

the largest component left on the $L$ side of the path;
the largest component left on the other side of the path.

Values Inside the Largest Branch

For a vertex $v$ in $L$, define $f(v)$ as the largest component that can appear inside the largest branch after removing the path from $v$ to the centroid.

When the path reaches $v$, the subtree of $v$ remains connected as one component, so $\mathrm{sz}(v)$ is a candidate.

For every ancestor step on the path, suppose we go from a vertex $p$ to its child $c$. The part of $p$’s subtree not going toward $c$ has size:

$$ \mathrm{sz}(p)-\mathrm{sz}(c) $$

This is also a candidate.

Thus a DFS inside $L$ can compute:

$$ f(v)=\max\left(\mathrm{sz}(v),\max(\mathrm{sz}(p)-\mathrm{sz}(c))\right) $$

over ancestor edges on the path from $L$ to $v$.

The model solution stores all these values in array yy.

Values Outside the Largest Branch

For a vertex $w$ in another centroid child subtree $C$, define $g(w)$ similarly for the other side.

There is one extra candidate: after cutting the edge into $L$ and the edge into $C$, the centroid remains connected to all other centroid branches. This component has size:

$$ n-A-\mathrm{sz}(C) $$

Then, while moving down from $C$ to $w$, we also consider the usual side parts $\mathrm{sz}(p)-\mathrm{sz}(c)$ and the endpoint subtree $\mathrm{sz}(w)$.

The model solution stores all these values in array xx.

For every bad pair $(v,w)$, the true value is:

$$ \max(f(v),g(w)) $$

So the remaining task is: given two arrays yy and xx, count the value of $\max(y,x)$ for every pair $(y,x)$.

Complexity of this subtask: computing all yy and xx values costs $O(n)$. If we recount every bad pair by a double loop, as in centroid_slow_sol.cpp, this step costs $O(|\texttt{yy}|\cdot|\texttt{xx}|)$, which can be $O(n^2)$.

Subtask 7: Count Pairwise Maximums

Sort both arrays.

For a value $x$ from xx, it contributes to answer value $x$ for every $y \le x$ in yy.

For a value $y$ from yy, it contributes to answer value $y$ for every $x < y$ in xx.

The strict inequality in the second sentence avoids double-counting equal values. This can be implemented with a two-pointer merge over the sorted arrays:

when the next xx value is smaller than the next yy value, add the number of already processed yy values to ans[xx];
otherwise, add the number of already processed xx values to ans[yy].

This exactly counts all corrected centroid-crossing pairs by their true maximum component size.

The code page includes centroid_slow_sol.cpp, which implements the centroid counting idea but recounts the bad pairs with a direct double loop over largest_side and other_side. That version is already logically complete, but the correction step can still be quadratic. The model solution replaces only this last double loop with the sorted two-pointer merge described above.

Complexity of this subtask: sorting costs $O(n\log n)$ and the two-pointer merge costs $O(n)$. This replaces the possible $O(n^2)$ correction from the previous subtask.

Full Algorithm

For each test case:

If $n=1$, output 1.
Find a centroid $r$.
Root the tree at $r$ and compute parent and subtree size arrays.
Add $n$ to $\mathrm{ans}_n$ for all pairs $(v,v)$.
For every vertex, add the naive LCA contributions:
- pairs from the vertex to one child subtree;
- pairs between two different child subtrees.
Let $L$ be the largest child subtree of the centroid.
Subtract the naive contributions for pairs crossing from $L$ to every other centroid child subtree.
Compute all $f(v)$ values in $L$.
Compute all $g(w)$ values outside $L$, but still below centroid child subtrees other than $L$.
Sort these two arrays and count pairwise maximums into the answer.
Output $\mathrm{ans}_1,\mathrm{ans}_2,\ldots,\mathrm{ans}_n$.

Correctness

Lemma 1

After rooting the tree at a centroid, for every pair whose LCA is not the centroid, the largest remaining component is the outside component counted by the naive formula.

Proof.

Let the LCA be $x\ne r$. Since the tree is rooted at the centroid, the whole subtree of $x$ has size at most $n/2$.

If the path uses one child subtree of $x$ with size $a$, the outside component has size $n-a > n/2$. Every component inside the used child subtree has size at most $a \le n/2$, so the outside component is largest.

If the path uses two child subtrees with sizes $a$ and $b$, then $a+b$ is smaller than the subtree of $x$, hence $a+b < n/2$. The outside component has size $n-a-b > n/2$, while every component inside the two used child subtrees has size at most $a$ or $b$. Therefore the outside component is again largest.

Lemma 2

The naive counting step counts every pair whose LCA is not the centroid with the correct value.

Proof.

Every non-diagonal pair has a unique LCA in the rooted tree. If one endpoint is the LCA, it is counted in the one-child-subtree contribution. If both endpoints are below the LCA in different child subtrees, it is counted in the two-child-subtree contribution.

By Lemma 1, the value assigned by these formulas is correct whenever the LCA is not the centroid. Each pair is counted once because the LCA is unique.

Lemma 3

Among pairs whose LCA is the centroid, the naive counting can be wrong only when one endpoint lies in the largest centroid child subtree.

Proof.

Let $L$ be a largest child subtree of the centroid, with size $A$.

If a centroid-crossing pair uses two child subtrees neither of which is $L$, then the outside component includes all of $L$, so its size is at least $A$. Since every used child subtree has size at most $A$, no component inside either used subtree can be larger than the outside component.

Pairs involving the centroid itself have outside component size $n-a$, where $a \le n/2$, so the outside component is also largest.

Thus only pairs crossing between $L$ and another centroid child subtree may have a larger component on the $L$ side.

Lemma 4

For a bad pair $(v,w)$ with $v$ in $L$ and $w$ in another centroid branch, its value is $\max(f(v),g(w))$.

Proof.

Removing the path from $v$ to $w$ separates components that lie on the $L$ side of the path and components that lie outside the $L$ side.

By definition, $f(v)$ is the largest component among all pieces created on the $L$ side: the endpoint subtree of $v$ and all side parts along the path from $L$ to $v$.

Similarly, $g(w)$ is the largest component among all pieces on the other side: the centroid component excluding $L$ and $w$’s branch, the endpoint subtree of $w$, and all side parts along the path to $w$.

Every remaining component belongs to exactly one of these two groups, so the largest remaining component has size $\max(f(v),g(w))$.

Lemma 5

The sorted two-pointer merge counts every bad pair exactly once by its true value.

Proof.

For each pair of values $(y,x)$, the true value is $\max(y,x)$.

If $x \ge y$, the pair is counted when processing $x$, because all yy values at most $x$ have already been processed. If $y > x$, the pair is counted when processing $y$, because all xx values strictly smaller than $y$ have already been processed.

These two cases are disjoint and cover all pairs, so every bad pair is counted exactly once at value $\max(y,x)$.

Theorem

The algorithm outputs the correct number of pairs for every possible value.

Proof.

Pairs $(v,v)$ are handled directly and all have value $n$.

By Lemma 2, all non-diagonal pairs whose LCA is not the centroid are counted correctly by the naive contribution step. By Lemma 3, the only centroid-LCA pairs that may be incorrect are exactly the bad pairs crossing from the largest centroid branch to another branch. The algorithm subtracts their naive contributions, and by Lemmas 4 and 5 recounts them exactly by their true values.

Therefore every pair is counted exactly once with its correct value, so every answer entry is correct.

Complexity

Finding the centroid and computing subtree sizes takes $O(n)$.

The naive contribution counting groups equal child-subtree sizes. A conservative bound for the total nested loops over distinct sizes is $O(n\sqrt n)$.

The correction step computes one value for each vertex below a centroid child subtree, sorts two arrays whose total length is at most $n-1$, and then merges them. This costs $O(n\log n)$.

So each test case runs in:

$$ O(n\sqrt n+n\log n) $$

with $O(n)$ memory.