Statistics on Tree: Root as LCA
This tutorial explains two beginner-friendly subtask solutions for Codeforces 2222G, Statistics on Tree.
- Part 1 solves it in $O(n^3)$ by fixing each possible root and evaluating each path separately.
- Part 2 optimizes the same idea to $O(n^2)$ by using a DFS after fixing the root.
We are given a tree with $n$ vertices. For every pair $(u,v)$ with $u \le v$, we remove all edges on the path from $u$ to $v$. The value of the pair is the size of the largest connected component left after these edges are removed.
For every $i$, we need to count how many pairs have value exactly $i$.
The pair $(u,v)$ is unordered, but the path between them is unique.
Instead of thinking about all unordered pairs at once, we can do this:
- Fix one endpoint, call it
root. - Root the whole tree at
root. - Compute the answer for every pair
(root, v).
When the tree is rooted at root, the LCA of (root, v) is always root, because root is an ancestor of every vertex.
If we repeat this for every possible root, then every unordered pair is seen twice: once as (u, v) and once as (v, u). To avoid double-counting, we only count the pair when root <= v.
So the plan is:
for root = 0 to n - 1:
root the tree at root
for v = root to n - 1:
compute the value of pair (root, v)
This is not fast enough for the full constraints, but it is a very useful stepping stone because it makes the tree components easy to understand.
After fixing root, run a DFS and compute:
parent[x]: the parent of vertexxin the rooted tree;subtree_size[x]: the size of the subtree ofx.
Now consider a vertex v. The path from root to v is just:
root -> ... -> v
We can recover it by starting from v and repeatedly moving to parent[v] until we reach root, then reversing the list.
Suppose the path is:
$$ p_0, p_1, p_2, \ldots, p_k $$where $p_0$ is root and $p_k$ is v.
The removed edges are:
$$ (p_0,p_1), (p_1,p_2), \ldots, (p_{k-1},p_k). $$Every remaining component is attached to one of the path vertices.
For this rooted-path case, we can compute the component sizes directly.
If the path has at least one edge, then the first removed edge is:
$$ (p_0,p_1). $$The subtree of $p_1$ gets separated from the root side. Therefore the component containing the root has size:
$$ n-\mathrm{subtree}_{\mathrm{size}}[p_1]. $$Now take an internal path vertex $p_i$, where $0 < i < k$.
Two path edges incident to it are removed:
$$ (p_{i-1},p_i) $$and
$$ (p_i,p_{i+1}). $$In the rooted tree, $p_{i+1}$ is a child of $p_i$. The part below $p_{i+1}$ is cut away. The part above $p_i$ is also cut away.
So the component containing $p_i$ has exactly the vertices in the subtree of $p_i$, except the subtree of $p_{i+1}$:
$$ \mathrm{subtree}_{\mathrm{size}}[p_i]-\mathrm{subtree}_{\mathrm{size}}[p_{i+1}]. $$At the endpoint $p_k=v$, only the edge to its parent on the path is removed.
The whole subtree of $v$ remains connected, so this component has size:
$$ \mathrm{subtree}_{\mathrm{size}}[v]. $$If root == v, the path has no edges.
So no edge is removed, and the whole tree remains one component. The value is:
$$ n. $$This accounts for all pairs $(v,v)$.
For a fixed root and endpoint v:
- Build the path from
roottov. - If the path has one vertex, the value is $n$.
- Otherwise:
- consider the root-side component $n-\mathrm{subtree}_{\mathrm{size}}[p_1]$;
- consider every internal component $\mathrm{subtree}_{\mathrm{size}}[p_i]-\mathrm{subtree}_{\mathrm{size}}[p_{i+1}]$;
- consider the endpoint component $\mathrm{subtree}_{\mathrm{size}}[v]$.
- The value is the maximum of all these component sizes.
Then increment:
$$ \mathrm{answer}[\mathrm{value}] $$by $1$.
Every valid pair in the statement has the form $(u,v)$ with $u \le v$.
When our loop fixes root = u, the inner loop eventually chooses endpoint v, so the pair is counted.
When the loop later fixes root = v, the same unordered pair would appear as (v, u), but it is skipped because we only process endpoints v with root <= v.
Therefore every pair is counted exactly once.
For each fixed root:
- DFS takes $O(n)$.
- There are $O(n)$ endpoints.
- For each endpoint, walking along the path can take $O(n)$.
So one root costs $O(n^2)$, and all roots cost:
$$ O(n^3). $$The memory usage is $O(n)$, apart from the answer array.
The $O(n^3)$ solution does too much repeated work.
For one fixed root, it computes the value of (root, v) by building the whole path from root to v. If the tree is a chain, this path can have length $O(n)$, and we do that for $O(n)$ endpoints. So one fixed root costs $O(n^2)$.
But when root is fixed, all paths from root to other vertices form one rooted tree. Neighboring paths share almost all of their information.
For example, suppose we already understand the path from root to v, and now we move to a child to of v.
The new path is:
root -> ... -> v -> to
Compared to the old path, only one new edge is added: (v, to).
So we should not rebuild the path from scratch.
For pair (root, v), the answer is the maximum component size after removing all path edges.
When we walk downward from root, we can maintain:
best_before_v
This means:
Among all components already created by cutting edges on the path from
rootdown tov, excluding the final subtree ofv, what is the largest component size?
Then, for the pair (root, v), there is one more candidate: the component containing v, whose size is:
Therefore:
$$ \mathrm{value}(\mathrm{root},v)=\max(\mathrm{best}_{\mathrm{before}},\mathrm{subtree}_{\mathrm{size}}[v]). $$Suppose the DFS is currently at vertex v, and it moves to a child to.
The path for the new pair (root, to) cuts the edge (v, to).
This creates a new component at the v side of that edge. Its size depends on whether v is the fixed root.
If v == root, then cutting (root, to) separates the child subtree of to from the rest of the tree. The component containing root has size:
If v != root, then v already has its parent edge on the path cut, and now the child edge (v, to) is also cut. The component around v contains the subtree of v, except the subtree of to:
Call this new component size component_at_v.
Then the best value to pass to the child is:
$$ \mathrm{best}_{\mathrm{before\_to}}= \max(\mathrm{best}_{\mathrm{before\_v}}, \mathrm{component}_{\mathrm{at\_v}}). $$And the value of pair (root, to) is:
For one fixed root, we do two DFS traversals:
dfs_size(root, -1)computes parents and subtree sizes when the tree is rooted atroot.dfs_values(root, 0, root)walks downward and computes all pairs(root, v).
The second DFS does constant work per edge, because it no longer rebuilds every path.
So for one root, the time becomes $O(n)$ instead of $O(n^2)$.
As before, pair (root, root) has value $n$.
For every other endpoint v, we count pair (root, v) only if:
root < v
This guarantees that each unordered pair is counted once.
For each fixed root:
dfs_sizetakes $O(n)$.dfs_valuestakes $O(n)$.
So one fixed root costs $O(n)$, and all $n$ roots cost:
$$ O(n^2). $$This is still not the full intended solution for the original constraints, but it is a big improvement over Part 1 and teaches the central rerooting idea: after choosing a root, compute all endpoint answers in one traversal.
Both implementations use a Tree class and avoid lambda functions.