Balanced Parenthesis on Segments
Given a balanced parenthesis, how many substrings can you invert so that the string remains balanced?
The recently conducted Codeforces Educational Round 166 had an interesting problem D : Invertible Bracket Sequences that dealt with balanced parenthesis. Since I do not have a lot of experience with balanced parenthesis, and educational rounds are meant for you to learn new things, I decided to do some exploration, to solve not only the problem that I was seeing, but other problems on balanced parenthesis that might be trivial to someone with enough experience on the topic.
How do you find out if a string is a balanced parenthesis? If you convert each (
to +1
and )
to -1
, then there are 2 conditions that needs to be satisfied:
- The sum of the entire array should be zero.
- All prefix sum should be non-negative.
Believe it or not, just this piece of information is enough to solve the problem from Edu Round.
The problem asks you to invert the subarray and then check for the validity. If the final string is still a valid parenthesis, then all the prefixes should satisfy condition 2. If the subarray that you invert is $[L \dots R]$, then if you offset all the prefix sum values in $[L \dots R]$ by the amount $pref[L - 1]$, these numbers should still satisfy condition 2.
This leads us to the following train of thoughts : How difficult is it to find out if a substring is a valid parenthesis or not? Surely this seems like an easier and very standard problem at first, so if we are not able to come up with the solution instantly, maybe we lack understanding of the basics of balanced parenthesis. Maybe solving this easier version might open up new insights that can help solve the Edu problem?
So I spent some time thinking about it, but could not come up with a solution. So naturally I searched for “balanced parentheses substrings codeforces” and came across this excellent comment by tenshi_kanade. In fact, every little detail that you read in this blog is inspired by that comment. Crazy to think that it’s been 8 years. Time flies!
First, we will attempt to solve the following problem in $\mathcal{O}(n^2)$ and later we’ll optimize it to $\mathcal{O}(n \cdot log(n))$ or $\mathcal{O}(n \cdot log^2(n))$
Count the number of substrings that are balanced parenthesis.
Just like the original version, what are the conditions required for a substring to be a balanced parenthesis?
- The subarray sum of $[L \dots R]$ should be zero.
- After accounting for offset of $pref[L - 1]$, all prefix sum of the subarray should be non-negative.
Let’s count all the good substrings that start at index $L$. Define $safe[L] = R$ to mean that $R$ is the rightmost safe index for $L$. Here, a safe index means that the subarray satisfies condition 2. Observe that for a fixed $L$, all safe indices would be consecutive, hence the rightmost keyword makes sense.
In other words, if $R$ is the rightmost safe index for $L$, then if you ignore the rest of the array, and start taking prefix sum from $L$, all the prefix sum would be non-negative and index $R$ is the last index till which this condition is satisfied.
How do we populate this $safe$ array? Fix any index $L$ and start iterating from $L$ to the right till you hit a negative sum.
vector<int> safe(n, -1);
for (int l = 0; l < n; l++) {
for (int r = l; r < n; r++) {
int sum = pref[r] - (l ? pref[l - 1] : 0);
if (sum < 0) {
break;
}
safe[l] = r;
}
}
Now, for a fixed $L$, we know that only the indices $[L \dots R]$ where $safe[L] = R$ would satisfy condition 2. But for the substring to be balanced, we also need condition 1 to be true. So among the safe indices, we are looking for indices with subarray sum zero. This can be achieved by storing the sorted indices corresponding to each prefix sum, and then looking for the occurrence of $pref[L - 1]$ amongst the safe indices (so that after the offset, the subarray sum becomes zero). Since the indices are sorted, a simple binary search would work.
To compute the answer, for each $L$, we sum up all the safe indices with subarray sum zero.
map<int, vector<int>> psum_to_index;
for (int i = 0; i < n; i++) {
psum_to_index[pref[i]].push_back(i);
}
long long ans = 0;
for (int l = 0; l < n; l++) {
if (safe[l] == -1) {
continue;
}
// To satisfy condition 1, we look for all safe indices which will
// produce a subarray sum of zero.
int need_psum = l ? pref[l - 1] : 0;
auto &vec = psum_to_index[need_psum];
ans += upper_bound(vec.begin(), vec.end(), safe[l]) -
lower_bound(vec.begin(), vec.end(), l);
}
Here is the completed code. You can submit your $\mathcal{O}(n^2)$ code here.
The time complexity of this solution is $\mathcal{O}(n^2)$ but we can use black box data structure to improve it later. For now, just focus on the core $\mathcal{O}(n^2)$ algorithm. Can you now solve the Edu problem in in $\mathcal{O}(n^2)$? Hint : It only requires 2-3 lines of code change.
If you have understood the blog so far, and have upsolved both these problems in $\mathcal{O}(n^2)$, you can stop reading if you are completely new to data structures. The rest of the blog doesn’t talk about anything unique, just plain optimizations which are not hard to think of. We are also using a min segment tree as a black box so that you can understand even if you’re new to it. Make sure to read about Atcoder Library that enables you to use these data structures as a black box.
We need a way to calculate $safe[L]$ fast for each $L$. Since all safe indices are consecutive to $L$, we can do a binary seach on the possible $R$ values. If the guess is $mid$, we need to make sure that all indices in $[L \dots mid]$ are safe. How do we do that? We query the minimum prefix sum value on this segment and check if it is safe or not. If the segment contains all safe indices, we will keep looking in $[mid + 1, R]$, else we will restrict ourselves to $[L, mid - 1]$.
To query the minimum on a segment, we can use segment trees, but if you want to avoid the $log$ factor, you can also use sparse table.
segtree<int, op, e> st(pref);
vector<int> safe(n, -1);
for (int l = 0; l < n; l++) {
int low = l, high = n - 1;
while (low <= high) {
int mid = (low + high) / 2;
// Adding +1 because atcoder's segtree has [close, open) range.
int mn = st.prod(l, mid + 1);
int sum = mn - (l ? pref[l - 1] : 0);
if (sum < 0) {
high = mid - 1;
} else {
low = mid + 1;
safe[l] = mid;
}
}
}
- Optimized Code for balanced substrings. You can submit your solution here.
- Optimized Code for edu problem. You can submit your solution here.
$\mathcal{O}(n \cdot log^2(n))$ passes comfortably with iterative segment trees. If you want to further improve the time complexity to $\mathcal{O}(n \cdot log(n))$, you can use sparse table. However, you can also do a binary search inside the segment tree, because the safe and unsafe indices are clustered together like so
s s s .... s u u u ... u
where s
represents s
and u
represents unsafe. Luckily, the Atcoder Library already provides an interface for this internal binary search (max_right
) so that you don’t have to dive into the details.
int target;
bool f(int v) { return v >= target; }
segtree<int, op, e> st(pref);
vector<int> safe(n, -1);
int offset = 0;
for (int l = 0; l < n; l++) {
// Safe indices have the form
// s, s, s, ..., s, u, u, u, .... u
// s --> safe and u --> unsafe.
//
// Let's find out the rightmost unsafe index via binary search on
// segtree.
// An index is safe if offset + subarray_sum >= 0
// offset + pref[r] - pref[l - 1] >= 0
// pref[r] >= pref[l - 1] - offset
target = (l ? pref[l - 1] : 0) - offset;
int unsafe = st.max_right<f>(l);
unsafe--;
if (l <= unsafe) {
safe[l] = unsafe;
}
offset += (str[l] == '(') ? 1 : -1;
}
- Optimized Code for internal Binary Search for edu problem. You can submit your solution here.
If you need help or clarifications, feel free to reach out to me on the following platforms.