Balanced Parenthesis on Segments

Given a balanced parenthesis, how many substrings can you invert so that the string remains balanced?

The recently conducted Codeforces Educational Round 166 had an interesting problem D : Invertible Bracket Sequences that dealt with balanced parenthesis. Since I do not have a lot of experience with balanced parenthesis, and educational rounds are meant for you to learn new things, I decided to do some exploration, to solve not only the problem that I was seeing, but other problems on balanced parenthesis that might be trivial to someone with enough experience on the topic.

How do you find out if a string is a balanced parenthesis? If you convert each ( to +1 and ) to -1, then there are 2 conditions that needs to be satisfied:

The sum of the entire array should be zero.
All prefix sum should be non-negative.

Believe it or not, just this piece of information is enough to solve the problem from Edu Round.

The problem asks you to invert the subarray and then check for the validity. If the final string is still a valid parenthesis, then all the prefixes should satisfy condition 2. If the subarray that you invert is $[L \dots R]$ , then if you offset all the prefix sum values in $[L \dots R]$ by the amount $p r e f [L - 1]$ , these numbers should still satisfy condition 2.

This leads us to the following train of thoughts : How difficult is it to find out if a substring is a valid parenthesis or not? Surely this seems like an easier and very standard problem at first, so if we are not able to come up with the solution instantly, maybe we lack understanding of the basics of balanced parenthesis. Maybe solving this easier version might open up new insights that can help solve the Edu problem?

So I spent some time thinking about it, but could not come up with a solution. So naturally I searched for “balanced parentheses substrings codeforces” and came across this excellent comment by tenshi_kanade. In fact, every little detail that you read in this blog is inspired by that comment. Crazy to think that it’s been 8 years. Time flies!

First, we will attempt to solve the following problem in $O (n^{2})$ and later we’ll optimize it to $O (n \cdot l o g (n))$ or $O (n \cdot l o g^{2} (n))$

Count the number of substrings that are balanced parenthesis.

Just like the original version, what are the conditions required for a substring to be a balanced parenthesis?

The subarray sum of $[L \dots R]$ should be zero.
After accounting for offset of $p r e f [L - 1]$ , all prefix sum of the subarray should be non-negative.

Let’s count all the good substrings that start at index $L$ . Define $s a f e [L] = R$ to mean that $R$ is the rightmost safe index for $L$ . Here, a safe index means that the subarray satisfies condition 2. Observe that for a fixed $L$ , all safe indices would be consecutive, hence the rightmost keyword makes sense.

In other words, if $R$ is the rightmost safe index for $L$ , then if you ignore the rest of the array, and start taking prefix sum from $L$ , all the prefix sum would be non-negative and index $R$ is the last index till which this condition is satisfied.

How do we populate this $s a f e$ array? Fix any index $L$ and start iterating from $L$ to the right till you hit a negative sum.

vector<int> safe(n, -1);
for (int l = 0; l < n; l++) {
    for (int r = l; r < n; r++) {
        int sum = pref[r] - (l ? pref[l - 1] : 0);
        if (sum < 0) {
            break;
        }
        safe[l] = r;
    }
}

Now, for a fixed $L$ , we know that only the indices $[L \dots R]$ where $s a f e [L] = R$ would satisfy condition 2. But for the substring to be balanced, we also need condition 1 to be true. So among the safe indices, we are looking for indices with subarray sum zero. This can be achieved by storing the sorted indices corresponding to each prefix sum, and then looking for the occurrence of $p r e f [L - 1]$ amongst the safe indices (so that after the offset, the subarray sum becomes zero). Since the indices are sorted, a simple binary search would work.

To compute the answer, for each $L$ , we sum up all the safe indices with subarray sum zero.

map<int, vector<int>> psum_to_index;
for (int i = 0; i < n; i++) {
    psum_to_index[pref[i]].push_back(i);
}

long long ans = 0;
for (int l = 0; l < n; l++) {
    if (safe[l] == -1) {
        continue;
    }

    // To satisfy condition 1, we look for all safe indices which will
    // produce a subarray sum of zero.
    int need_psum = l ? pref[l - 1] : 0;
    auto &vec = psum_to_index[need_psum];

    ans += upper_bound(vec.begin(), vec.end(), safe[l]) -
            lower_bound(vec.begin(), vec.end(), l);
}

Here is the completed code. You can submit your $O (n^{2})$ code here.

The time complexity of this solution is $O (n^{2})$ but we can use black box data structure to improve it later. For now, just focus on the core $O (n^{2})$ algorithm. Can you now solve the Edu problem in in $O (n^{2})$ ? Hint : It only requires 2-3 lines of code change.

Spoiler : Solution for Edu Round D ↕

The original string is balanced. This means that there are exactly $n$ opening brackets and $n$ closing brackets. If you flip an opening bracket, the count of opening brackets would become $n - 1$ , hence, for the string to be balanced again, it is compulsory for you to flip a closing bracket as well, so that the count of each type of bracket remains $n$ .

Hence, we have a necessay condition : The selected substring’s sum must be zero, i.e, it should have an equal number of opening and closing brackets.

But is this condition sufficient? No!

Let’s divide the string into 3 parts

[pref [L...R] suff]

After inversion,

$[p r e f]$ should satisfy condition 2 mentioned earlier
$[p r e f, i n v ([L \dots R])]$ should satisfy condition 2
$[p r e f, i n v ([L \dots R]), s u f f]$ should satisfy condition 2.

Since the original string was balanced and the sum of $[L \dots R]$ is zero, the first and the last conditions are trivially satisfied. Hence, we only need to ensure that $[p r e f, i n v ([L \dots R])]$ satisfies condition 2. Let’s define

offset = prefix_sum(0...L - 1)

Then, you can notice that this is almost same as the easy version. We can solve the easy version on the flipped characters, and when checking condition (2) in the safe indices, instead of adding an offset of $p r e f [L - 1]$ of the inverted characters, we can add an offset of the non inverted characters. Previously we were checking a subarray in isolation, therefore no prefix sum could drop below zero. However, this time, each prefix would receive a support of “offset” from the left segment, so we need to ensure that the combo “offset + sum” doesn’t drop below zero.

// All prefix sum computation is done on the inverse of the string.
vector<int> a(n);
for (int i = 0; i < n; i++) {
    a[i] = (str[i] == '(') ? -1 : 1;
}

vector<int> pref(n);
map<int, vector<int>> psum_to_index;
for (int i = 0; i < n; i++) {
    pref[i] = a[i] + (i ? pref[i - 1] : 0);
    psum_to_index[pref[i]].push_back(i);
}

// There are 2 conditions for a substring to be balanced parentheses.
// 1. Subarray sum is zero.
// 2. Non-inverse prefix sum till L - 1 + Inverse prefsum[L...R] >= 0
//
// safe[l] = r means that any substring that starts at index 'l' and ends
// at index <= r will always satisfy condition 2. If no such index exists,
// it is set to -1.

vector<int> safe(n, -1);
int offset = 0;
for (int l = 0; l < n; l++) {
    for (int r = l; r < n; r++) {
        int sum = pref[r] - (l ? pref[l - 1] : 0);
        if (offset + sum < 0) {
            break;
        }
        safe[l] = r;
    }
    offset += (str[l] == '(') ? 1 : -1;
}

long long ans = 0;
for (int l = 0; l < n; l++) {
    if (safe[l] == -1) {
        continue;
    }

    // To satisfy condition 1, we look for all safe indices which will
    // produce a subarray sum of zero.
    int need_psum = l ? pref[l - 1] : 0;
    auto &vec = psum_to_index[need_psum];

    ans += upper_bound(vec.begin(), vec.end(), safe[l]) -
            lower_bound(vec.begin(), vec.end(), l);
}

And that solves the problem in $O (n^{2})$ .

Here is the full code. You can submit your $O (n^{2})$ code here.

If you have understood the blog so far, and have upsolved both these problems in $O (n^{2})$ , you can stop reading if you are completely new to data structures. The rest of the blog doesn’t talk about anything unique, just plain optimizations which are not hard to think of. We are also using a min segment tree as a black box so that you can understand even if you’re new to it. Make sure to read about Atcoder Library that enables you to use these data structures as a black box.

We need a way to calculate $s a f e [L]$ fast for each $L$ . Since all safe indices are consecutive to $L$ , we can do a binary seach on the possible $R$ values. If the guess is $m i d$ , we need to make sure that all indices in $[L \dots m i d]$ are safe. How do we do that? We query the minimum prefix sum value on this segment and check if it is safe or not. If the segment contains all safe indices, we will keep looking in $[m i d + 1, R]$ , else we will restrict ourselves to $[L, m i d - 1]$ .

To query the minimum on a segment, we can use segment trees, but if you want to avoid the $l o g$ factor, you can also use sparse table.

segtree<int, op, e> st(pref);
vector<int> safe(n, -1);
for (int l = 0; l < n; l++) {
    int low = l, high = n - 1;
    while (low <= high) {
        int mid = (low + high) / 2;

        // Adding +1 because atcoder's segtree has [close, open) range.
        int mn = st.prod(l, mid + 1);
        int sum = mn - (l ? pref[l - 1] : 0);
        if (sum < 0) {
            high = mid - 1;
        } else {
            low = mid + 1;
            safe[l] = mid;
        }
    }
}

Optimized Code for balanced substrings. You can submit your solution here.
Optimized Code for edu problem. You can submit your solution here.

Binary Search inside Segment Tree

$O (n \cdot l o g^{2} (n))$ passes comfortably with iterative segment trees. If you want to further improve the time complexity to $O (n \cdot l o g (n))$ , you can use sparse table. However, you can also do a binary search inside the segment tree, because the safe and unsafe indices are clustered together like so

s s s .... s u u u ... u

where s represents s and u represents unsafe. Luckily, the Atcoder Library already provides an interface for this internal binary search (max_right) so that you don’t have to dive into the details.

int target;
bool f(int v) { return v >= target; }

segtree<int, op, e> st(pref);
vector<int> safe(n, -1);
int offset = 0;
for (int l = 0; l < n; l++) {
    // Safe indices have the form
    // s, s, s, ..., s, u, u, u, .... u
    // s --> safe and u --> unsafe.
    //
    // Let's find out the rightmost unsafe index via binary search on
    // segtree.
    // An index is safe if offset + subarray_sum >= 0
    // offset + pref[r] - pref[l - 1] >= 0
    // pref[r] >= pref[l - 1] - offset
    target = (l ? pref[l - 1] : 0) - offset;
    int unsafe = st.max_right<f>(l);
    unsafe--;
    if (l <= unsafe) {
        safe[l] = unsafe;
    }
    offset += (str[l] == '(') ? 1 : -1;
}

Optimized Code for internal Binary Search for edu problem. You can submit your solution here.

Help

If you need help or clarifications, feel free to reach out to me on the following platforms.