Editorial for K-Distinct Subarrays

Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

Approach

It is easier to count the complement.

Let count_at_most(t) be the number of non-empty subarrays containing at most $t$ ~t~ distinct values. Then the required answer is:

$\text{all subarrays} - \text{count\_at\_most}(k - 1)$

To compute count_at_most(t), use a two-pointer window. Expand the right end one step at a time, track frequencies inside the current window, and while the window has too many distinct values, move the left end forward until it becomes valid again.

Once the window $[l, r]$ ~[l, r]~ is the longest valid window ending at $r$ ~r~, there are exactly $r - l + 1$ ~r - l + 1~ valid subarrays ending at $r$ ~r~. Add that to the total.

This runs in $O(n)$ ~O(n)~ time.

Solution (Python)

from collections import defaultdict


n, k = map(int, input().split())
values = list(map(int, input().split()))


def count_at_most(limit: int) -> int:
    if limit <= 0:
        return 0

    freq: dict[int, int] = defaultdict(int)
    distinct = 0
    left = 0
    total = 0

    for right, value in enumerate(values):
        if freq[value] == 0:
            distinct += 1
        freq[value] += 1

        while distinct > limit:
            left_value = values[left]
            freq[left_value] -= 1
            if freq[left_value] == 0:
                distinct -= 1
            left += 1

        total += right - left + 1

    return total


all_subarrays = n * (n + 1) // 2
print(all_subarrays - count_at_most(k - 1))

Comments

There are no comments at the moment.