Find max frequency in each k sized subarray

88 Views Asked by At

Can be repetitive, but I couldn't find exact one.

This is the part of a problem involving Data Structure and Algotrithms: Given an array of size n, find max frequency among the all elements in the each window/subarray of size k.

Ex1. a = {5,5,7,7,7,5} , k = 3
subarrays of size 3: 
{5,5,7} => freq[5] = 2, freq[7] = 1,  max frequency = 2 
{5,7,7} => freq[5] = 1, freq[7] = 2,  max frequency = 2
{7,7,7} => freq[7] = 3,               max frequency = 3
{7,7,5} => freq[5] = 1, freq[7] = 2,  max frequency = 2
answer array = {2,2,3,2}

Ex2: a = {5,5,6,5,5,6,6}, k = 5 , ans = {4, 3, 3}
Ex3  a = {1,2,3,4,5,6}  k = 2, ans = {1,1,1,1,1}

My goal is to solve this problem optimally in time complexity < O(N^2). Something like O(N * logN * logN). Not looking for O(N^2) Solution.

Constraints: N <= 10^5 , k <=N and a[i] <= 10^5

My approach: I tried using an unordered map ( to store frequency of each subarray) and used multiset to resolve the problem of maximum after change in freq.

#include <bits/stdc++.h>
using namespace std;


int check(vector<int> &nums, int k)
    {
        int n=nums.size();
        unordered_map<int,int> mp;
        multiset<int> st;
        
        for(int i=0;i<n;i++)
        {

            if( i >= k )
            {
                int idx_b = i-k;
                 auto itr2 = st.find(mp[nums[idx_b]]);
                if ( itr2!=st.end() ) st.erase(itr2);
                mp[nums[idx_b]]--;
                st.insert(mp[nums[idx_b]]);
            }

            int curr = mp[nums[i]]; 
            mp[nums[i]]++;

            auto itr1 = st.find(curr);
            if(itr1!=st.end())st.erase(itr1);
            st.insert(curr+1);

            auto mx = st.rbegin();
            if(i>=k-1 )
            {
                cout<<*mx<<endl;
            }
        }
        return 0;
        
    }



int main()
{
    vector<int> a { 5,5,6,6,6,7};
    check(a,3);
    
}

In my understanding the time complexity is O( n * 6 * log(n) ) log(n) for insert, find, and erase. But I am suspecting it to be wrong as the soln is failing as TLE.

2

There are 2 best solutions below

0
Dave On

Use a max heap and a counting map.

Put elements in the heap based on their count.

Adjust counts & update the heap when elements enter/leave the sliding window.

Running time is O(n log k) for n elements and a window of size k.

Ex2: a = {5,5,6,5,5,6,6}, k = 5 , ans = {4, 3, 3}

index map           comment
0:    {5=>1}
1:    {5=>2}
2:    {5=>2, 6=>1}
3:    {5=>3, 6=>1}
4:    {5=>4, 6=>1} # We've processed 5 elts; top of heap is 4 (from the 5)
5:    {5=>3, 6=>2} # We lose a 5 and gain a 6; top of heap is 3 (from the 5)
6:    {5=>2, 6=>3} # We lose another 5 and gain another 6; top of heap is 3 (from the 6)
7
trincot On

You could use a combination of the following:

  • A map with a value (from the array) as key, and the corresponding frequency it has in the current window as value
  • A vector where the index represents a frequency, and the corresponding value represents the number of distinct values that have that frequency in the current window
  • An integer that holds the maximum frequency held by a value in the current window

Then it is only a matter of keeping things updated as you extend the window to the right, and shrink it from the left:

#include <iostream>
#include <vector>
#include <unordered_map>
#include <set>

std::vector<int> check(std::vector<int> &nums, int k) {
    int n = nums.size();
    std::unordered_map<int, int> frequencyOfValue;
    std::vector<int> countOfFrequency(n);
    int maxFrequency = 0;
    std::vector<int> result;

    // Get all distinct values and register their frequency (of 0)
    for (int value : nums) {
        frequencyOfValue[value] = 0;
    }
    countOfFrequency[0] = frequencyOfValue.size(); // Number of distinct values

    // Slide the window over the input vector
    for (int last = 0, start = 0; last < n; last++) {
        int value = nums[last];
        // Add this value to the data structure
        int frequency = frequencyOfValue[value];
        countOfFrequency[frequency]--;
        frequency++;
        countOfFrequency[frequency]++;
        frequencyOfValue[value] = frequency;
        if (frequency > maxFrequency) {
            maxFrequency++;
        }
        if (last >= k) { 
            // Remove the left side value from the data structure
            value = nums[start];
            frequency = frequencyOfValue[value];
            int count = --countOfFrequency[frequency];
            if (frequency == maxFrequency && count == 0) {
                // This was the only value having the max frequency
                maxFrequency--;
            }
            frequency--;
            countOfFrequency[frequency]++;
            frequencyOfValue[value] = frequency;
            start++;
        }
        if (last >= k - 1) {
            result.push_back(maxFrequency);
        }
    }
    return result;
}

NB: I wouldn't output the result with cout in the function itself. Leave that for the caller to do. Also don't #include <bits/stdc++.h> and don't do using namespace std.

This has a time complexity of O().