Why is this C++ deque code 'less efficient' when I demodularize it?

254 Views Asked by At

This is a problem I faced when studying solutions to this problem on HackerRank. It basically boils down to the following: given an array A and an integer K, the problem asks you to find the maximum of each contiguous subarray of A with size K. There is some additional stuff (for each test case, this problem occurs T times, and one must read from input to obtain A), but that's the gist of it.

The site checks whether your answer is sufficiently efficient, and even if your code produces the correct output in the end, the submission may fail due to timeout. Now, when your first get to the code area, there is some predefined code for you:

#include <iostream>
#include <deque> 
using namespace std;

void printKMax(int arr[], int n, int k){
    //Write your code here.
}

int main(){

    int t;
    cin >> t;
    while(t>0) {
        int n,k;
        cin >> n >> k;
        int i;
        int arr[n];
        for(i=0;i<n;i++)
            cin >> arr[i];
        printKMax(arr, n, k);
        t--;
    }
    return 0;
}

All fine and dandy, the site already handles input for your benefit, and points you towards a modular approach. All you have to do is actually solve the contiguous subarray problem; I did it with the following function, which passes all test cases:

void printKMax(int arr[], int n, int k){
    // Queue will store up to k elements (moving window)
    // Elements stored will be indices of the array. Indices allow to compare with loop iterator for removal of oldest element as window moves
    // Most recent elements are pushed to the back (so they take longer to be popped)
    std::deque<int> Q(k); 

    // Iterate over the first window
    int i; 
    for (i = 0; i < k; ++i) { 
        // A new element (arr[i]) to be added at the back invalidates all elements added before it that are no greater (if they are smaller, they cannot be a maximum; if they are equal, the new element is more recent). Those elements are popped
        // This guarantees that the values corresponding to Q's indices will be increasing (strictly) from back to front. In particular, the front will be the maximum
        while ((!Q.empty()) && arr[i] >= arr[Q.back()]) 
            Q.pop_back();
        Q.push_back(i); 
    }

    // Iterate, letting the window move
    for (; i < n; ++i) { 
        // Print the largest element (window has not moved yet), followed by a space
        cout << arr[Q.front()] << " "; 

        // Use indices to determine if oldest element has gone out of the window
        if (Q.front() <= i - k) Q.pop_front();

        // Add new element, like before
        while ((!Q.empty()) && arr[i] >= arr[Q.back()]) 
            Q.pop_back(); 
        Q.push_back(i); 
    } 

    // Print the maximum element of last window, and end the line 
    cout << arr[Q.front()] << endl; 
}

However, looking back on it, I thought I could do (slightly?) better. For each test case, the input handling in main loops n times to fill the array arr, and then printKMax in turn also loops n times to create and slide the moving window. I thought I could combine these into a single loop, which I guess should be more efficient. I did it with the following code, which is basically the same as before but with printKMax incorporated into the main. I'll omit comments this time.

int main(){

    int t;
    cin >> t;
    while(t>0) {
        int i = 0, n, k;
        cin >> n >> k;
        int arr[n];

        std::deque<int> Q(k);

        for (; i < k; ++i) {
            cin >> arr[i];
            while ((!Q.empty()) && arr[i] >= arr[Q.back()]) 
                Q.pop_back();
            Q.push_back(i); 
        }

        for (; i < n; ++i) {
            cout << arr[Q.front()] << " "; 

            if (Q.front() <= i - k) Q.pop_front();

            cin >> arr[i];
            while ((!Q.empty()) && arr[i] >= arr[Q.back()]) 
                Q.pop_back(); 
            Q.push_back(i); 
        } 

        cout << arr[Q.front()] << endl;

        t--;
        }
    return 0;
}

This time, however, the code fails all but the simplest of test cases, in each case due to time limit constraints.

I understand that in practice modularization is good, but at this point my interest is more 'academic' than anything else. Is there something I'm missing about the actual program, or is this something related to how things are run behind the scenes that I'm unaware of?


EDIT: As suggested by PaulMcKenzie, I've used g++ to run both versions of the program, using one of the failed test cases as input. Both ran fine and produced the same output, but the one with printKMax ran in 3.97s while the one without ran in 5.44s. It takes about 37% longe.

2

There are 2 best solutions below

1
On BEST ANSWER

You second version is slower, because you're interleaving your input and output.

Reading from cin causes any output buffered for cout to be flushed first. The idea is that there is a user that needs to see your prompt.

Since you switch between reading and writing many times for each test, it results in many I/O operations to write to the console, which is not so fast.

See: https://www.geeksforgeeks.org/fast-io-for-competitive-programming/ and https://en.cppreference.com/w/cpp/io/basic_ios/tie

you can fix it with cin.tie(NULL);

12
On

One reason the "de-modularized" version is slow is because you have more I/O in the loop. From the C++ standard:

Reading an object designated by a volatile glvalue (6.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

After calling I/O function the code must re-load objects from memory (unless the object didn't have to be stored in memory at all, like local variables whose address hasn't leaked). Note, that any function whose definition isn't available at compile time is considered to call I/O functions.

To improve the de-modularized version performance, do not do any I/O in it. Rather store the results into another array int results[k]; and print that array later.


Another issue is that by default std::cin and std::cout are in sync_with_stdio:

In practice, this means that the synchronized C++ streams are unbuffered, and each I/O operation on a C++ stream is immediately applied to the corresponding C stream's buffer.

Try calling std::cin.sync_with_stdio(false); and std::cout.sync_with_stdio(false); - that makes C++ I/O much faster.