The problem I'm solving is finding the majority element in a list of integers, where the list size satisfies 0 < n < 10^5 and each element satisfies 0 < a < 10^9. I need to find which element appears strictly more than n/2 times.
I have two solutions that I believe are both correct, but they don't perform the way my understanding of their complexity would predict. Could someone explain what I've done wrong or misunderstood?
int getMajElement(vector<int> a)
{
    std::unordered_map<int, int> countMap;
    std::unordered_map<int, int>::iterator it;
    for (int i : a)
    {
        it = countMap.find(i);
        if (it == countMap.end())
        {
            countMap.insert(std::pair<int, int>(i, 1));
        }
        else
        {
            it->second++;
        }
    }

    int mostOfNum = 0;
    int mostOfNumCount = 0;
    for (std::pair<int, int> pair : countMap)
    {
        if (pair.second > mostOfNumCount)
        {
            mostOfNumCount = pair.second;
            mostOfNum = pair.first;
        }
    }

    if (mostOfNumCount > floor(a.size() / 2))
    {
        return mostOfNum;
    }
    else
    {
        return -1;
    }
}
From my understanding, the first loop, "for (int i : a)", should run in O(n) time, since finding and incrementing values in a hash map is O(1) on average. The second loop, "for (std::pair<int, int> pair : countMap)", should also run in O(n) time, since it iterates over at most n pairs. That would make the total O(n).
This function takes 2.4 seconds to run for n = 10^5, with each element set to rand() % 10^9. I made sure to time only the functions themselves, not the setup of the initial values.
The next one takes 0.70 seconds under the same conditions, but I would have expected the first one to be faster.
The second function uses a recursive divide-and-conquer approach and should take O(n log(n)) time. It recursively splits the list into halves down to individual elements, then checks whether the majority element of the left half matches the majority element of the right half. If not, it scans that section to see which of the two candidates, if either, is the overall majority there (count > floor((right - left) / 2)) and passes it back, else -1.
Could someone explain what's causing the time difference? Is it just an implementation mistake I've made?
int get_majority_element(vector<int> &a, int left, int right)
{
    if (left == right) return -1;
    if (left + 1 == right) return a[left];

    int mid = left + floor((right - left) / 2);
    int leftMajority = get_majority_element(a, left, mid);
    int rightMajority = get_majority_element(a, mid, right);

    if (leftMajority == rightMajority)
    {
        return leftMajority;
    }
    else if (rightMajority == -1)
    {
        return leftMajority;
    }
    else if (leftMajority == -1)
    {
        return rightMajority;
    }
    else
    {
        int leftCount = 0, rightCount = 0;
        for (int i = left; i < right; i++)
        {
            if (a[i] == leftMajority)
            {
                leftCount++;
            }
            else if (a[i] == rightMajority)
            {
                rightCount++;
            }
        }

        if (leftCount > floor((right - left) / 2))
        {
            return leftMajority;
        }
        else if (rightCount > floor((right - left) / 2))
        {
            return rightMajority;
        }
        else
        {
            return -1;
        }
    }
    return -1;
}
This is too long for a comment.
Complexity theory is about what happens to a single algorithm as the size of the data grows. It is a limit as n --> infinity.
It is much less useful for comparing two different algorithms on the same size of data. Why? Because constant overhead can dominate the running time. For instance, bubble sort is O(n^2), but on (very) small data sets it can outperform reasonable implementations of "faster" algorithms.
The proper comparison is the speed of each algorithm on 10^5 elements, then 10^6, then 10^7. That is, how the speed of a given algorithm grows with n.