I need an algorithm which can find the median of a singly linked list in linear time complexity O(n) and constant space complexity O(1).
EDIT: The list is a C-style singly linked list. Nothing from the STL is allowed: no containers and no functions (e.g. no std::forward_list and no std::nth_element). I am also not allowed to move the numbers into any other container (like an array). A space complexity of O(log n) is acceptable, as this will be well under 100 for my lists.
Basically, I have a linked list with about 3 * 10^6 elements and I need to get the median within 3 seconds, so I can't use a sorting algorithm to sort the list (that would be O(n log n) and would probably take something like 10-14 seconds).
I've searched online and found that it's possible to find the median of a std::vector in O(n) time and O(1) space with quickselect (the worst case is O(n^2), but it is rare), for example: https://www.geeksforgeeks.org/quickselect-a-simple-iterative-implementation/
But I can't find any algorithm that does this for a linked list. The issue is that with a vector I can use the array index for random access. If I adapt that algorithm to a linked list, the complexity will be much higher. For example, when I move the pivot index to the left, I actually need to traverse the list to reach that new element and continue from there (this will give me at least O(kn) with a big k for my list, even approaching O(n^2)...).
EDIT 2:
I know I have too many variables but I've been testing different stuff and I am still working on my code... My current code:
#include <bits/stdc++.h>
using namespace std;
template <class T> class Node {
public:
T data;
Node<T> *next;
};
template <class T> class List {
public:
Node<T> *first = NULL; // start empty so push_front and printList never read an uninitialized pointer
};
template <class T> T getMedianValue(List<T> & l) {
Node<T> *crt,*pivot,*incpivot;
int left, right, lung, idx, lungrel,lungrel2, left2, right2, aux, offset;
pivot = l.first;
crt = pivot->next;
lung = 1;
//lung is the length of the linked list (yeah, it's "length" in Romanian...)
//lungrel and lungrel2 are the relative lengths of the part of
//the list I am processing, e.g: 2 3 4 in a list with 1 2 3 4 5
right = left = 0;
while (crt != NULL) {
if(crt->data < pivot->data){
aux = pivot->data;
pivot->data = crt->data;
crt->data = pivot->next->data;
pivot->next->data = aux;
pivot = pivot->next;
left++;
}
else right++;
// cout<<crt->data<<endl;
crt = crt->next;
lung++;
}
if(right > left) offset = left;
// cout<<endl;
// cout<<pivot->data<<" "<<left<<" "<<right<<endl;
// printList(l);
// cout<<endl;
lungrel = lung;
incpivot = l.first;
// offset = 0;
while(left != right){
//cout<<"parcurgere"<<endl;
if(left > right){
//cout<<endl;
//printList(l);
//cout<<endl;
//cout<<"testleft "<<incpivot->data<<" "<<left<<" "<<right<<endl;
crt = incpivot->next;
pivot = incpivot;
idx = offset;left2 = right2 = lungrel = 0;
//cout<<idx<<endl;
while(idx < left && crt!=NULL){
if(pivot->data > crt->data){
// cout<<"1crt "<<crt->data<<endl;
aux = pivot->data;
pivot->data = crt->data;
crt->data = pivot->next->data;
pivot->next->data = aux;
pivot = pivot->next;
left2++;lungrel++;
}
else {
right2++;lungrel++;
//cout<<crt->data<<" "<<right2<<endl;
}
//cout<<crt->data<<endl;
crt = crt->next;
idx++;
}
left = left2 + offset;
right = lung - left - 1;
if(right > left) offset = left;
//if(pivot->data == 18) return 18;
//cout<<endl;
//cout<<"l "<<pivot->data<<" "<<left<<" "<<right<<" "<<right2<<endl;
// printList(l);
}
else if(left < right && pivot->next!=NULL){
idx = left;left2 = right2 = 0;
incpivot = pivot->next;offset++;left++;
//cout<<endl;
//printList(l);
//cout<<endl;
//cout<<"testright "<<incpivot->data<<" "<<left<<" "<<right<<endl;
pivot = pivot->next;
crt = pivot->next;
lungrel2 = lungrel;
lungrel = 0;
// cout<<"p right"<<pivot->data<<" "<<left<<" "<<right<<endl;
while((idx < lungrel2 + offset - 1) && crt!=NULL){
if(crt->data < pivot->data){
// cout<<"crt "<<crt->data<<endl;
aux = pivot->data;
pivot->data = crt->data;
crt->data = (pivot->next)->data;
(pivot->next)->data = aux;
pivot = pivot->next;
// cout<<"crt2 "<<crt->data<<endl;
left2++;lungrel++;
}
else right2++;lungrel++;
//cout<<crt->data<<endl;
crt = crt->next;
idx++;
}
left = left2 + left;
right = lung - left - 1;
if(right > left) offset = left;
// cout<<"r "<<pivot->data<<" "<<left<<" "<<right<<endl;
// printList(l);
}
else{
//cout<<cmx<<endl;
return pivot->data;
}
}
//cout<<cmx<<endl;
return pivot->data;
}
template <class T> void printList(List<T> const & l) {
Node<T> *tmp;
if(l.first != NULL){
tmp = l.first;
while(tmp != NULL){
cout<<tmp->data<<" ";
tmp = tmp->next;
}
}
}
template <class T> void push_front(List<T> & l, int x)
{
Node<T>* tmp = new Node<T>;
tmp->data = x;
tmp->next = l.first;
l.first = tmp;
}
int main(){
List<int> l;
int n = 0;
push_front(l, 19);
push_front(l, 12);
push_front(l, 11);
push_front(l, 101);
push_front(l, 91);
push_front(l, 21);
push_front(l, 9);
push_front(l, 6);
push_front(l, 25);
push_front(l, 4);
push_front(l, 18);
push_front(l, 2);
push_front(l, 8);
push_front(l, 10);
push_front(l, 200);
push_front(l, 225);
push_front(l, 170);
printList(l);
n=getMedianValue(l);
cout<<endl;
cout<<n;
return 0;
}
Do you have any suggestions on how to adapt quickselect to a singly linked list, or another algorithm that would work for my problem?
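In your question, you mentioned that you are having trouble selecting a pivot that is not at the start of the list, because this would require traversing the list. If you do it correctly, you only have to traverse the entire list twice: once to find the middle and the end of the list so that a good pivot can be selected, and once to do the actual partitioning.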
The first step is not necessary if you don't care much about selecting a good pivot and you are happy with simply selecting the first element of the list as the pivot (which causes worst case O(n^2) time complexity if the data is already sorted).
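A traversal that records the middle and the last node in a single pass could be sketched as follows. This is only an illustration: the `Node` type with an `int` payload, the `ListInfo` struct and the `scanList` function are hypothetical names, not taken from the question or from the implementation discussed in this answer.

```cpp
#include <cstddef>

// Hypothetical minimal node type for this sketch (int payload assumed).
struct Node {
    int data;
    Node *next;
};

// Result of one traversal: node count plus pointers to the middle
// and last nodes (0 / NULL / NULL for an empty list).
struct ListInfo {
    std::size_t count;
    Node *middle;
    Node *last;
};

// Walks the list once. `middle` advances one step for every two nodes
// visited, so it ends on the node at index (count - 1) / 2.
ListInfo scanList(Node *head) {
    ListInfo info = {0, NULL, NULL};
    for (Node *p = head; p != NULL; p = p->next) {
        ++info.count;
        info.last = p;
        if (info.count == 1)
            info.middle = p;                  // first node is the initial middle
        else if (info.count % 2 == 1)
            info.middle = info.middle->next;  // advance on every odd count
    }
    return info;
}
```

The returned `count` is also what a selection algorithm needs later to decide which rank corresponds to the median.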
If you remember the end of the list the first time you traverse it by maintaining a pointer to the end, then you should never have to traverse it again to find the end. Also, if you are using the standard Lomuto partition scheme (which I am not using for the reasons stated below), then you must also maintain two pointers into the list which represent the `i` and `j` indexes of the standard Lomuto partition scheme. By using these pointers, you should never have to traverse the list to access a single element.
Also, if you maintain a pointer to the middle and the end of every partition, then, when you later must sort one of these partitions, you will not have to traverse that partition again to find the middle and end.
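To illustrate how two node pointers can stand in for the `i` and `j` indexes, here is a minimal sketch of a standard two-way Lomuto partition over a list segment. It assumes an `int` payload, uses the first node's value as the pivot, and swaps values rather than relinking nodes; the names `Node` and `partitionLomuto` are hypothetical, and this is not the three-partition variant described further down.

```cpp
#include <cstddef>

// Hypothetical minimal node type for this sketch (int payload assumed).
struct Node {
    int data;
    Node *next;
};

// Partitions the nodes in [first, end) around the value stored in `first`.
// `end` is one past the last node of the segment (NULL for the whole list).
// Returns the node that holds the pivot afterwards: every node before it
// is smaller than the pivot, every node after it is greater or equal.
Node *partitionLomuto(Node *first, Node *end) {
    int pivot = first->data;
    Node *i = first;  // last node of the "smaller than pivot" region
    for (Node *j = first->next; j != end; j = j->next) {
        if (j->data < pivot) {
            i = i->next;          // grow the "smaller" region by one node
            int tmp = i->data;    // manual swap, no std::swap
            i->data = j->data;
            j->data = tmp;
        }
    }
    // Move the pivot value from the front to its final position.
    first->data = i->data;
    i->data = pivot;
    return i;
}
```

Both pointers only ever move forward, so one partitioning pass visits each node of the segment exactly once.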
I have now created my own implementation of the QuickSelect algorithm for linked lists, which I have posted below.
Since you stated that the linked list is singly-linked and cannot be upgraded to a doubly-linked list, I can't use the Hoare partition scheme, as iterating a singly-linked list backwards is very expensive. Therefore, I am using the generally less efficient Lomuto partition scheme instead.
When using the Lomuto partition scheme, either the first element or the last element is typically selected as the pivot. However, selecting either of those has the disadvantage that sorted data will cause the algorithm to have the worst-case time complexity of O(n^2). This can be prevented by selecting the pivot according to the "median-of-three" rule, which takes as pivot the median of the first element, the middle element and the last element. Therefore, in my implementation, I am using this "median-of-three" rule.
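A minimal sketch of the "median-of-three" rule on list nodes is shown below. It assumes that pointers to the first, middle and last nodes are already known and that the payload is an `int`; `Node` and `medianOfThree` are hypothetical names used only for this illustration.

```cpp
#include <cstddef>

// Hypothetical minimal node type for this sketch (int payload assumed).
struct Node {
    int data;
    Node *next;
};

// Returns whichever of the three nodes holds the median of their values.
// Uses only operator< and at most three comparisons.
Node *medianOfThree(Node *a, Node *b, Node *c) {
    if (a->data < b->data) {
        if (b->data < c->data) return b;     // a < b < c
        return (a->data < c->data) ? c : a;  // c <= b, so pick the larger of a and c
    }
    // Here b <= a.
    if (a->data < c->data) return a;         // b <= a < c
    return (b->data < c->data) ? c : b;      // c <= a, so pick the larger of b and c
}
```

With already sorted input, the middle node is chosen, which splits the list roughly in half instead of producing one empty partition.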
Also, the Lomuto partition scheme typically creates two partitions, one for values smaller than the pivot and one for values larger than or equal to the pivot. However, this will cause the worst-case time complexity of O(n^2) if all values are identical. Therefore, in my implementation, I am creating three partitions, one for values smaller than the pivot, one for values larger than the pivot, and one for values equal to the pivot.
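One possible way to build three partitions on a singly linked list is to relink the nodes into three chains in a single pass, as sketched below. This is an assumption about how such a split could be done, not necessarily how the implementation described here does it; `Node`, `Chain`, `append` and `partitionThreeWay` are hypothetical names and the payload is assumed to be `int`.

```cpp
#include <cstddef>

// Hypothetical minimal node type for this sketch (int payload assumed).
struct Node {
    int data;
    Node *next;
};

// A sub-list with head and tail pointers and a node count.
struct Chain {
    Node *head;
    Node *tail;
    std::size_t count;
};

// Appends node n to the end of chain c in O(1).
static void append(Chain &c, Node *n) {
    n->next = NULL;
    if (c.tail != NULL) c.tail->next = n;
    else c.head = n;
    c.tail = n;
    ++c.count;
}

// Splits the list starting at `head` into three chains: values smaller
// than, equal to, and greater than `pivot`. No nodes are allocated or
// copied, and the relative order inside each chain is preserved.
void partitionThreeWay(Node *head, int pivot,
                       Chain &lessThan, Chain &equalTo, Chain &greaterThan) {
    Chain empty = {NULL, NULL, 0};
    lessThan = equalTo = greaterThan = empty;
    while (head != NULL) {
        Node *next = head->next;  // save it, append() overwrites head->next
        if (head->data < pivot) append(lessThan, head);
        else if (pivot < head->data) append(greaterThan, head);
        else append(equalTo, head);
        head = next;
    }
}
```

The three counts tell a selection algorithm where to continue: if the wanted 0-based rank k is below `lessThan.count`, the search continues in the first chain; if it is below `lessThan.count + equalTo.count`, the answer is the pivot; otherwise the search continues in the last chain with k reduced by those two counts. When all values are identical, everything lands in `equalTo` and the search ends after one pass.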
Although these measures don't completely eliminate the possibility of worst-case time complexity of O(n^2), they at least make it highly unlikely (unless the input is provided by a malicious attacker). In order to guarantee a time complexity of O(n), a more complex pivot selection algorithm would have to be used, such as median of medians.
One significant problem I encountered is that for an even number of elements, the median is defined as the arithmetic mean of the two "middle" or "median" elements. For this reason, I can't simply write a function similar to `std::nth_element`, because if, for example, the total number of elements is 14, then I will be looking for the 7th and 8th largest element. This means I would have to call such a function twice, which would be inefficient. Therefore, I have instead written a function which can search for the two "median" elements at once. Although this makes the code more complex, the performance penalty due to the additional code complexity should be minimal compared to the advantage of not having to call the same function twice.
Please note that although my implementation compiles perfectly on a C++ compiler, I wouldn't call it textbook C++ code, because the question states that I am not allowed to use anything from the C++ standard template library. Therefore, my code is rather a hybrid of C code and C++ code.
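To make the two "median" elements concrete, the following sketch computes which 0-based ranks are needed for a list of n elements and how the median follows from the two values found. The names `MedianRanks`, `medianRanks` and `medianFromMiddle` are hypothetical, and an `int` payload is assumed.

```cpp
#include <cstddef>

// 0-based ranks (positions in sorted order) of the two middle elements.
// For an odd element count both ranks are the same.
struct MedianRanks {
    std::size_t lower;
    std::size_t upper;
};

// Requires n > 0.
MedianRanks medianRanks(std::size_t n) {
    MedianRanks r = {(n - 1) / 2, n / 2};
    return r;
}

// Arithmetic mean of the two middle values; converting to double before
// adding avoids integer overflow for large values.
double medianFromMiddle(int lowerValue, int upperValue) {
    return (static_cast<double>(lowerValue) + upperValue) / 2.0;
}
```

For n = 14 this gives ranks 6 and 7, which are the 7th and 8th elements in sorted order; for n = 5 both ranks are 2, so a single selection is enough.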
In the following code, I only use the standard template library (in particular the function `std::nth_element`) for testing my algorithm and for verifying the results. I do not use any of these functions in my actual algorithm.
I have successfully tested my code with one million randomly generated elements and it found the correct median virtually instantaneously.