I'm new to progamming.I was searching for a radix sort implementation in c++ and i found this code here.
void countSort(string a[], int size, size_t k)
{
string *b = NULL; int *c = NULL;
b = new string[size];
c = new int[257];
for (int i = 0; i <257; i++){
c[i] = 0;
}
for (int j = 0; j <size; j++){
c[k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0]++;
//a[j] is a string
}
for (int f = 1; f <257; f++){
c[f] += c[f - 1];
}
for (int r = size - 1; r >= 0; r--){
b[c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0] - 1] = a[r];
c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0]--;
}
for (int l = 0; l < size; l++){
a[l] = b[l];
}
// avold memory leak
delete[] b;
delete[] c;
}
void radixSort(string b[], int r)
{
size_t max = getMax(b, r);
for (size_t digit = max; digit > 0; digit--){
countSort(b, r, digit - 1);
}
}
So my question is what that lines do:
c[k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0]++;
b[c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0] - 1] = a[r];
c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0]--;
And is that a MSD or LSD radix Sort ?
Thanks.
This is a neat example of unnecessary compact and, hence, hard to read code.
To analyze it it helps to separate it a bit:
First the argument of
c
s subscription is taken out:The index calculation contains a condition:
The array
a
is an array ofstd::string
s wherestd::string
contains itself an array ofchar
. So,a[j][k]
results in a singlechar
. Achar
might be signed or unsigned – that's left to the compiler. So,(unsigned char)a[j][k]
doesn't change the bits of thatchar
but interprets them as unsigned number. Then(int)(unsigned char)a[j][k]
promotes that to anint
.Please, note that this may be different to
(int)a[j][k]
if the current compiler has signedchar
s because in that case the possible sign of a value would be kept. (This is called sign extension.) So, the whole thing is just responsible to convert the current character into a (positive) index and adds 1 finally.Actually, I intended to leave the rest as practice for the reader but then I saw this:
Separating it like above, it results in:
Considering that
iC
might result in 0 (while I didn't check the whole code whether this is possible at all),iC - 1
might result in-1
. So,c[-1]
would be accessed.This might be correct if e.g.
c
where a pointer into a bigger array but not at its begin. So, a negative index would access valid storage. This seems not to be the case here:and I couldn't see any other assignment to
c
.This all doesn't look much trustworthy. At best, the condition is over-pessimistic and 0 is never assigned.
I'm quite sure that I could demonstrate that less compact code may improve readability if not help to easier uncover possible issues in it.
So, is the non-compact code slower? According to my experience, it is not on modern compilers with its amazing optimization capabilities.
I once read an article about optimization and Static single assignment form. As well I see all the funny
$$
variables in Visual Studios debugger watch window from time to time when I debug my C++ code (which definitely doesn't contain any variable named$$
). So, I believe the compiler will do internally something similar as well. – Doing this explicitly to improve readability shouldn't have the least impact on to performance.If I really would be in doubt I still could check the assembler output. (Compiler Explorer is a good place for example.)
Btw.
c = new int[257];
?Why not
int c[257];
?257
int
values are not that much that I would afraid to exceed the stack size immediately.Not to mention that, arrays and especially arrays allocated with
new
are really bad C++ style and asking for U.B.. As if std::vector was not yet invented…I somehow missed the lessons about Radix sorting when I was a student (while I must admit that I didn't yet miss this knowledge in daily business). So, out of curiosity, I had a look into Wikipedia and re-implemented the descriptions found there. This is intended to provide a (hopefully better) replacement for what OP found and exposed in the question.
Thereby, I implemented
Output:
Live Demo on coliru
Please, note that the strings are sorted interpreting the numerical values of the characters. If instead an English-dictionary sorting would be intended than the digit to bucket mapping had to be modified. Thereby, the order of character values might be changed as well as mapping corresponding uppercase and lowercase characters to the same bucket.
Frequently copying strings (or other containers) is space and time consuming and something, I would prevent at best in productive code. The move semantics is one option to lower the stress for the CPU while keeping the code quite clean and comparable to the algorithm behind. This is what I tried to consider (to my best knowledge) in the sample code.