I'm reading a C++ function that handles timestamps, and there's an expression like this:
t % (60 * 60) % 60
I thought this expression was strictly equivalent to:
t % 60
However, when I entered this code into https://godbolt.org, I found that gcc did not think so.
Information:
- Source code and compilation results: https://godbolt.org/z/fnsxjanja
- gcc version: x86-64 gcc 13.2 (with
-O3) - clang version: x86-64 clang 17.0.1 (with
-O3) - msvc version: x64 msvc v19.38 (with
-O2because there is no-O3) - icx version: x86-64 icx 2024.0.0 (with
-O3)
The following C++ code:
#include <cstdint>
uint64_t mod3600mod60(uint64_t n) { return n % 3600 % 60; }
uint64_t mod60(uint64_t n) { return n % 60; }
will be compiled into:
mod3600mod60(unsigned long):
movabs rax, 655884233731895169
mov rdx, rdi
shr rdx, 4
mul rdx
movabs rax, -8608480567731124087
shr rdx, 3
imul rdx, rdx, 3600
sub rdi, rdx
mul rdi
mov rax, rdx
shr rax, 5
mov rdx, rax
sal rdx, 4
sub rdx, rax
mov rax, rdi
sal rdx, 2
sub rax, rdx
ret
mod60(unsigned long):
movabs rax, -8608480567731124087
mul rdi
mov rax, rdx
shr rax, 5
mov rdx, rax
sal rdx, 4
sub rdx, rax
mov rax, rdi
sal rdx, 2
sub rax, rdx
ret
I know that for dividing by a constant or modulo a constant, the compiler will optimize, but for two consecutive modulo, the compiler does not seem to optimize to the same as modulo once.
First, I tried to compile the above two functions with gcc, clang, msvc and icx respectively, and found that they would not optimize these two functions to the same result.
Next, I tried changing both moduli to powers of 2 and found that for the following two functions:
#include <cstdint>
uint64_t mod2048mod64(uint64_t n) { return n % 2048 % 64; }
uint64_t mod64(uint64_t n) { return n % 64; }
All four compilers will optimize the two to be exactly the same. Here is an example of the compilation result of gcc:
mod2048mod64(unsigned long):
mov rax, rdi
and eax, 63
ret
mod64(unsigned long):
mov rax, rdi
and eax, 63
ret
Next, I tried changing the first modulus to a non-power of 2, for the following function:
#include <cstdint>
uint64_t mod3200mod64(uint64_t n) { return n % 3200 % 64; }
All four compilers will still optimize exactly the same as a single % 64.
Next, I tried to change both modulus to a smaller non-power of 2 to confirm whether the reason for the lack of optimization was that the modulus was too large. For the following two functions:
#include <cstdint>
uint64_t mod6mod3(uint64_t n) { return n % 6 % 3; }
uint64_t mod3(uint64_t n) { return n % 3; }
As with the 3600 and 60, none of the four compilers can optimize these two functions to the same level. Here is the optimization result of gcc as an example:
mod6mod3(unsigned long):
movabs rcx, -6148914691236517205
mov rax, rdi
mul rcx
shr rdx, 2
lea rax, [rdx+rdx*2]
add rax, rax
sub rdi, rax
mov rax, rdi
mul rcx
mov rax, rdx
and rdx, -2
shr rax
add rdx, rax
mov rax, rdi
sub rax, rdx
ret
mod3(unsigned long):
movabs rax, -6148914691236517205
mul rdi
mov rax, rdx
and rdx, -2
shr rax
add rdx, rax
mov rax, rdi
sub rax, rdx
ret
Finally, I changed the parameters and return values of all the above functions to uint32_t and uint16_t respectively for compilation, and the results obtained were similar to uint64_t.
Summarized as follows:
- For two operations on unsigned integers modulo a constant
n % A % B, whereAis a multiple ofB:- For the case where
Bis a power of 2, gcc, clang, msvc and icx can all optimize it to be the same as a singlen % B; - For the case where
Bis not a power of 2, gcc, clang, msvc and icx cannot optimize it to be the same as a singlen % B.
- For the case where
Shouldn't n % A % B and n % B be strictly mathematically equivalent if A is a multiple of B?
This question is based on a fundamental misunderstanding. The title starts out with a false mathematical assumption - the two operations are equivalent. But compilers are simply not obliged to implement equivalent operations using identical code.
Hence, all the experiments prove very little. And with the next update of each compiler, these results could change. So the total lack of version numbers means that this small bit of information is not very useful.
I suppose part of the reason for the question is a misunderstanding how code optimization works. Every specific optimization you can think of has to be implemented in the compiler code, proven to be robust for all possible legal inputs, and not interfere with other optimizations. Is it worth doing this? Consider the concrete example again - that is not an argument for optimizing compilers, that is an argument for better programmers. If you mean
x%60, then writex%60.Note that the very large constants (e.g.
-6148914691236517205) are already the result of an optimization. "Modulo" is essentially the remainder after division, but division is a very expensive operation. Hence, existing optimizations replace this by a multiply, with carefully tuned overflow operation (i.e. modulo 2^64), or indeed a bitmask operation for powers of two. With the%3600 %60case, this ends up with two overflowing multiplies. Yes, those can also be merged, but this is way harder to prove correct. Optimizing two modulo operations in a row would need to happen before the operation is replaced with a multiply.