Removing constexpr from a variable capturing a constexpr function return value removes compile-time evaluation


Consider the following constexpr function, static_strcmp, which uses C++17's constexpr char_traits::compare function:

#include <string>

constexpr bool static_strcmp(char const *a, char const *b) 
{
    return std::char_traits<char>::compare(a, b,
        std::char_traits<char>::length(a)) == 0;
}

int main() 
{
    constexpr const char *a = "abcdefghijklmnopqrstuvwxyz";
    constexpr const char *b = "abc";

    constexpr bool result = static_strcmp(a, b);

    return result;
}

Godbolt shows that this is evaluated at compile time and optimised down to:

main:
    xor     eax, eax
    ret

Remove constexpr from bool result:

If we remove constexpr from the declaration of bool result, the call is no longer optimised away.

#include <string>

constexpr bool static_strcmp(char const *a, char const *b) 
{
    return std::char_traits<char>::compare(a, b,
        std::char_traits<char>::length(a)) == 0;
}

int main() 
{
    constexpr const char *a = "abcdefghijklmnopqrstuvwxyz";
    constexpr const char *b = "abc";

    bool result = static_strcmp(a, b);            // <-- note no constexpr

    return result;
}

Godbolt shows that we now call into memcmp at run time:

.LC0:
    .string "abc"
.LC1:
    .string "abcdefghijklmnopqrstuvwxyz"
main:
    sub     rsp, 8
    mov     edx, 26
    mov     esi, OFFSET FLAT:.LC0
    mov     edi, OFFSET FLAT:.LC1
    call    memcmp
    test    eax, eax
    sete    al
    add     rsp, 8
    movzx   eax, al
    ret

Add a short circuiting length check:

If static_strcmp first compares char_traits::length of the two arguments before calling char_traits::compare, the call is optimised away again, even without constexpr on bool result.

#include <string>

constexpr bool static_strcmp(char const *a, char const *b) 
{
    return 
        std::char_traits<char>::length(a) == std::char_traits<char>::length(b) 
        && std::char_traits<char>::compare(a, b, 
             std::char_traits<char>::length(a)) == 0;
}

int main() 
{
    constexpr const char *a = "abcdefghijklmnopqrstuvwxyz";
    constexpr const char *b = "abc";

    bool result = static_strcmp(a, b);            // <-- note still no constexpr!

    return result;
}

Godbolt shows we're back to the call being optimised away:

main:
    xor     eax, eax
    ret

  • Why does removing constexpr from the variable initialised by the first version of static_strcmp stop the call from being evaluated at compile time?
  • Even without constexpr, the call to char_traits::length is clearly evaluated at compile time (the assembly passes the constant 26 to memcmp), so why isn't char_traits::compare treated the same way in the first version of static_strcmp?

There are 3 best solutions below

0
On BEST ANSWER

Note that nothing in the standard explicitly requires a constexpr function to be evaluated at compile time; see 9.1.5.7 in the latest draft:

A call to a constexpr function produces the same result as a call to an equivalent non-constexpr function in all respects except that (7.1) a call to a constexpr function can appear in a constant expression and (7.2) copy elision is not performed in a constant expression ([class.copy.elision]).

(emphasis mine)

Now, when the call appears in a constant expression, the compiler has no way to avoid running the function at compile time, so it dutifully obliges. When it does not (as in your second snippet), compile-time evaluation is optional, and what you see is just a missed optimization. There is no shortage of those.
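
If you want the evaluation to happen at compile time while still storing the result in an ordinary variable, one option is to route the call through a context that requires a constant expression. The following is a minimal C++17 sketch, not something from the question: compare_at_compile_time is a hypothetical helper name, and static_strcmp is the length-checked variant from the question so that no read goes past the end of the shorter string.

```cpp
#include <string>
#include <type_traits>

// Length-checked variant from the question: the comparison only runs
// when both strings have the same length.
constexpr bool static_strcmp(char const* a, char const* b)
{
    using traits = std::char_traits<char>;
    return traits::length(a) == traits::length(b)
        && traits::compare(a, b, traits::length(a)) == 0;
}

// Hypothetical helper, used only for illustration.
bool compare_at_compile_time()
{
    constexpr const char* a = "abcdefghijklmnopqrstuvwxyz";
    constexpr const char* b = "abc";

    // A template argument must be a constant expression, so the call is
    // evaluated during compilation even though `result` is a plain bool.
    bool result = std::bool_constant<static_strcmp(a, b)>::value;
    return result;
}
```

The same effect can be had with an intermediate constexpr variable; the template argument is just a way to do it inside a single expression.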

0
On

There are three cases to consider:

1) the computed value is used to initialize a constexpr variable, or appears where a compile-time-known value is strictly required (a non-type template parameter, the size of a C-style array, the condition of a static_assert(), ...)

2) the constexpr function uses values that are not known at compile time (for example, values read from standard input)

3) the constexpr function receives values that are known at compile time, but the result goes somewhere that does not require a compile-time value.

If we ignore the as-if rule, we have that:

  • in case (1) the compiler must compute the value at compile time, because the value is required at compile time

  • in case (2) the compiler must compute the value at run time, because it is impossible to compute it at compile time

  • in case (3) we are in a grey area: the compiler can compute the value at compile time, but the value isn't strictly required at compile time, so the compiler is free to choose between compile-time and run-time computation.

With the initial code

constexpr bool result = static_strcmp(a, b);

you are in case (1): the compiler must compute the value at compile time because the variable result is declared constexpr.

Removing the constexpr,

bool result = static_strcmp(a, b); // no more constexpr

your code moves into the grey area (case (3)), where compile-time computation is possible but not strictly required: the input values (a and b) are known at compile time, but the result goes where a compile-time value isn't required (an ordinary variable). So the compiler can choose and, in your case, it chooses run-time computation for one version of the function and compile-time computation for the other.
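
The three cases can be sketched side by side as follows. This is only an illustration under C++17: the names same, buffer, matches_abc and grey_area are made up for the example, and static_strcmp is the length-checked version from the question.

```cpp
#include <string>

constexpr bool static_strcmp(char const* a, char const* b)
{
    using traits = std::char_traits<char>;
    return traits::length(a) == traits::length(b)
        && traits::compare(a, b, traits::length(a)) == 0;
}

// Case (1): the value is required at compile time.
static_assert(static_strcmp("abc", "abc"));          // static_assert condition
constexpr bool same = static_strcmp("abc", "abd");   // constexpr variable
char buffer[static_strcmp("abc", "abc") ? 4 : 1];    // C-style array size

// Case (2): the argument is only known at run time, so the call
// has to be evaluated at run time.
bool matches_abc(std::string const& input)
{
    return static_strcmp(input.c_str(), "abc");
}

// Case (3): the arguments are known at compile time, but nothing requires
// a constant here; the compiler may evaluate at either time.
bool grey_area()
{
    bool result = static_strcmp("abc", "abc");
    return result;
}
```

Only case (1) gives a guarantee. For case (3) the generated code depends on the compiler and the optimisation level, which is what the question observes.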

0
On

Your program has undefined behavior, because you always compare strlen(a) characters, and the string b doesn't have that many characters.

If you modify your strings to be of equal length (so your program becomes well-defined), your program will be optimised as you expect.

So this is not a missed optimization. The compiler would optimize your program, but it doesn't because the program contains undefined behavior.


Note that whether this really is undefined behavior is not entirely clear. Since the compiler emits a call to memcmp, it evidently assumes that both input strings are at least strlen(a) characters long. So, judging by the compiler's behavior, it is undefined behavior.

Here's what the current draft standard says about compare:

Returns: 0 if for each i in [0, n), X::eq(p[i],q[i]) is true; else, a negative value if, for some j in [0, n), X::lt(p[j],q[j]) is true and for each i in [0, j) X::eq(p[i],q[i]) is true; else a positive value.

Now, it is not specified whether compare is allowed to read p[j+1..n) or q[j+1..n) (where j is the index of the first difference).
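
Whichever reading of the standard is right, the question can be sidestepped by never asking compare to look past the end of the shorter string. Here is a minimal C++17 sketch of one way to do that; bounded_strcmp is a hypothetical name, and this is an alternative to the length check in the question's third snippet, not something taken from it.

```cpp
#include <string>

// Hypothetical variant: compares only the characters that both strings
// are guaranteed to have, including the terminator of the shorter one.
constexpr bool bounded_strcmp(char const* a, char const* b)
{
    using traits = std::char_traits<char>;
    auto const len_a = traits::length(a);
    auto const len_b = traits::length(b);

    // min(len_a, len_b) + 1 characters exist in both arrays. If the
    // lengths differ, the shorter string's '\0' is compared against a
    // non-null character of the longer one, so the result is "not equal".
    auto const n = (len_a < len_b ? len_a : len_b) + 1;
    return traits::compare(a, b, n) == 0;
}

static_assert(!bounded_strcmp("abcdefghijklmnopqrstuvwxyz", "abc"));
static_assert(bounded_strcmp("abc", "abc"));
```

Whether a given compiler then folds a non-constexpr call to this function is still up to its optimiser; the sketch only removes the out-of-bounds concern.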