Is it safe to "play" with parameter constness in extern "C" declarations?

Suppose I'm using some C library which has a function:

int foo(char* str);

and I know for a fact that foo() does not modify the memory pointed to by str. It's just poorly written and doesn't bother to declare str being constant.

Now, in my C++ code, I currently have:

extern "C" int foo(char* str);

and I use it like so:

foo(const_cast<char*>("Hello world"));

My question: Is it safe - in principle, from a language-lawyering perspective, and in practice - for me to write:

extern "C" int foo(const char* str);

and skip the const_cast'ing?

If it is not safe, please explain why.

Note: I am specifically interested in the case of C++98 code (yes, woe is me), so if you're assuming a later version of the language standard, please say so.

There are 2 best solutions below

KamilCuk On

Is it safe [...] for me to write extern "C" int foo(const char* str); and skip the const_cast'ing?

No.

If it is not safe, please explain why.

-- From language side:

After reading [dcl.link], I think the standard does not specify exactly how interoperability between C and C++ works, and many of the rules are "no diagnostic required". The most important part is:

Two declarations for a function with C language linkage with the same function name (ignoring the namespace names that qualify it) that appear in different namespace scopes refer to the same function.

Because they refer to the same function, I believe a sane assumption is that the declaration of an identifier with C language linkage on the C++ side has to be compatible with the declaration of that symbol on the C side. C++ has no concept of "compatible types": two declarations have to be identical (after adjustments), which makes the restriction stricter, not looser.

From C++ side, we read c++draft basic#link-11:

After all adjustments of types (during which typedefs are replaced by their definitions), the types specified by all declarations referring to a given variable or function shall be identical, [...]

Because the declaration int foo(const char *str) with C language linkage in the C++ translation unit is not identical to the declaration int foo(char *str) in the C translation unit, which names the same function, the behavior is undefined (with the famous "no diagnostic required").
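To make "not identical" concrete, here is a minimal sketch that compares the two function types at compile time. The same_type helper and the typedef names are hypothetical, introduced only for this illustration (std::is_same exists only from C++11 on, so a C++98-compatible stand-in is used). It also shows the contrasting case: a top-level const on the parameter itself does not change the function type.

```cpp
#include <cassert>

// Hypothetical C++98-compatible helper (not a standard facility):
// value is 1 only when both template arguments are the same type.
template <typename A, typename B>
struct same_type { enum { value = 0 }; };

template <typename A>
struct same_type<A, A> { enum { value = 1 }; };

typedef int mutable_param_fn(char*);          // what the C library actually defines
typedef int const_param_fn(const char*);      // what the altered declaration claims
typedef int top_const_param_fn(char* const);  // top-level const on the parameter

// True when constness of the pointee changes the function type.
bool pointee_const_changes_type() {
    return same_type<mutable_param_fn, const_param_fn>::value == 0;
}

// True when top-level constness of the parameter leaves the function type unchanged.
bool top_level_const_is_ignored() {
    return same_type<mutable_param_fn, top_const_param_fn>::value == 1;
}

// Self-check documenting the expected results.
void check_function_types() {
    assert(pointee_const_changes_type());
    assert(top_level_const_is_ignored());
}
```

So declaring the parameter as char* const would still match the C declaration, while const char* does not.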

From the C side (I think this is not even needed, since the C++ side is already enough to give the program undefined behavior), the most important part would be C99 6.7.5.3p15:

For two function types to be compatible, both shall specify compatible return types. Moreover, the parameter type lists, if both are present, shall agree in the number of parameters and in use of the ellipsis terminator; corresponding parameters shall have compatible types [...]

Because from C99 6.7.5.1p2:

For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.

and C99 6.7.3p9:

For two qualified types to be compatible, both shall have the identically qualified version of a compatible type [...]

So char is not compatible with const char, hence const char * is not compatible with char *, hence int foo(const char *) is not compatible with int foo(char*). Calling such a function (C99 6.5.2.2p9) would be undefined behavior (see also C99 J.2).

-- From practical side:

I do not believe you will be able to find a compiler+architecture combination where one translation unit sees int foo(const char *), another translation unit defines int foo(char *) { /* some stuff */ }, and the call does "not work".

Theoretically, an insane implementation could pass a const char* argument in one register and a char* argument in another, which I hope would be well documented in that insane architecture's ABI and compiler. In that case the caller and the callee would disagree about where the parameter lives, and it would "not work".

Still, using a simple wrapper costs nothing:

static inline int foo2(const char *var) {
    // const_cast is required here: static_cast cannot remove const.
    return foo(const_cast<char*>(var));
}
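A self-contained sketch of the wrapper approach, for reference. The body given to foo() here is a stand-in invented so that the example links on its own (the real function lives in the C library and its behaviour is not known from the question), and foo_const is a hypothetical wrapper name.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the C library function, defined here only so the sketch links.
// ASSUMPTION: the real foo() merely reads the string; this body is invented.
extern "C" int foo(char* str) {
    return static_cast<int>(std::strlen(str));
}

// Const-correct wrapper: the declaration of foo() stays identical to the C one,
// and the single const_cast is confined to this function.
static inline int foo_const(const char* str) {
    return foo(const_cast<char*>(str));
}

// Self-check: callers can now pass string literals without a cast.
void check_wrapper() {
    assert(foo_const("Hello world") == 11);
}
```

The cast is still there, but it is written once, next to the comment that justifies it, instead of at every call site.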
Persixty On

I think the base answer is:

Yes, you can cast off const even if the referenced object is itself const, such as the string literal in the example. Undefined behaviour arises only from an attempt to modify the const object, not from the cast itself. Those rules and their reasons for existing are 'old'; I'm sure they predate C++98.
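A small sketch of that distinction, with hypothetical function names chosen for this illustration: writing through a pointer whose const was cast away is fine when the underlying object is not const, and reading through such a pointer is fine even when it is.

```cpp
#include <cassert>

// Writes through a pointer whose const was cast away.
// Well-defined, because `buffer` itself is not a const object.
char modify_non_const_original() {
    char buffer[] = "hello";            // writable array, copied from the literal
    const char* view = buffer;          // const view of non-const storage
    const_cast<char*>(view)[0] = 'J';   // OK: the underlying object is not const
    return buffer[0];
}

// Reads through a pointer obtained by casting const away from a string literal.
char read_literal_through_cast() {
    char* p = const_cast<char*>("hello");  // the cast alone is fine
    // p[0] = 'J';                         // would be undefined behaviour: the literal is const
    return p[0];                           // reading is fine
}

// Self-check documenting the expected results.
void check_const_cast_rules() {
    assert(modify_non_const_original() == 'J');
    assert(read_literal_through_cast() == 'h');
}
```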

Contrast it with volatile where any attempt to access a volatile object through a non-volatile reference is undefined behaviour. I can only read 'access' as read and/or write here.

I won't repeat the other suggestions but here is the most paranoid solution. It's paranoid not because the C++ semantics aren't clear. They are clear. At least if you accept something being undefined behaviour is clear!

But you've described it as 'poorly written' and you want to put some sandbags round it!

The paranoid solution relies on the fact that if you are passing a constant object it will be constant for the whole execution (if the program doesn't risk UB).

So make a single copy of "hello world" lower in the call-stack or even initialised as a file scope object. You can declare it static in a function and it will (with minimal overhead) only be constructed once.

This recovers almost all of the benefits of a string literal. The lower down the call stack you put it (including file scope, i.e. global), the better. I don't know how long the pointed-to object passed to foo() needs to live, so the copy has to sit at least low enough in the chain to satisfy that requirement. NB: C++98 has std::string, but it won't quite do here because you're still forbidden from modifying the result of c_str(). With the class below, the semantics are defined.

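#include <cstddef>
#include <cstring>
#include <iostream>

class pseudo_const{
public:
    //Makes a private, writable heap copy of cstr (including the terminating '\0').
    pseudo_const(const char*const cstr): str(NULL){
        const std::size_t sz=std::strlen(cstr)+1;
        str=new char[sz];
        std::memcpy(str,cstr,sz);
    }

    //Returns a pointer to a life-time permanent copy of
    //the string passed to the constructor.
    //Modifying the string through this value will be reflected in all
    // subsequent calls.
    char* get_constlike() const {
        return str;
    }

    ~pseudo_const(){
        delete [] str;
    }
private:
    //Not copyable (declared but never defined, the C++98 idiom):
    //a compiler-generated copy would delete [] the same buffer twice.
    pseudo_const(const pseudo_const&);
    pseudo_const& operator=(const pseudo_const&);

    char* str;
};

const pseudo_const str("hello world");

int main() {
    std::cout << str.get_constlike() << std::endl;
    return 0;
}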
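When the string is known at compile time, a lighter variant of the same idea is a function-local static array. This is a different technique from the class above, offered as a sketch only; hello_buffer is a hypothetical name. Initialising a char array from a literal copies the characters into writable storage, so no const object is involved and nothing needs to be cast.

```cpp
#include <cassert>
#include <cstring>

// Returns a writable buffer with static storage duration.
// The array is initialised by copying the literal's characters,
// so the storage handed out is not a const object.
char* hello_buffer() {
    static char buffer[] = "hello world";
    return buffer;
}

// Self-check documenting the expected behaviour.
void check_hello_buffer() {
    assert(std::strcmp(hello_buffer(), "hello world") == 0);
    assert(hello_buffer() == hello_buffer());  // same storage on every call
}
```

As with get_constlike(), any modification made through the returned pointer persists for the rest of the program.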