Assigning a char pointer to string literal generated at runtime - Is this dynamic allocation?

581 Views Asked by At

I am in the process of reviewing some code that someone else wrote. I came across an interesting case involving strings within this code and I need help understanding how this works.

There is a function designed to be exported to a DLL. At the top of the function, we have this declaration

char *msg; // pointer to char
int error = 0; //my error code

Then, later in the code, we invoke a special library function for the IDE we use:

if(error < 0)
msg = getErrorString(error);

This inbuilt library function (getErrorString) expects you to provide a pointer to char where it can store the generated error string at runtime.

Finally, the code author calls the below:

free(msg); // freeing dynamically allocated memory?? 

So, I guess there is memory allocated dynamically at runtime which is large enough to store the generated error string? How is this allowed without explicitly calling something like malloc? If I were writing equivalent code, my first instinct would be to declare some static array like msg[256] and then do something like:

char msg[256] = {""};
sprintf(msg, "%s", getErrorString(error));

So my main question is, how can you declare a pointer to char, then assign it to a string which is generated at runtime, as shown in the original code? It seems that memory is being dynamically allocated at runtime, perhaps by the runtime engine. Is this what's happening with this code? Is my static array method preferred in this case?

3

There are 3 best solutions below

1
On BEST ANSWER

How is this allowed without explicitly calling something like malloc?

Well, it is almost certainly the case that somewhere within the implementation of the getErrorString, there is a call to malloc or the equivalent.

It is likely that the implementation of getErrorString looks something like this:

char *getErrorString(int error)
{
    char *ret = malloc(25);
    if(ret == NULL) abort();
    switch(error) {
        case EMUCHMEM:
            strcpy(ret, "too much memory");
            break;

        case EIUNDERFLOW:
            strcpy(ret, "integer underflow");
            break;

        case EDIVBY1:
            strcpy(ret, "divide by 1");
            break;

        default:
            sprintf(ret, "error %d", error);
            break;
    }

    return ret;
}

If I were writing equivalent code, my first instinct would be to declare some static array like msg[256] and then do something like: char msg[256] = {""}; sprintf(msg, "%s", getErrorString(error));

That doesn't make much sense. It represents unnecessary extra memory (256 bytes in msg[]) and unnecessary extra copying (by sprintf).

So my main question is, how can you declare a pointer to char, then assign it to a string which is generated at runtime, as shown in the original code?

You can always declare a pointer to char, then assign to it a string which is generated at runtime.

This can be a confusing point, though. You may have heard that strings are represented as arrays of char in C — which is true — and you may also have heard that you can't assign arrays in C — which is also true. You may have heard that instead of directly assigning strings, you always have to call strcpy. But that's not necessarily true — you can also "assign" strings by simply assigning pointers, which is what's going on when you say msg = getErrorString(error). In other words, there are two completely different ways of assigning strings in C: by copying arrays, or assigning pointers. See this other answer for more on this point.

It seems that memory is being dynamically allocated at runtime

Yes, that's how it seems.

Is my static array method preferred in this case?

As other comments have suggested, dynamic memory allocation in this case may or may not be a good idea. As a general rule, though, dynamic memory allocation is a perfectly fine — more or less vital — technique. Static memory allocation, on the other hand, can have plenty of problems of its own.

0
On

Based on the code you've described, getErrorString is clearly calling malloc or calloc (or some equivalent) itself and returning that pointer. This is actually fairly common practice - see the POSIX strdup function as another example.

The catch is that you are responsible for deallocating that memory when you're done with it. If getErrorString allocates memory dynamically then that needs to be documented so anyone using it will know to free that memory when they're done with it.

If I were writing equivalent code, my first instinct would be to declare some static array like msg[256] and then do something like:

char msg[256] = {""}; sprintf(msg, "%s", getErrorString(error));

Bad idea, since you discard the pointer value returned by getErrorString, meaning you can never free the memory that it allocated.

getErrorString is allocating all the necessary memory to store the string for you under the hood; you don't need to set aside your own buffer to store the string itself. You only need to store the returned pointer value so you can free that memory later.


How to handle dynamically allocated memory has been a thorny problem for APIs since forever. In general the ideal design has the entity responsible for memory allocation also be responsible for deallocation; personally, I would have designed getErrorString to receive the error code and a pointer to the target buffer and its size:

char *getErrorString( int errorCode, char *buf, size_t bufSize )
{
  switch( errorCode )
  {
    case SOME_ERROR:
      strncpy( buf, "Error message for SOME_ERROR", bufSize );
      break;

    case SOME_OTHER_ERROR:
      strncpy( buf, "Error message for SOME_OTHER_ERROR", bufSize );
      break;

    ...
  }
  /**
   * Make sure buf is properly 0-terminated, since strncpy won't
   * zero-terminate if the target buffer is shorter than the
   * source string
   */
  buf[bufSize-1] = 0;
  return buf;
}         

That way I am responsible for both the allocation and deallocation of the buffer. I can use an auto array and not have to worry about memory management at all:

void foo( void )
{
  char msg[81];
  ...
  fprintf( stderr, "%s", getErrorString( error, msg, sizeof msg ) );
  ...
}

In this case the error message is being written to msg; getErrorString returns the address of msg so it can be called as part of fprintf; and since it's an auto variable, the memory for msg will automatically be deallocated at function exit.

Alternately, I can dynamically allocate that memory if I wish:

char *msg = calloc( 81, sizeof *msg );
...
fprintf( stderr, "%s\n", getErrorString( error, msg, 81 );
...
free( msg );

But either way the responsibility for allocating and deallocating memory is in the same entity (the code calling getErrorString); it's not being split between two different entities.


Another option is for a function to maintain a static internal buffer:

char *getErrorString( int error )
{
  static char buf[SOME_SIZE+1]; // where SOME_SIZE is the length of the longest
                                // error string

  switch( error )
  {
    case SOME_ERROR: 
      strcpy( buf, "Error string for SOME_ERROR" );
      break;

     case SOME_OTHER_ERROR:
       strcpy( buf, "Error string for SOME_OTHER_ERROR" );
       break;
   
      ...
  }
  return buf;
}

Since buf is declared static its lifetime is the lifetime of the entire program, so it doesn't go away when getErrorString exits. This way nobody has to worry about managing memory for the buffer.

The problem with this approach is that getErrorString is no longer re-entrant or thread-safe - you only have that single buffer over the entire program, so if getErrorString is interrupted by other code that itself calls getErrorString, or if two threads call getErrorString simultaneously, then that buffer is going to be corrupted.


As a final alternative, if all the strings are constant, then there's no need to set aside any memory at all - just return the string literal directly:

/**
 * Attempting to modify a string literal invokes undefined behavior,
 * so we don't want this pointer to be used as the target of
 * a strcpy or sprintf call.  We change the return value to const char *
 * so the compiler will yell at us if we try to modify the pointed-to
 * string.  
 */
const char *getErrorString( int error )
{
  switch( error )
  {
    case SOME_ERROR:
      return "Error string for SOME_ERROR";
      break;

    case SOME_OTHER_ERROR:
      return "Error string for SOME_OTHER_ERROR";
      break;
    
    ...
  }
  return "Unknown error code";
}

Now we can just call the function directly and not worry about it:

fprintf( stderr, "%s\n", getErrorString( error ) );

If you still want to set aside memory to store the error string for whatever reason you can:

const char *str = getErrorString( error );
char *buf = malloc( strlen( str ) + 1 );
if ( buf )
  strcpy( buf, str );

or

char *buf = strdup( getErrorString( error ) ); 
2
On

getErrorString must documents that it calls malloc. It's generally considered very bad practice to write an API where functions return pointers to allocated memory and then expect someone else to clean it up. We know from 40 years of C history that badly designed APIs like this is exactly how you create memory leaks all over the place.

A better API would provide an explicit init/create function and an explicit cleanup/delete function. What those functions do internally is none of the caller's business - they just need to ensure to call init before anything else and cleanup the last thing they do.

On the topic of s****y API design, I would expect a function getErrorString to return a const char*, because why would an error message need to be modified by the caller? This ought to be an immutable string.