Find out if a received pointer is a string, ushort or array


I am interposing the memcpy() function in C because the target application uses it to concatenate strings and I want to find out which strings are being created. The code is:

void * my_memcpy ( void * destination, const void * source, size_t num )
{
    void *ret = memcpy(destination, source, num);
    // printf ("[MEMCPY] = %s \n", ret);
    return ret;
}

The function gets called successfully, but the first parameter can point to anything, and I only want to trace the call if the result is a string or an array, so I would have to check which one it is. I know this can't be done in a straightforward way: is there any way to find out what ret points to?

I am working on Mac OS X and interposing with dyld.

Thank you very much.


There are 4 best solutions below

17
On BEST ANSWER

As void* represents a raw block of memory, there is no way to determine what actual data lies there.

However, you can make a "string-like" memory dump on every operation; just cap the output at some upper limit.

This could be implemented the following way:

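#include <algorithm>
#include <cstddef>
#include <iostream>

const std::size_t kUpperLimit = 32;

// Dump at most kUpperLimit bytes, and never more than the num bytes copied.
void output_memory_dump(const void* memory, std::size_t num) {
   std::cout.write(static_cast<const char*>(memory), std::min(num, kUpperLimit));
}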

For non-string-like data the output would be hard to interpret, but otherwise you'd get what you were looking for.

You could attempt a guess-based approach, such as iterating through reinterpret_cast<char*>(memory) and applying is_alphanumeric || is_space checks to every byte, but this approach doesn't seem very reliable (who knows what could actually lie behind that void*...).

Anyway, for some situations that might be fine.
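
The guess-based check described above could be sketched in C roughly as follows. The name looks_like_text, the inclusion of punctuation, and the 90% threshold are all assumptions made for illustration, not something from the question; tune them to the data you actually see.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when at least 90% of the first n bytes
   are alphanumeric, punctuation, or whitespace. The threshold is arbitrary. */
static int looks_like_text(const void *memory, size_t n) {
    const unsigned char *p = memory;
    size_t printable = 0;
    size_t i;

    if (n == 0)
        return 0; /* nothing was copied, so there is nothing to classify */

    for (i = 0; i < n; i++) {
        if (isalnum(p[i]) || ispunct(p[i]) || isspace(p[i]))
            printable++;
    }

    /* integer form of printable / n >= 0.9 */
    return printable * 10 >= n * 9;
}
```

Inside the interposed function you would call it as looks_like_text(destination, num) after the copy and print only when it returns 1.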

0
On

ret is equal to the destination pointer. But it's not possible to determine whether it's an array or a string, unless you know more information about the array or string (for instance, that the string is of a certain length and is null-terminated).

2
On

You can first apply some heuristics to the copied memory and based on that you can decide whether you want to print it.

static int maybe_string(const void *data, size_t n) {
  const unsigned char *p;
  size_t i;

  p = data;
  for (i = 0; i < n; i++) {
    int c = p[i];
    if (c == '\n' || c == '\r' || c == '\t')
      continue;
    if (1 <= c && c < 32)
      return 0; /* unusual ASCII control character */
    if (c == '\0' && i > 5)
      return 1; /* null-terminated and more than a few characters long */
  }

  return 0; /* not null-terminated, so it isn't a string */
}

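This heuristic is not perfect. For example, it fails for the following pattern, where the terminating null is written separately after memcpy and so never appears in the copied bytes: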

const char *str = "hello, world";
size_t len = strlen(str);
char *buf = malloc(1024);
memcpy(buf, str, len);
buf[len] = '\0';

If you want to catch that too, you will have to change the above function.
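
One possible way to change it, as a rough sketch only: treat the end of the copied region like a terminator, so a copy of strlen(str) bytes still qualifies. The name maybe_string_fragment and the min_len parameter are hypothetical, and unlike the function above this variant stops at the first null byte instead of scanning past it.

```c
#include <stddef.h>

/* Hypothetical variant of maybe_string(): the text may end either at a
   '\0' or at the end of the n copied bytes, and must be at least min_len
   bytes long. */
static int maybe_string_fragment(const void *data, size_t n, size_t min_len) {
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < n; i++) {
        int c = p[i];
        if (c == '\0')
            break;      /* text ends at the first null byte */
        if (c == '\n' || c == '\r' || c == '\t')
            continue;
        if (c < 32 || c == 127)
            return 0;   /* unusual ASCII control character */
    }

    return i >= min_len; /* enough text before the null or the end */
}
```

With min_len set to 6, the memcpy(buf, str, len) call from the pattern above is accepted, because all 12 copied bytes are ordinary characters. The price is more false positives: any short run of bytes in the printable range now passes.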

0
On

No, you cannot figure this out from a pointer of void type. Plus, you don't know the size of the source or destination, so the heuristic approach will not work. It will not work for other reasons as well: for example, binary data stored in the memory region pointed to by the void* can really have a zero byte at the end, but that doesn't mean that it is a string.
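
A small illustration of that last point, with made-up data: the array readings below is meant to be eight numeric samples, not text, yet a simple "printable bytes plus a trailing zero" check (the hypothetical looks_null_terminated) accepts it.

```c
#include <stddef.h>

/* Hypothetical check: printable ASCII bytes followed by a final '\0'. */
static int looks_null_terminated(const void *data, size_t n) {
    const unsigned char *p = data;
    size_t i;

    if (n < 2 || p[n - 1] != '\0')
        return 0;
    for (i = 0; i + 1 < n; i++) {
        if (p[i] < 32 || p[i] > 126)
            return 0;
    }
    return 1;
}

/* Eight made-up numeric samples. As bytes they happen to be the same
   as the C string "Hello!!", including the trailing zero. */
static const unsigned char readings[8] = {72, 101, 108, 108, 111, 33, 33, 0};
```

Here looks_null_terminated(readings, sizeof readings) returns 1, so a tracer built on such a check would log these samples as if they were a string.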