I'm writing a data structure in C to store commands. Here is the source, pared down to the part I'm unsatisfied with:
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <errno.h>
#include "dbg.h"
#include "commandtree.h"
struct BranchList
{
    CommandTree *tree;
    BranchList *next;
};

struct CommandTree
{
    wchar_t id;       // wchar support actually has no memory cost due to the
    bool term;        // padding that would otherwise exist, and may in fact be
    BranchList *list; // marginally faster to access due to its alignable size.
};

static inline BranchList *BranchList_create(void)
{
    return calloc(1, sizeof(BranchList));
}

inline CommandTree *CommandTree_create(void)
{
    return calloc(1, sizeof(CommandTree));
}

int CommandTree_putnw(CommandTree *t, const wchar_t *s, size_t n)
{
    for(BranchList **p = &t->list;;)
    {
        if(!*p)
        {
            // calloc reports failure by returning NULL; errno may hold a stale value.
            BranchList *b = BranchList_create();
            if(!b) return 1;
            b->tree = CommandTree_create();
            if(!b->tree)
            {
                free(b);
                return 1;
            }
            b->tree->id = *s;
            *p = b; // link the branch only once it is fully built
        }
        else if(*s != (*p)->tree->id)
        {
            p = &(*p)->next;
            continue;
        }
        if(n == 1)
        {
            (*p)->tree->term = 1;
            return 0;
        }
        p = &(*p)->tree->list;
        s++;
        n--;
    }
}

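int CommandTree_putn(CommandTree *t, const char *s, size_t n)
{
    wchar_t *passto = malloc(n * sizeof(wchar_t));
    if(!passto) return 1;
    // mbstowcs returns the number of wide characters written, or (size_t)-1
    // on an invalid sequence; that count can be smaller than n.
    size_t len = mbstowcs(passto, s, n);
    int ret = (len == (size_t)-1) ? 1 : CommandTree_putnw(t, passto, len);
    free(passto);
    return ret;
}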
This works perfectly well, but I'm rather unsatisfied with how I'm handling the fact that my tree supports wchar_t. I decided to add this when I realized that the padding of CommandTree would make any datatype smaller than 7 bytes cost just as much memory anyway. So as not to duplicate code, I have CommandTree_putn reuse the logic in the wchar_t-supporting CommandTree_putnw.
However, because char and wchar_t differ in size, I can't just pass the array; I have to convert using mbstowcs and pass a temporary wchar_t * to CommandTree_putnw. This is suboptimal, given that CommandTree_putn is going to see the most usage and the temporary quintuples the memory used per character of the string being stored (from sizeof (char) to sizeof (char) + sizeof (wchar_t), with a 4-byte wchar_t), which could add up if lots of these are going to be instantiated with longish commands.
I was wondering if I could do something like create a third function that would contain the logic and get passed a size_t, depending on the value of which it would cast the string (passed to it as a void *) to either const char * or const wchar_t *. But given that C is statically typed, I'd have to pretty much duplicate the logic with s cast to its respective type, which would ruin the idea I'm going for of a "single instance of logic".
So ultimately, the question is: can I provide the program logic only once and wrap it in functions taking const char * and const wchar_t * respectively, without creating a temporary wchar_t * in the function that handles const char *?
I don't know your hard requirements, but wchar_t tends to be difficult to work with precisely because of this problem; it's too hard to mesh with existing code that uses char.
All of the codebases I've worked with eventually migrated to UTF-8, which removes the necessity to store strings in a different type. UTF-8 works with the standard strcpy/strlen type of string manipulation functions and is fully Unicode savvy. The only challenge is that you will need to convert it to UTF-16 to invoke Windows Unicode APIs. (OS X can use UTF-8 directly.) You didn't mention platform so I don't know if this will be an issue for you. In our case we just wrote Win32 wrappers that took UTF-8 strings.
Can you use C++? If so, and the actual type wchar_t is important (rather than Unicode support), you can templatize the functions and then instantiate them with std::wstring or std::string depending on string width. You can also write them to be based on char and wchar_t if you are brave, but you'll need to write special wrapper functions to handle basic operations like strcpy versus wcscpy and so it ends up being more work overall by far.
In plain C, I don't think there's a silver bullet at all. There are yucky answers, but none I could recommend with a straight face.
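For illustration only, here is a rough sketch of one of those less-than-pretty plain-C options: a single core routine that takes the string as a void * plus an element width, with the cast confined to one tiny helper so the tree-walking logic exists only once. Node, Branch, char_at, put_core, put_n and put_nw are hypothetical stand-ins invented for this sketch, not the names from the question. It also assumes each char is one character; unlike mbstowcs, it does no multibyte decoding.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-ins for the question's types, so the sketch compiles alone. */
typedef struct Node Node;
typedef struct Branch Branch;
struct Branch { Node *tree; Branch *next; };
struct Node { wchar_t id; bool term; Branch *list; };

/* The only width-dependent code: read character i from an untyped buffer.
   Assumption: every char is a whole character (no multibyte decoding). */
static wchar_t char_at(const void *s, size_t i, size_t width)
{
    if (width == sizeof(char))
        return (wchar_t)((const unsigned char *)s)[i];
    return ((const wchar_t *)s)[i];
}

/* Single copy of the insertion logic. Returns 0 on success, 1 on failure. */
static int put_core(Node *t, const void *s, size_t n, size_t width)
{
    if (n == 0) return 1;                   /* nothing to insert */
    Branch **p = &t->list;
    for (size_t i = 0;;) {
        wchar_t c = char_at(s, i, width);
        if (!*p) {                          /* no sibling matched: add one */
            Branch *b = calloc(1, sizeof *b);
            if (!b) return 1;
            b->tree = calloc(1, sizeof *b->tree);
            if (!b->tree) { free(b); return 1; }
            b->tree->id = c;
            *p = b;
        } else if (c != (*p)->tree->id) {   /* try the next sibling */
            p = &(*p)->next;
            continue;
        }
        if (++i == n) {                     /* last character: mark terminal */
            (*p)->tree->term = true;
            return 0;
        }
        p = &(*p)->tree->list;              /* descend one level */
    }
}

/* Thin typed wrappers; no temporary buffer is allocated for the char case. */
int put_n(Node *t, const char *s, size_t n)
{
    return put_core(t, s, n, sizeof(char));
}

int put_nw(Node *t, const wchar_t *s, size_t n)
{
    return put_core(t, s, n, sizeof(wchar_t));
}
```

With a zero-initialized root, put_n(&root, "ls", 2) and put_nw(&root, L"ln", 2) end up sharing the same 'l' node. The price is a branch on the width for every character read and the loss of type checking inside the core, which is part of why this is hard to recommend wholeheartedly.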