ConvertUTF16toUCS4 in Apache Xerces


The source code for ConvertUTF16toUCS4 in Apache Xerces is:

ConversionResult ConvertUTF16toUCS4(
    UTF16 **sourceStart, UTF16 *sourceEnd,
    UCS4 **targetStart, const UCS4 *targetEnd)
{
    ConversionResult result = ok;
    register UTF16 *source = *sourceStart;
    register UCS4 *target = *targetStart;
    while (source < sourceEnd)
    {
        register UCS4 ch;
        ch = *source++;
        if (ch >= kSurrogateHighStart && ch <= kSurrogateHighEnd && source < sourceEnd)
        {
            register UCS4 ch2 = *source;
            if (ch2 >= kSurrogateLowStart && ch2 <= kSurrogateLowEnd)
            {
                ch = ((ch - kSurrogateHighStart) << halfShift) + (ch2 - kSurrogateLowStart) + halfBase;
                ++source;
            };
        };
        if (target >= targetEnd)
        {
            result = targetExhausted;
            break;
        };
        *target++ = ch;
    };
    *sourceStart = source;
    *targetStart = target;
    return result;
};

I am trying to convert UTF-16 surrogate pairs into UCS-4 data. I am on Windows, on a little-endian machine.
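For reference, the arithmetic that the listing applies to a surrogate pair can be sketched in isolation. This is only an illustration: `combine_surrogates` is a hypothetical helper name, and the constant values (0xD800, 0xDC00, a shift of 10, a base of 0x10000) are assumed from the UTF-16 definition, not copied from the Xerces headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of Xerces: combines one high and one low
 * surrogate into a single code point, using the same formula as the
 * surrogate branch of the listing. Constant values are assumptions taken
 * from the UTF-16 definition. */
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    /* Precondition: hi is a high surrogate and lo is a low surrogate. */
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);

    return (((uint32_t)hi - 0xD800u) << 10)  /* top 10 bits    */
         + ((uint32_t)lo - 0xDC00u)          /* bottom 10 bits */
         + 0x10000u;                         /* supplementary-plane offset */
}
```

Under these assumptions, the pair 0xD83D 0xDE00 works out to (0x3D << 10) + 0x200 + 0x10000 = 0x1F600.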

If you look closely, you can see that after the conversion loop the function assigns target to *targetStart. Wouldn't *targetStart then point to the end of the converted data instead of the first element of the target buffer? When I remove the statement *targetStart = target; from my copy of the code, it works as I expect. Is this a bug in the API, or am I missing something?
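One possible reading, not confirmed against the Xerces documentation, is that sourceStart and targetStart are in/out cursors: on return they report how far the conversion got, and the caller is expected to keep its own pointer to the start of the output buffer. The sketch below illustrates that calling pattern. The typedefs, enum, and constant values are assumed stand-ins for the library's definitions, the function body follows the listing (minus `register`), and `convert_keeping_start` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the library's typedefs and constants. */
typedef uint16_t UTF16;
typedef uint32_t UCS4;
typedef enum { ok, sourceExhausted, targetExhausted } ConversionResult;

static const UCS4 kSurrogateHighStart = 0xD800;
static const UCS4 kSurrogateHighEnd   = 0xDBFF;
static const UCS4 kSurrogateLowStart  = 0xDC00;
static const UCS4 kSurrogateLowEnd    = 0xDFFF;
static const int  halfShift           = 10;
static const UCS4 halfBase            = 0x10000;

/* Same logic as the listing in the question. */
ConversionResult ConvertUTF16toUCS4(
    UTF16 **sourceStart, UTF16 *sourceEnd,
    UCS4 **targetStart, const UCS4 *targetEnd)
{
    ConversionResult result = ok;
    UTF16 *source = *sourceStart;
    UCS4 *target = *targetStart;
    while (source < sourceEnd) {
        UCS4 ch = *source++;
        if (ch >= kSurrogateHighStart && ch <= kSurrogateHighEnd && source < sourceEnd) {
            UCS4 ch2 = *source;
            if (ch2 >= kSurrogateLowStart && ch2 <= kSurrogateLowEnd) {
                ch = ((ch - kSurrogateHighStart) << halfShift)
                   + (ch2 - kSurrogateLowStart) + halfBase;
                ++source;
            }
        }
        if (target >= targetEnd) {
            result = targetExhausted;
            break;
        }
        *target++ = ch;
    }
    *sourceStart = source;  /* reports how much input was consumed */
    *targetStart = target;  /* reports where the output now ends   */
    return result;
}

/* Hypothetical helper: passes COPIES of the buffer pointers as cursors,
 * so `out` still points at the first converted element afterwards.
 * Returns the number of UCS4 values written. */
size_t convert_keeping_start(UTF16 *in, size_t inLen, UCS4 *out, size_t outCap)
{
    UTF16 *src = in;   /* cursor, advanced by the call */
    UCS4 *dst = out;   /* cursor, advanced by the call */
    ConversionResult r = ConvertUTF16toUCS4(&src, in + inLen, &dst, out + outCap);
    assert(r == ok);   /* sketch only: real code should handle targetExhausted */
    return (size_t)(dst - out);
}
```

With this pattern the converted data is read through `out`, and the updated cursor only serves to compute the output length. If the caller instead passes the address of its only pointer to the buffer, that pointer ends up at the end of the data after the call, which would match the behavior described above.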
