Write Unicode text to a file in 'wb' mode using C and Objective-C?


I have this text, which contains non-ASCII Unicode characters:

  NSString *fileName = @"Tên tình bạn dưới tình yêu.mp3";
  const char *cStringFile = [fileName UTF8String];

Now I need to write this string to a file as raw bytes, so that a hex dump of the file looks like this:

 T  ê  n     t  ì  n  h     b    ạ   n
 54 EA 6E 20 74 EC 6E 68 20 62 1EA1 6E ...... and so on

As you can see, the character 'ê' is written as EA, but 'ạ' is written as 1E A1. These are the code points listed for the Vietnamese character set (https://vietunicode.sourceforge.net/charset/).

This is the code I used to write the multibyte characters to the file:

// Determine the required size for the wchar_t string
size_t input_length = strlen(cStringFile);
size_t output_length = mbstowcs(NULL, cStringFile, input_length);

// Allocate memory for the wchar_t string
wchar_t *output = (wchar_t *)malloc((output_length + 1) * sizeof(wchar_t));
if (output == NULL) {
    printf("Memory allocation failed.\n");
    return 1;
}

// Convert the C string to wchar_t string
mbstowcs(output, cStringFile, input_length);
output[output_length] = L'\0'; // Add null-termination

size_t length = wcslen(output);
// Loop through each character in the Unicode text
for (size_t i = 0; i < length; i++) {
    // Write the Unicode character to the file
    fwprintf(fd, L"%lc", output[i]);
}

// Free the allocated memory
free(output);

The issue is that the code above does not convert the multibyte characters to the expected hex values.

Example 1) For this text = "Tên tình bạn dưới tình yêu.mp3"
Expected: 
T  ê  n     t  ì  n  h     b    ạ   n
54 EA 6E 20 74 EC 6E 68 20 62 1EA1 6E ...... and so on

Actual: Wrong!
T   ê   n     t   ì   n  h     b   ạ     n
54 C3AA 6E 20 74 C3AC 6E 68 20 62 E1BAA1 6E ...... and so on

Example 2) For this text = "最佳歌曲在这里.mp3"
Expected: 
最-\u6700 佳-\u4F73 歌-\u6B4C 曲-\u66F2  在-\u5728 
67 00     4F 73    6B 4C        66 F2     57 28  .....  

Actual: Wrong!
最        佳        歌        曲        在
E6 9C     80 BD    B3 AD     8C 9B     B2 9C    

So I think it is writing 2 bytes for 'ê' and 'ì' and 3 bytes for 'ạ'. The code is not writing the hex value of each multibyte character's code point.
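
The byte sequences C3 AA and E1 BA A1 are the UTF-8 encodings of 'ê' (U+00EA) and 'ạ' (U+1EA1), and UTF-8 is what `UTF8String` returns. A minimal sketch that makes this visible by formatting every byte of a C string as hex. The helper name `dump_hex` is hypothetical (it is not part of any library), and the caller is assumed to supply a large enough buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format every byte of s as two uppercase hex digits,
   separated by single spaces, into out. Returns the number of characters
   written (excluding the terminator). Stops early if out is too small. */
static size_t dump_hex(const char *s, char *out, size_t out_size) {
    assert(s != NULL && out != NULL && out_size > 0);
    size_t pos = 0;
    out[0] = '\0';
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        /* Each byte needs up to 3 characters (" XX") plus the terminator. */
        if (out_size - pos < 4) break;
        pos += (size_t)snprintf(out + pos, out_size - pos,
                                pos == 0 ? "%02X" : " %02X", (unsigned)*p);
    }
    return pos;
}
```

Assuming the string is stored in precomposed form, calling `dump_hex("T\xC3\xAAn", buf, sizeof buf)` fills `buf` with `54 C3 AA 6E`: one byte each for 'T' and 'n', two bytes for 'ê'.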

What could be the issue? Any help would be appreciated.

=====

I tried another approach without wchar_t: check whether a character is multibyte and, if so, write all of its bytes.

    NSString *fileName = @"Tên tình bạn dưới tình yêu.mp3";
    const char *stringText = [fileName UTF8String];
    unsigned long len = strlen(stringText);
    setlocale(LC_ALL, "");
    for (char character = *stringText; character != '\0'; character = *++stringText)
    {
        if (!character) {
            continue;
        }
        putchar(character);
        int byteCount = numberOfBytesInChar((unsigned char)character);
        if (byteCount <= 1) {
            //putchar(character);
            fprintf(fd, "%c", character);
        } else {
           
            //putchar(character);
            for(int k = 0; k < byteCount; k++)
            {
                fprintf(fd, "%c", character);
                character = *++stringText;
            }
        }
    }

    // Returns the length in bytes of a UTF-8 sequence, judged from its lead byte
    int numberOfBytesInChar(unsigned char val) {
        if (val < 128) {
            return 1;
        } else if (val < 224) {
            return 2;
        } else if (val < 240) {
            return 3;
        } else {
            return 4;
        }
    }
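
One detail of the loop above seems worth checking: the inner loop advances `stringText` past the sequence, and the `for` increment then advances it once more, so the byte right after each multibyte character appears to be skipped. A minimal sketch of stepping through a UTF-8 string with exactly one advance per sequence follows. The names `utf8_seq_len` and `utf8_count_chars` are hypothetical, and treating a stray continuation byte as length 1 is an assumption made only so the scan always moves forward:

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead (1 to 4).
   Continuation bytes (0x80..0xBF) are reported as 1 so a scan always advances. */
static int utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;   /* ASCII */
    if (lead < 0xC0) return 1;   /* stray continuation byte */
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    return 4;
}

/* Count characters (code points) by advancing once per sequence. */
static size_t utf8_count_chars(const char *s) {
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;
    while (*p != '\0') {
        int len = utf8_seq_len(*p);
        /* Step over the sequence, stopping early if the string is truncated. */
        for (int k = 0; k < len && *p != '\0'; k++) {
            p++;
        }
        count++;
    }
    return count;
}
```

Even with the iteration corrected, writing the bytes unchanged still produces UTF-8 in the file; only the skipped bytes would be restored.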

Even now it does not write the expected hex equivalent for multibyte characters.

Example 1) For this text = "Tên tình bạn dưới tình yêu.mp3"
Expected: 
T  ê  n     t  ì  n  h     b    ạ   n
54 EA 6E 20 74 EC 6E 68 20 62 1EA1 6E ...... and so on

Actual: Wrong!
T   ê   n     t   ì   n  h     b   ạ     n
54 C3AA 6E 20 74 C3AC 6E 68 20 62 E1BAA1 6E ...... and so on

Example 2) For this text = "最佳歌曲在这里.mp3"
Expected: 
最-\u6700 佳-\u4F73 歌-\u6B4C 曲-\u66F2  在-\u5728 
67 00     4F 73    6B 4C        66 F2     57 28  .....  

Actual: Wrong!
最        佳        歌        曲        在
E6 9C     80 BD    B3 AD     8C 9B     B2 9C     

Any pointers?

Best answer

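NSString can do the encoding conversion itself. The values you expect (EA, 1EA1, 6700) are Unicode code points, and UTF-16 big-endian stores each of them as a two-byte unit, so 'ê' is written as 00 EA and 'ạ' as 1E A1. ASCII characters also take two bytes: 'T' is written as 00 54.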

Extract the data from the string and write it to disk:

NSError *error = nil;
NSData *dataBE = [fileName dataUsingEncoding:NSUTF16BigEndianStringEncoding];
[dataBE writeToFile:@"/Users/user/Desktop/test" options:NSDataWritingAtomic error:&error];

or write the string to disk:

[fileName writeToFile:@"/Users/user/Desktop/test" atomically:YES encoding:NSUTF16BigEndianStringEncoding error:&error];
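
If the conversion has to happen on the C side, the same result can be sketched by decoding the UTF-8 bytes into code points and re-encoding them as UTF-16 big-endian. This is only an illustration under stated assumptions: the function names below are hypothetical, the input is assumed to be well-formed UTF-8 (such as the result of `UTF8String`), and no validation is performed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at *p and advance *p past it.
   Assumes well-formed input; continuation bytes are not validated. */
static uint32_t utf8_next(const unsigned char **p) {
    const unsigned char *s = *p;
    uint32_t cp;
    int extra;
    if (s[0] < 0x80)      { cp = s[0];        extra = 0; }
    else if (s[0] < 0xE0) { cp = s[0] & 0x1F; extra = 1; }
    else if (s[0] < 0xF0) { cp = s[0] & 0x0F; extra = 2; }
    else                  { cp = s[0] & 0x07; extra = 3; }
    s++;
    while (extra-- > 0 && *s != '\0') {
        cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        s++;
    }
    *p = s;
    return cp;
}

/* Encode cp as UTF-16BE into out; returns 2, or 4 for a surrogate pair. */
static size_t utf16be_put(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x10000) {
        out[0] = (unsigned char)(cp >> 8);
        out[1] = (unsigned char)(cp & 0xFF);
        return 2;
    }
    cp -= 0x10000;
    uint32_t hi = 0xD800 | (cp >> 10);
    uint32_t lo = 0xDC00 | (cp & 0x3FF);
    out[0] = (unsigned char)(hi >> 8);
    out[1] = (unsigned char)(hi & 0xFF);
    out[2] = (unsigned char)(lo >> 8);
    out[3] = (unsigned char)(lo & 0xFF);
    return 4;
}

/* Convert a NUL-terminated UTF-8 string to UTF-16BE bytes in dst.
   Returns the number of bytes stored; stops when dst is full. */
static size_t utf8_to_utf16be(const char *utf8, unsigned char *dst, size_t dst_size) {
    assert(utf8 != NULL && dst != NULL);
    const unsigned char *p = (const unsigned char *)utf8;
    size_t n = 0;
    while (*p != '\0') {
        unsigned char unit[4];
        size_t len = utf16be_put(utf8_next(&p), unit);
        if (n + len > dst_size) break;
        for (size_t i = 0; i < len; i++) dst[n++] = unit[i];
    }
    return n;
}
```

The resulting buffer can then be written with `fwrite(dst, 1, n, fd)` to a stream opened in "wb" mode, which avoids `fwprintf` and the locale-dependent conversion it performs.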