I'm looking for a way to convert a wchar_t to multibyte chars without using wctomb or any ready-made routine. I have to do it in C, not C++, and interoperability doesn't matter here.
My goal is to print a wchar_t byte by byte using the write syscall. For example, the 'é' character is 0xe9 when stored in a wchar_t, and ff ff ff c3 ff ff ff a9 in its multibyte form. How can I convert from one form to the other?
Thanks in advance.
This is the same as conversion between any two encodings. First determine the encoding used to encode characters in source and destination, then translate characters from one encoding to another.
So first, wchar_t: its encoding is (or should be) constant and determined by your compiler and environment, so read about both. You specified Debian, using gcc, so read the gcc documentation; nowadays on Linux a wchar_t is meant to represent one UCS-4 "character". Note that on Windows wchar_t is UTF-16.
Then determine the destination encoding, i.e. the encoding of the multibyte string, which depends on the locale. Read and parse the LC_CTYPE locale; you might want to read about POSIX locales and locale naming. Because of "without using any ready-made routine", in the sad case where the locale doesn't specify a codeset, you have to write your own platform-specific parser for the locale files and infer the default character encoding for the current locale (I am not really sure how that happens here; you have to find "the locale language category"). Pages like man 7 locale and man 7 charsets look like a good read.
After determining the source and destination encodings, you need to write a routine that translates one encoding to the other. Because of "without using any ready-made routine" you don't want to use iconv, so you have to write it yourself. That means reading the specifications of both encodings, learning which characters are represented by which codepoints in each, and then deciding how to translate each and every codepoint from one encoding to the other.
All in all, the source code of other projects, such as glibc, libiconv or libunistring, might be a source of inspiration.
Most probably the multibyte encoding is UTF-8; Unicode dominates today's world. So you'll want to research how to convert UTF-32 to UTF-8, which is actually a simple routine.