Portable UTF-8 Interface (Windows and Unix) without wide API

I am setting a path to a file on the hard drive using the following interface:

void setPath(const char* path);

This path will be used for basic file I/O.

If I provide a path containing Chinese characters (e.g. via QString::toUtf8()), this works fine on Unix but, of course, not on Windows, because Windows internally uses the wchar_t/std::wstring API.

I am now looking for an elegant way to make this interface UTF-8 compatible on both Windows and Unix-like systems. Is there a way to avoid the wide API on Windows based systems and keep using std::string and std::ofstream()?

After looking at boost::locale, it appears to be one possible way to handle UTF-8 encoding. Would this be the way to go (for example, replacing std::ofstream with its Boost counterpart, boost::filesystem::ofstream)?

boost::locale::generator generator;  // from <boost/locale.hpp>
const std::locale loc = generator.generate(std::locale(), "zh_CN.UTF-8");
std::locale::global(loc);
std::cout.imbue(std::locale());
boost::filesystem::path::imbue(std::locale());

All help is appreciated.

There is 1 best solution below

> Is there a way to avoid the wide API on Windows based systems

With a few exceptions, the Windows API does not support UTF-8. For the most part it supports only locale-dependent ANSI code pages and UTF-16. To support Unicode without losing data, you have to use the UTF-16 based ("wide") APIs.

Your interface will need to internally convert UTF-8 strings to UTF-16 when passing them to Windows API functions, and convert from UTF-16 to UTF-8 when receiving data from the API. There is no other way. This belongs in your underlying platform-specific logic, not in the higher-layer public interface.
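As a rough sketch of what such a platform-specific layer could look like, the helper below keeps a UTF-8 `const char*` in the public signature and widens it only on Windows. The name `openUtf8` is hypothetical and not part of the question's code; `MultiByteToWideChar` and `_wfopen` are the Win32/CRT calls assumed for the Windows branch, and error handling is kept minimal.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: open a file whose name is given as UTF-8.
// Callers never see wide strings; the conversion stays in this layer.
std::FILE* openUtf8(const char* utf8Path, const char* mode) {
#ifdef _WIN32
    // Convert a null-terminated UTF-8 string to UTF-16.
    auto widen = [](const char* s) -> std::wstring {
        // First call: ask for the required length (includes the terminator).
        int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, nullptr, 0);
        if (n <= 0) return std::wstring();
        std::wstring w(static_cast<std::size_t>(n), L'\0');
        // Second call: perform the conversion into the buffer.
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, &w[0], n);
        w.pop_back();  // drop the terminator that was counted in n
        return w;
    };
    const std::wstring wpath = widen(utf8Path);
    const std::wstring wmode = widen(mode);
    if (wpath.empty() || wmode.empty()) return nullptr;
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    // On Unix-like systems the bytes are passed through unchanged.
    return std::fopen(utf8Path, mode);
#endif
}
```

A `setPath(const char* path)` implementation could store the UTF-8 string as-is and call a helper like this only at the point where the file is actually opened.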

> and keep using std::string and std::ofstream()?

You can use std::string for UTF-8, and there are plenty of ways to convert between a UTF-8 std::string and a UTF-16 std::wstring. C++11 even added classes for this, std::wstring_convert together with std::codecvt_utf8_utf16, although both were deprecated in C++17.
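To illustrate what such a conversion involves, here is a minimal, dependency-free sketch of UTF-8 to UTF-16 decoding. The function name `utf8ToUtf16` is an assumption made for this example; it returns `std::u16string` so that it behaves the same on every platform, and it does not reject overlong encodings, which a production converter (or a library such as the Win32 API or Boost.Locale) should handle.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Simplified: overlong encodings are not rejected.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid Unicode code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting code units could be copied into a std::wstring before being handed to a wide API.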

Microsoft has non-standard extensions to std::ifstream and std::ofstream in Visual Studio to accept UTF-16 filenames. Other vendors may or may not provide similar functionality.
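One option that does not rely on a vendor extension, assuming a C++17 (or later) compiler is available, is to go through std::filesystem::path, since the standard file streams accept a path object in their constructors from C++17 onward. The sketch below is not part of the original answer; `pathFromUtf8` and `writeText` are hypothetical names. It uses std::filesystem::u8path in C++17 and the char8_t-based path constructor where the library provides it, because u8path is deprecated in C++20.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: interpret a std::string as UTF-8 and build a path.
// The path stores the name in the platform's native encoding.
std::filesystem::path pathFromUtf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20: a char8_t source is always treated as UTF-8.
    return std::filesystem::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path performs the same conversion.
    return std::filesystem::u8path(utf8);
#endif
}

// Hypothetical helper: write text to a file named by a UTF-8 string.
bool writeText(const std::string& utf8Path, const std::string& text) {
    std::ofstream out(pathFromUtf8(utf8Path), std::ios::binary);
    if (!out) return false;
    out << text;
    out.flush();
    return static_cast<bool>(out);
}
```

With this approach the public interface can keep taking `const char*` or std::string, and std::ofstream stays in use; only the construction of the path object knows about the encoding.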