Here, by "plain string" I mean a string in whatever encoding a plain string literal such as
"plainstring"
is encoded in, which is also the encoding that the standard library functions return or accept. For example:
std::cout << "I'm ok."; // plain string literal; displays correctly on my system
// (VS2015 x64, default encoding settings)
std::cout << u8"I'm wrong."; // UTF-8 literal; displays incorrectly on my system
std::experimental::filesystem::path path("Some Right specified Path contains non-ASCII chars"); // ok
std::experimental::filesystem::path path2(u8"Some Path specified Path contains non-ASCII chars"); // error
std::experimental::filesystem::directory_iterator r(path); // ok
std::experimental::filesystem::directory_iterator r2(path2); // throws an exception
As far as I know, my system (Windows 10 x64) uses the GB2312
encoding for such plain strings.
But how can I convert them to another encoding such as UTF-8 (and back again)
in a platform-independent way?
This is a simple-sounding question, but it is actually an extremely complex issue.
The short answer: a round trip from GB2312 to UTF-8 and back to GB2312 is possible, but a round trip from UTF-8 to GB2312 and back to UTF-8 is not possible in general.
The longer answer: Any string that can be represented in a standards-compliant way can be expressed in Unicode, and any string that can be expressed in Unicode can be encoded in UTF-8.
The converse is not true. An arbitrary Unicode string cannot, in general, be converted into a legacy (non-Unicode) standard encoding such as GB2312.
Unicode contains 1,114,112 code points (U+0000 through U+10FFFF). It takes at least three bytes to give each of that many code points a distinct value. UTF-8, which uses one to four bytes per code point, can represent any of them.
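To make the byte counts concrete, here is a minimal sketch of how a single code point maps onto one to four UTF-8 bytes. The helper name encode_utf8 is my own for illustration; it is not a standard-library function, and real code should normally rely on a tested library instead.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative helper (hypothetical name): encode one Unicode code point
// as a UTF-8 byte sequence of 1 to 4 bytes.
std::string encode_utf8(std::uint32_t cp) {
    // Only scalar values are encodable: at most U+10FFFF, no surrogates.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 7 bits  -> 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 11 bits -> 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 16 bits -> 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 21 bits -> 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For instance, encode_utf8(0x4E2D) yields the three bytes E4 B8 AD, while a code point outside the Basic Multilingual Plane such as U+1F600 needs four bytes.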
GB2312 (a Simplified Chinese character set) contains roughly 7,000 characters, including more than 6,000 Chinese characters, so there are many Unicode code points that have no corresponding entry in GB2312. That is why converting arbitrary UTF-8 text to GB2312 is lossy, and why a round trip that starts from UTF-8 is not possible in general.
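The asymmetry can be shown with a toy sketch. The table below holds only two GB2312 entries (U+4E2D and U+6587), and every function name here is hypothetical. A real converter would use the full GB2312 table from a library, and the '?' fallback is just one possible best-effort policy that I am assuming for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy table: Unicode code point -> GB2312 (EUC-CN) bytes. Only two entries;
// this is an illustration, not a usable conversion table.
const std::map<char32_t, std::string>& toy_gb2312_table() {
    static const std::map<char32_t, std::string> table = {
        {char32_t(0x4E2D), "\xD6\xD0"},  // U+4E2D
        {char32_t(0x6587), "\xCE\xC4"},  // U+6587
    };
    return table;
}

// Best-effort Unicode -> GB2312: anything not in the table becomes '?'.
std::string to_gb2312_best_effort(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        auto it = toy_gb2312_table().find(cp);
        if (cp < 0x80) out += static_cast<char>(cp);      // ASCII maps to itself
        else if (it != toy_gb2312_table().end()) out += it->second;
        else out += '?';                                  // information is lost here
    }
    return out;
}

// GB2312 -> Unicode: every byte pair in the toy table has a Unicode equivalent.
std::u32string from_gb2312(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80) { out += static_cast<char32_t>(b); ++i; continue; }
        char32_t found = 0xFFFD;                          // replacement character
        const std::string pair = bytes.substr(i, 2);
        for (const auto& kv : toy_gb2312_table())
            if (kv.second == pair) found = kv.first;
        out += found;
        i += 2;
    }
    return out;
}
```

Text made only of characters that GB2312 covers survives the trip through Unicode and back, whereas a code point such as U+1F600 comes back as '?', so the original UTF-8 text cannot be recovered.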
That being said, there are "best-effort" converters from UTF-8 to GB2312, and there is no reason why they shouldn't be platform-independent. A Google search for
UTF-8 to GB2312 conversion
finds many possibilities, most of which do not depend on any particular platform. I suggest that you run this search and pick the result that meets your needs.
One platform-independent solution for converting between encodings is Boost.Locale. A complete explanation of what it can do for you is beyond what would fit in a Stack Overflow answer <humor>even if I use the margins</humor>.
For additional reading: this page provides useful background information for understanding string encoding issues.