I'm trying to write a parser for "text" files which I know will be encoded in one of the Windows single-byte code pages. These files contain text representations of basic data types, and the spec I have for these representations is lacking, to say the least.
I noticed in Windows-874 ten inconspicuous little characters near the end, called THAI DIGIT ZERO to THAI DIGIT NINE.
I'm trying to write this parser to be pretty robust, but I'm working a bit in the dark: many different programs can generate these data files, and I don't have access to their sources.
What I want to know is: do any functions in Microsoft C++ libraries convert real number data types into a std::string or char const * (i.e. serialization) which would contain non-Arabic numerals?
I don't use Microsoft C++ libraries, so I can't reference any in particular, but a made-up example could be char const * IntegerFunctions::ToString(int i).
These digits certainly could be created by Microsoft libraries. The properties LOCALE_IDIGITSUBSTITUTION and LOCALE_SNATIVEDIGITS determine whether numbers formatted by the OS will use native (i.e. non-ASCII) digits. Those are initially Unicode, because that's how Windows internally creates strings. When you have a Thai locale and you convert Unicode to CP874, those characters will be kept.
A simple function that demonstrates this behavior is GetNumberFormatA.
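On the parsing side, one way to cope with this is to fold the Thai digits back to ASCII before handing the text to your normal number parser. In Windows-874, THAI DIGIT ZERO to THAI DIGIT NINE occupy the bytes 0xF0 to 0xF9, so the mapping is a simple offset. Here is a minimal sketch; the function name normalize_cp874_digits is made up for illustration, and it assumes you already know the input is CP874 (in other Windows code pages, such as 1252, those same bytes are letters, so applying this blindly would corrupt text).

```cpp
#include <string>

// Hypothetical helper (not part of any Microsoft library): returns a copy of
// `input` in which Windows-874 Thai digit bytes (0xF0..0xF9) are replaced by
// the ASCII digits '0'..'9'. All other bytes are left untouched.
// Assumption: the caller has already established that the data is CP874.
std::string normalize_cp874_digits(const std::string& input) {
    std::string out = input;
    for (char& c : out) {
        const unsigned char b = static_cast<unsigned char>(c);
        if (b >= 0xF0 && b <= 0xF9) {
            c = static_cast<char>('0' + (b - 0xF0));
        }
    }
    return out;
}
```

After this step the string can go through whatever you already use for ASCII numbers (std::strtod, std::stod, your own routine). Note that it only handles the digits; locale-specific decimal or grouping separators, if the generating program used them, would need separate treatment.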