What kind of UTF-8 is the one which Excel VBA uses for internal text representation?

99 Views Asked by kitap mitap At 28 February 2024 at 09:17

I am using Excel 2007 on Windows 7 in Turkish.

It is said that Excel VBA uses UTF-8 for internal text representation.

There is something strange in case transformation beyond Turkish "İ" and "ı" transformation problem. The UCase and LCase functions do not transform "İ" and "ı" as said in Unicode data.

In Unicode data, it is said that the lowercase of "İ" (0x130) is "i" (0x69) and the uppercase of "ı" (0x131) is "I" (0x49). But VBA's LCase function for "İ" returns "İ" not "i", and the UCase function for "ı" returns "ı" not "I".

These transformations do not fit rules of Unicode. If VBA uses UTF-8, and UTF-8 is a Unicode transformation format, isn't it rationale to expect that transformations fit rules of Unicode at least? (Sorry for "at least")

And more, the InStr, InstrRev, strComp, Replace functions do not work right with the vbTextCompare option if text includes "İ" and "ı", if the cases of these letters are different in the text parameters of functions.

eg. there is no way for InStr to find "İnek" in "inek", and strComp does not return 0 even if it's used with the vbTextCompare option.

And much more, the worksheet function UPPER, LOWER can not be used inside VBA code. These are documented but non-existent.

What is the rule of VBA when transforming letter cases? The answer is also relevant to the vbTextCompare option.

Added section after comments:

First of all, it is a known issue that Turkish "İ" and "ı" case transformations and matching of different cases of these letters can not be done properly.

This issue is related to some other topics: Why does Unicode implement the Turkish I the way it does?

In Turkish:

Lowercase "I" is "ı"
Lowercase "İ" is "i"
Uppercase "ı" is "I"
Uppercase "i" is "İ"

In Turkish Excel
=LOWER("I") is "i"
=LOWER("İ") is "i"
=UPPER("ı") is "I"
=UPPER("i") is "I"

When functions are used in the formulas in the cells, these are the results. Some of this results are not proper for Turkish, but fits Unicode rules (these functions are called with Turkish names on worksheet, of course). UPPER and LOWER functions can not be used as members of Worksheet functions in VBA code.

As Unicode data file,

Lower case of I (0x49) is i (0x69)
Lower case of İ (0x130) is i (0x69)
Upper case of ı (0x131) is I (0x49)
Upper case of i (0x69) is I (0x49)

This situation is weird, of course, but these are the rules.

UCase("i") returns "I" (This action is wrong for Turkish. It is expected, and there are no known solutions as I know. Fits rules of Unicode)
Ucase("ı") returns "ı" (This action is wrong for Turkish, also it does not fit rules of Unicode. This is unexpected)
Lcase("I") returns "i" (This action is wrong for Turkish. It is expected, and there are no known solutions as I know. Fits rules of Unicode)
LCase("İ") returns "İ" (This action is wrong for Turkish, also it does not fit rules of Unicode. This is unexpected)
Instr(1, "İnek", "i", vbTextCompare) returns 0 (Can not find)
Instr(1, "Inek", "i", vbTextCompare) returns 1 (Can be found as-is in English, not in Turkish)

Some replacements can not be done:
Replace("Inek", "i", "Whatever", 1, -1, vbTextCompare) returns "Whatevernek" (This is wrong for Turkish, but expected. Also fits the rules of Unicode)
Replace("Inek", "ı", "Whatever", 1, -1, vbTextCompare) returns "Inek", not "Whatevernek" (This is wrong for Turkish, but expected. Also does not fit rules of Unicode, which is unexpected)
Replace("İnek", "i", "Whatever", 1, -1, vbTextCompare) returns "İnek", not "Whatevernek" (This is wrong for Turkish, but expected. Also does not fit rules of Unicode, which is unexpected)
Replace("İnek", "ı", "Whatever", 1, -1, vbTextCompare) returns "İnek", not "Whatevernek" (This is wrong for Turkish, but expected. There is no rule for this situation in Unicode)

and comparing can not be done:
StrComp("İnek", "inek", vbTextCompare) returns 1, not 0 (This is wrong for Turkish, but expected. Also does not fit rules of Unicode, which is unexpected)
StrComp("Inek", "ınek", vbTextCompare) returns -1, not 0 (This is wrong for Turkish, but expected. Also does not fit rules of Unicode, which is unexpected)

The case conversion issue of Turkish "İ" and "ı" letters is known, and no simple solution exists as I know in VBA code.

I ask specifically which rule is used for case conversion and for matching, and why Unicode rules are not used.

StrConv function give results as UCase and LCase functions give:
StrConv("I", vbLowerCase) returns "i"
StrConv("İ", vbLowerCase) returns "İ"
StrConv("ı", vbUpperCase) returns "ı"
StrConv("i", vbUpperCase) returns "I"

The CompareString API function called with the NORM_IGNORECASE argument gives the same results as the StrComp function.

Original Q&A

What kind of UTF-8 is the one which Excel VBA uses for internal text representation?

There are 0 best solutions below

Related Questions in EXCEL

Related Questions in VBA

Related Questions in UNICODE

Related Questions in CODEPAGES

Trending Questions

Popular # Hahtags

Popular Questions