I am writing a program in Delphi 7 that is supposed to encode a Unicode string into an HTML entity string.
For example, "ABCģķī" would result in "ABC&#291;&#311;&#299;".
Now 2 basic things:
- Delphi 7 is non-Unicode, so I can't just write Unicode characters directly in code to encode them.
- Codepages consist of 256 entries (0-255), each holding a character specific to that codepage, except the first 128 (the ASCII range), which are the same for all codepages.
So: how do I get the value of a char that is in the 1-255 range?
I tried Ord(Integer), but it also returns values way past 255. Basically, everything is fine (A returns 65 and so on) until my string reaches non-Latin Unicode.
Is there any other method for returning a char's value? Any help appreciated.
I suggest you avoid codepages like the plague.
There are two approaches for Unicode that I'd consider: WideString, and UTF-8.
WideStrings have the advantage of being 'native' to Windows, which helps if you need to use Windows API calls. The disadvantages are storage space, and that they (like UTF-8) are a variable-length encoding: a WideString holds UTF-16, so code points above U+FFFF need two WideChars (a surrogate pair).
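As a rough sketch of the WideString route (I have not run this on a real Delphi 7 install, and `EncodeHtmlEntities` is just a name made up for illustration): walk the string one WideChar at a time, copy plain ASCII through, and write everything else as a decimal `&#N;` reference. Values above 255 from `Ord` are expected here, since they are the Unicode code units themselves. A surrogate pair is folded into a single code point first. The sketch does not escape `&`, `<` or `>`.

```pascal
program EntityDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: ASCII is copied as-is, anything else becomes &#N;
function EncodeHtmlEntities(const S: WideString): string;
var
  I, Code, Next: Integer;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    Code := Ord(S[I]);
    // A high surrogate followed by a low surrogate encodes one code point
    if (Code >= $D800) and (Code <= $DBFF) and (I < Length(S)) then
    begin
      Next := Ord(S[I + 1]);
      if (Next >= $DC00) and (Next <= $DFFF) then
      begin
        Code := $10000 + ((Code - $D800) shl 10) + (Next - $DC00);
        Inc(I); // skip the low surrogate
      end;
    end;
    if Code < 128 then
      Result := Result + Char(Code)
    else
      Result := Result + '&#' + IntToStr(Code) + ';';
    Inc(I);
  end;
end;

var
  W: WideString;
begin
  // Delphi 7 source files are ANSI, so the non-ASCII characters
  // are built from their code points instead of typed literally.
  W := 'ABC';
  W := W + WideChar($0123) + WideChar($0137) + WideChar($012B);
  Writeln(EncodeHtmlEntities(W)); // ABC&#291;&#311;&#299;
end.
```

If the text arrives from a file or an edit control rather than from code, the same function should apply once the data is in a WideString.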
UTF-8 is generally preferable. Like WideStrings, it is a variable-length encoding, so a particular Unicode 'code point' may need several bytes in the string to encode it. This is only an issue if you're doing lots of character-by-character processing on your strings.
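To illustrate the "several bytes per code point" remark, here is a small sketch using `UTF8Decode` from the System unit, which as far as I know has been there since Delphi 6. The input bytes `$C4 $A3` are the UTF-8 encoding of U+0123 (ģ); the variable names are only for illustration.

```pascal
program Utf8Demo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

var
  U: UTF8String;
  W: WideString;
begin
  // 'A' followed by the two UTF-8 bytes of U+0123
  U := 'A'#$C4#$A3;
  W := UTF8Decode(U);
  Writeln(Length(U));  // 3: counts bytes, not characters
  Writeln(Length(W));  // 2: one WideChar per character here
  Writeln(Ord(W[2]));  // 291: the code point of the second character
end.
```

So if the strings are kept as UTF-8, indexing with `S[I]` gives individual bytes; decoding to a WideString first is one way to get at the code points when per-character work is needed.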
@DavidHeffernan comments (correctly) that WideStrings may be more compact in certain cases. However, I'd recommend UTF-16 only if you are absolutely sure that your encoded text will really be more compact (don't forget markup!), and this compactness is highly important to you.