How can I get/set TTNTRichEdit RTF content in Unicode (UTF-8/UTF-16) format?
I use the TRichEdit.loadFromStream/saveToStream methods with TStringStreams to get and set the RTF content. But they use only locale-dependent ANSI codes for non-ASCII characters (e.g. \'f5).
But I'm going to be in trouble if the user carries his/her project to another computer with a different locale. The national characters will be lost.
The SF_UNICODE flag of the EM_STREAMIN/EM_STREAMOUT messages can only be combined with SF_TEXT, not with SF_RTF.
How can I get the TTNTRichEdit Unicode content in Delphi 7?
Asked by The Bitman
There are 2 best solutions below
Solved (by necessity) using Borland C++ 6. The same code pattern applies to Borland Delphi. (NOTE: TTntRichEdit loads UTF-8 text as UTF-8 ONLY when the file explicitly starts with the BOM header "\357\273\277", i.e. the bytes [0xEF, 0xBB, 0xBF].)
// This only works with BOM explicit files
// (it will fail on BOM-less UTF-8 files)
TTntRichEdit *myTntRichEdit = ...{some init code}...
myTntRichEdit->Lines->LoadFromFile(UTF8_filename);
So here is my working production code: (Note: TRESource declaration is TTntRichEdit *TRESource;)
void TFormMyExample::LoadJavascriptFromFile(AnsiString myFile) {
    // This method will load a UTF-8 text file (with or without BOM)
    // // // TRESource->Lines->LoadFromFile(myFile);
    const AnsiString BOM = "\357\273\277"; // [0xEF, 0xBB, 0xBF]
    // Each stream is created before its try block, so __finally never
    // deletes an uninitialized pointer if the constructor throws.
    TMemoryStream *JSMemoryStream = new TMemoryStream();
    try {
        JSMemoryStream->LoadFromFile(myFile);
        // check for BOM
        char BOMHeader[4] = {0}; // zero-filled, so always NUL-terminated
        JSMemoryStream->Seek(0, soFromBeginning);
        // Read() returns a short count for files under 3 bytes;
        // ReadBuffer() would raise an exception instead.
        JSMemoryStream->Read(BOMHeader, 3);
        JSMemoryStream->Seek(0, soFromBeginning); // reset
        if (strcmp(BOM.c_str(), BOMHeader) == 0) {
            // We have BOM header, so load it.
            TRESource->Lines->LoadFromStream(JSMemoryStream);
        } else {
            // We need the BOM header, so add it.
            TMemoryStream *JSBOM_MemoryStream = new TMemoryStream();
            try {
                JSBOM_MemoryStream->Write(BOM.c_str(), BOM.Length());
                JSBOM_MemoryStream->Seek(0, soFromEnd);
                // Count = 0 copies the whole source stream from its start
                JSBOM_MemoryStream->CopyFrom(JSMemoryStream, 0);
                JSBOM_MemoryStream->Seek(0, soFromBeginning);
                TRESource->Lines->LoadFromStream(JSBOM_MemoryStream);
            }
            __finally
            {
                delete JSBOM_MemoryStream;
            }
        }
    }
    __finally
    {
        delete JSMemoryStream;
    }
}
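The BOM check itself does not depend on the VCL stream classes. As a minimal sketch in standard C++ (not Borland-specific, and not part of the production code above), the same "detect, and prepend if missing" logic can be written against a plain byte buffer. The names kUtf8Bom, hasUtf8Bom and ensureUtf8Bom are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <string>

// UTF-8 byte order mark: 0xEF 0xBB 0xBF (octal "\357\273\277").
static const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper: true if the buffer starts with the UTF-8 BOM.
// Buffers shorter than 3 bytes are handled without reading out of range.
bool hasUtf8Bom(const std::string& bytes) {
    return bytes.size() >= kUtf8Bom.size() &&
           bytes.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}

// Hypothetical helper: returns the buffer unchanged if it already has a
// BOM, otherwise returns a copy with the BOM prepended.
std::string ensureUtf8Bom(const std::string& bytes) {
    std::string result = hasUtf8Bom(bytes) ? bytes : kUtf8Bom + bytes;
    assert(hasUtf8Bom(result)); // postcondition: output always carries a BOM
    return result;
}
```

The resulting buffer could then be handed to whatever loader expects BOM-prefixed UTF-8, which is the role LoadFromStream plays in the code above.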
When I write the processed file, it's done in this manner. (Note: TREProcessed declaration is TTntRichEdit *TREProcessed; also: AnsiString outputFileName;)
ofstream SaveFile(outputFileName.c_str());
TREProcessed->PlainText = true;
SaveFile << "\357\273\277"; // Add UTF8 BOM [0xEF, 0xBB, 0xBF]
for (int i = 0, max = TREProcessed->Lines->Count; i < max; i++) {
    SaveFile << UTF8Encode(TREProcessed->Lines->Strings[i]).c_str();
    if (i < max - 1) {
        SaveFile << UTF8Encode(_WS "\n").c_str();
    }
}
SaveFile.close();
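The shape of the output (BOM first, then the lines joined by "\n" with no trailing newline) can be shown without the rich edit control. This is a rough sketch in standard C++11, assuming the lines have already been converted to UTF-8 (the job UTF8Encode does above). buildUtf8Document is a hypothetical name used only here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: joins lines that are ALREADY UTF-8 encoded into one
// document that starts with the UTF-8 BOM. A newline is written between
// lines, but not after the last one, mirroring the loop above.
std::string buildUtf8Document(const std::vector<std::string>& utf8Lines) {
    std::ostringstream out;
    out << "\xEF\xBB\xBF"; // UTF-8 BOM [0xEF, 0xBB, 0xBF]
    for (std::size_t i = 0; i < utf8Lines.size(); ++i) {
        out << utf8Lines[i];
        if (i + 1 < utf8Lines.size()) {
            out << '\n';
        }
    }
    const std::string document = out.str();
    assert(document.size() >= 3); // the BOM is always present
    return document;
}
```

When such a document is written through an ofstream opened in text mode on Windows, each "\n" is translated to CR LF. Opening the stream with ios::binary keeps the bytes exactly as built.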
You have no problem. You are using a Unicode compliant component. You will not suffer data loss. From the Wikipedia article on RTF:
You are observing a code page escape. But that's fine. That's what \'f5 is. The character is found in the document's code page, and hence a code page escape can be used. If you include characters outside the document's code page then the control will use a Unicode escape.
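To make the difference between the two escape forms concrete, here is a minimal sketch in standard C++11 of an encoder that always uses the Unicode form. It is not the control's own implementation and not Borland C++ 6 code; rtfEscapeUtf16 is a hypothetical helper name. It relies on the RTF \uN control word, where N is a signed 16-bit decimal value followed by one fallback character for readers that do not understand \u. The result is pure 7-bit ASCII, so it does not depend on any code page.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns UTF-16 text into 7-bit ASCII RTF body text.
// Assumptions: the surrounding RTF header is written elsewhere, the default
// \uc1 is in effect (one fallback character after each \uN), and control
// characters are passed through as \uN rather than mapped to \par or \tab,
// which a real writer would likely do.
std::string rtfEscapeUtf16(const std::u16string& text) {
    std::string out;
    for (char16_t unit : text) {
        if (unit == u'\\' || unit == u'{' || unit == u'}') {
            // RTF syntax characters must be backslash-escaped.
            out += '\\';
            out += static_cast<char>(unit);
        } else if (unit >= 0x20 && unit < 0x80) {
            // Printable ASCII is written as-is.
            out += static_cast<char>(unit);
        } else {
            // \uN takes a SIGNED 16-bit value: code units above 0x7FFF are
            // written as negative numbers. Surrogate pairs come out as two
            // consecutive \uN escapes, one per code unit.
            int value = unit > 0x7FFF ? static_cast<int>(unit) - 0x10000
                                      : static_cast<int>(unit);
            assert(value >= -32768 && value <= 32767);
            out += "\\u" + std::to_string(value) + "?"; // '?' is the fallback
        }
    }
    return out;
}
```

For instance, the character U+0151 (chosen here purely as an illustration) is emitted as \u337? regardless of the machine's locale, whereas a code page escape such as \'f5 only means the same character while the reader interprets it in the same code page.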