How can I get TTNTRichEdit unicode content in Delphi 7?

Question

711 Views Asked by The Bitman At 11 August 2015 at 14:30

How can I get or set the RTF content of a TTNTRichEdit in a Unicode (UTF-8/UTF-16) format? I use the TRichEdit.LoadFromStream/SaveToStream methods with TStringStreams to get and set the RTF content, but the output uses locale-dependent ANSI codes for non-ASCII characters (e.g. \'f5). I will be in trouble if a user carries his or her project to another computer with a different locale: the national characters will be lost. The SF_UNICODE flag of the EM_STREAMIN/EM_STREAMOUT messages can only be combined with SF_TEXT, not with SF_RTF.

Original Q&A

JKMickelson On 25 June 2021 at 16:40

Solved (by necessity) using Borland C++Builder 6; the same code pattern applies to Borland Delphi. Note: TTntRichEdit loads UTF-8 text as UTF-8 only when the file explicitly starts with the BOM header "\357\273\277", i.e. the bytes [0xEF, 0xBB, 0xBF].

// This only works with BOM explicit files
// (it will fail on BOM-less UTF-8 files)
TTntRichEdit *myTntRichEdit = ...{some init code}...
myTntRichEdit->Lines->LoadFromFile(UTF8_filename);

So here is my working production code (note: TRESource is declared as TTntRichEdit *TRESource;):

void TFormMyExample::LoadJavascriptFromFile(AnsiString myFile) {
    // Loads a UTF-8 text file (with or without BOM) into TRESource.
    AnsiString BOM = "\357\273\277"; // UTF-8 BOM: [0xEF, 0xBB, 0xBF]

    // Allocate before the try block so that __finally never deletes
    // an uninitialized pointer if the allocation itself fails.
    TMemoryStream *JSMemoryStream = new TMemoryStream();
    try {
        JSMemoryStream->LoadFromFile(myFile);

        // Check for the BOM. Read() (unlike ReadBuffer) does not raise
        // an exception when the file is shorter than 3 bytes; the
        // zero-initialized buffer then simply fails the comparison.
        char BOMHeader[4] = {0};
        JSMemoryStream->Seek(0, soFromBeginning);
        JSMemoryStream->Read(BOMHeader, 3);
        JSMemoryStream->Seek(0, soFromBeginning); // reset

        if (strcmp(BOM.c_str(), BOMHeader) == 0) {
            // We have the BOM header, so load the file as it is.
            TRESource->Lines->LoadFromStream(JSMemoryStream);
        } else {
            // We need the BOM header, so prepend it to a copy.
            TMemoryStream *JSBOM_MemoryStream = new TMemoryStream();
            try {
                JSBOM_MemoryStream->Write(BOM.c_str(), BOM.Length());

                // A Count of 0 copies the entire source stream.
                JSBOM_MemoryStream->CopyFrom(JSMemoryStream, 0);

                JSBOM_MemoryStream->Seek(0, soFromBeginning);
                TRESource->Lines->LoadFromStream(JSBOM_MemoryStream);
            }
            __finally
            {
                delete JSBOM_MemoryStream;
            }
        }
    }
    __finally
    {
        delete JSMemoryStream;
    }
}
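
The BOM check itself does not depend on the VCL. As a minimal sketch in standard C++, assuming the file contents are already in a std::string byte buffer, the same logic could look like this (hasUtf8Bom and ensureUtf8Bom are hypothetical helper names, not part of TntUnicodeControls or the VCL):

```cpp
#include <string>

// UTF-8 byte order mark: 0xEF, 0xBB, 0xBF.
static const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Returns true when the byte buffer starts with the UTF-8 BOM.
// Buffers shorter than three bytes never match.
bool hasUtf8Bom(const std::string &bytes) {
    return bytes.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}

// Returns the buffer unchanged if it already starts with the BOM,
// otherwise a copy with the BOM prepended.
std::string ensureUtf8Bom(const std::string &bytes) {
    return hasUtf8Bom(bytes) ? bytes : kUtf8Bom + bytes;
}
```

The result of ensureUtf8Bom could then be written to a stream and handed to Lines->LoadFromStream, as in the production code above.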

When I write the processed file, it is done in this manner (note: TREProcessed is declared as TTntRichEdit *TREProcessed; and outputFileName as AnsiString outputFileName;):

    ofstream SaveFile(outputFileName.c_str());
    TREProcessed->PlainText = true;
    SaveFile << "\357\273\277"; // Add UTF8 BOM [0xEF, 0xBB, 0xBF]

    for (int i = 0, max = TREProcessed->Lines->Count; i < max; i++) {
        SaveFile << UTF8Encode(TREProcessed->Lines->Strings[i]).c_str();
        if (i < max - 1) {
            SaveFile << UTF8Encode(_WS "\n").c_str();
        }
    }
    SaveFile.close();

David Heffernan On 11 August 2015 at 22:19 (accepted answer)

You have no problem. You are using a Unicode compliant component. You will not suffer data loss. From the Wikipedia article on RTF:

A standard RTF file can consist of only 7-bit ASCII characters, but can encode characters beyond ASCII by escape sequences. The character escapes are of two types: code page escapes and, starting with RTF 1.5, Unicode escapes. In a code page escape, two hexadecimal digits following a backslash and typewriter apostrophe are used for denoting a character taken from a Windows code page. For example, if the code page is set to Windows-1256, the sequence \'c8 will encode the Arabic letter bāʼ (ب).

For a Unicode escape the control word \u is used, followed by a 16-bit signed decimal integer giving the Unicode UTF-16 code unit number. For the benefit of programs without Unicode support, this must be followed by the nearest representation of this character in the specified code page. For example, \u1576? would give the Arabic letter bāʼ ب, specifying that older programs which do not have Unicode support should render it as a question mark instead.

You are observing a code page escape, and that's fine: that's what \'f5 is. The character is found in the document's code page, and hence a code page escape can be used. If you include characters outside the document's code page, then the control will use a Unicode escape.
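
To make the Unicode escape concrete, here is a minimal sketch in standard C++ of how UTF-16 code units map to \uN? sequences, following the description quoted above. It is not how the rich edit control is implemented: rtfEscapeUtf16 is a hypothetical name, the fallback character is assumed to always be '?', and line breaks (which RTF expresses with control words such as \par) are not handled.

```cpp
#include <string>

// Encodes UTF-16 code units as 7-bit RTF text using \uN? Unicode escapes.
// Assumption: '?' is the fallback for readers without Unicode support;
// a real writer may emit a code page escape (\'hh) where one exists.
std::string rtfEscapeUtf16(const std::u16string &text) {
    std::string out;
    for (char16_t unit : text) {
        if (unit == u'\\' || unit == u'{' || unit == u'}') {
            // RTF syntax characters must be backslash-escaped.
            out += '\\';
            out += static_cast<char>(unit);
        } else if (unit < 0x80) {
            // Plain ASCII is written as-is.
            out += static_cast<char>(unit);
        } else {
            // \u takes a signed 16-bit decimal number, so code units
            // above 0x7FFF are written as negative values.
            int value = unit;
            if (value > 0x7FFF) {
                value -= 0x10000;
            }
            out += "\\u";
            out += std::to_string(value);
            out += '?';
        }
    }
    return out;
}
```

For the Arabic letter bāʼ (U+0628) this produces \u1576?, matching the example in the quoted article.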
