How do I link and embed a UTF8 encoded text file in an MS-Word document?


I would like to include the contents of a UTF-8 text file in an MS Word document as a link. This works for an ANSI-encoded file using the field:

{INCLUDETEXT "path\file.txt" \c ansitext \* MERGEFORMAT}

Is there a directive akin to \c ansitext for UTF8 files? \c utf8 and \c utf8text do not appear to work.

If I do not give any directive, Word recognizes that the file is UTF8, but a dialog pops up requiring me to confirm this each time the file needs updating, which I want to avoid.

Best answer:

There is a directive (\c Unicode), but unfortunately it does not eliminate the character-encoding pop-up, even when the Unicode text starts with a BOM (byte order mark). The Unicode standard discourages BOMs in UTF-8 anyway.

So although that answers the question as asked, it doesn't solve the problem. Nor, according to the discussion in the comments on the question, would any of the following solve the OP's problem, but they might help others.

According to ISO/IEC 29500, the standard that describes .docx documents, INCLUDETEXT is supposed to have an \e switch that lets you specify an encoding. However, according to Microsoft's implementation notes for that standard, [MS-OI29500].pdf, Word ignores any \e switch.

As far as I am aware, the only way to avoid that pop-up when the included text is Unicode (UTF-8) is to set a value in the Windows Registry that tells Word the default encoding for text files.

The drawback is that this setting affects every text file Word opens, whether through the File Open dialog or an INCLUDETEXT field.

To create the setting, navigate to Word's Options key for your Office version. For Word 2016/2019 it is

HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\Word\Options

and for Word 2010 it would be

HKEY_CURRENT_USER\Software\Microsoft\Office\14.0\Word\Options

Then add a DWORD value called DefaultCPG and set its value to the code page you want to be the default. For UTF-8, that's decimal 65001.
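If you would rather script the change than edit the Registry by hand, the following is a minimal Python sketch, assuming Windows and a per-user setting under HKEY_CURRENT_USER. The helper names (options_key, reg_file_text, set_default_cpg) are hypothetical; only the key path, the DefaultCPG value name and 65001 come from the answer above. It can either produce the text of a .reg file or write the value directly with the standard winreg module.

```python
"""Sketch: make UTF-8 (code page 65001) Word's default text-file encoding.

Assumption: the Office version string ("16.0" for Word 2016/2019,
"14.0" for Word 2010) matches your installation.
"""

UTF8_CODE_PAGE = 65001  # decimal; 0xFDE9 in hex


def options_key(office_version: str) -> str:
    """Path of Word's Options key, relative to HKEY_CURRENT_USER."""
    return "\\".join(["Software", "Microsoft", "Office", office_version, "Word", "Options"])


def reg_file_text(office_version: str, code_page: int = UTF8_CODE_PAGE) -> str:
    """Contents of a .reg file that creates the DefaultCPG DWORD value."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[HKEY_CURRENT_USER\\" + options_key(office_version) + "]",
        # .reg files store DWORDs as 8 hex digits: 65001 -> 0000fde9
        f'"DefaultCPG"=dword:{code_page:08x}',
        "",
    ]
    return "\r\n".join(lines)  # regedit files use CRLF line endings


def set_default_cpg(office_version: str, code_page: int = UTF8_CODE_PAGE) -> None:
    """Write DefaultCPG directly (Windows only)."""
    import winreg  # imported here so the helpers above also work on other OSes

    # CreateKeyEx opens the key, creating it first if it does not exist yet
    with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER, options_key(office_version),
                            0, winreg.KEY_WRITE) as key:
        winreg.SetValueEx(key, "DefaultCPG", 0, winreg.REG_DWORD, code_page)
```

For example, `set_default_cpg("16.0")` targets Word 2016/2019. Alternatively, save `reg_file_text("16.0")` with `encoding="utf-16"` (the encoding regedit normally uses for "Version 5.00" files) and import it with `reg import`. Close Word first so the new default is picked up when it starts.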

If you control the format of the file to be included, you could use a format that doesn't trigger the encoding pop-up. That brings its own problems. With HTML, for example, you would probably have to deal with HTML special characters such as &, with whitespace, and with RTL characters (which Word seems to reverse). Still, the following HTML "framework" is enough to insert a chunk of text without extra paragraph marks and the like:

<html>
  <meta charset="UTF-8">
  <body>
    <a name="x">your text</a>
  </body>
</html>

In the INCLUDETEXT field, you then use the bookmark name "x" to select the part of the file you want to include, e.g.

{INCLUDETEXT "path\file.htm" x \c HTML}

The <a name="something"> markup is obsolete in HTML5, which uses id instead, but Word only understands the older HTML convention.
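If the included file is generated by a program, a small helper can produce the HTML framework above and take care of the special-character problem. The sketch below is a minimal Python version; the function names are hypothetical, and the bookmark defaults to "x" to match the field example. html.escape converts &, <, >, and quotes to entities. The whitespace and RTL issues mentioned above are not handled.

```python
import html
from pathlib import Path


def include_fragment(text: str, bookmark: str = "x") -> str:
    """Wrap text in the minimal HTML framework used with INCLUDETEXT ... \\c HTML."""
    return (
        "<html>\n"
        '  <meta charset="UTF-8">\n'
        "  <body>\n"
        # Escape &, <, > and quotes so they survive as literal characters
        f'    <a name="{html.escape(bookmark)}">{html.escape(text)}</a>\n'
        "  </body>\n"
        "</html>\n"
    )


def write_fragment(path: Path, text: str, bookmark: str = "x") -> None:
    """Save the framework as UTF-8, matching the meta charset declaration."""
    path.write_text(include_fragment(text, bookmark), encoding="utf-8")
```

For instance, `write_fragment(Path("file.htm"), "Tom & Jerry")` writes `<a name="x">Tom &amp; Jerry</a>` inside the framework, which the field above can then pull in by its bookmark name.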