How to read 'Extended' MS Word file tags without Office.Interop?


I have a .docx file with custom properties that are specified only for MS Office files.

If I open the same file on a computer without MS Office installed, there is no Tags property in the file's Details tab.

I need to read Tags in my C# code.

I tried this solution and retrieved the Tags index as 18. Then I used the following code:

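using System;
using System.IO;
using System.Reflection;
using Shell32; // COM reference: "Microsoft Shell Controls And Automation"

public class TagsReader : ITagsReader
{
    // Column index of Tags as reported by the Shell; it can differ between Windows versions.
    private const int keywordsIndex = 18;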

    public string Read(string filePath)
    {
        var fullPath = Path.GetFullPath(filePath);

        var directoryName = Path.GetDirectoryName(fullPath);
        Folder dir = GetShell32Folder(directoryName);
        var fileName = Path.GetFileName(fullPath);

        FolderItem item = dir.ParseName(fileName);
        return dir.GetDetailsOf(item, keywordsIndex);
    }

    private Folder GetShell32Folder(string folderPath)
    {
        var shellAppType = Type.GetTypeFromProgID("Shell.Application");
        var shell = Activator.CreateInstance(shellAppType);
        return (Folder)shellAppType.InvokeMember("NameSpace",
        BindingFlags.InvokeMethod, null, shell, new object[] { folderPath });
    }
}

But it does not work on computers without MS Office installed. There it works only for .doc files, not for .docx. For now I use an Interop-based solution, which is unstable, resource-intensive, and requires MS Office to be installed on the server:

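using System;
using System.IO;
using System.Linq;

public class WordTagsReader : ITagsReader
{
    private readonly string[] availableFileExtensions = { ".docx" };
    public string Read(string filePath)
    {
        var fileExtension = Path.GetExtension(filePath);
        // Compare case-insensitively so that ".DOCX" is accepted as well.
        if (!availableFileExtensions.Contains(fileExtension, StringComparer.OrdinalIgnoreCase))
            return null;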

        dynamic application = null;
        dynamic document = null;
        var tags = string.Empty;
        try
        {
            var typeWord = Type.GetTypeFromProgID("Word.Application");
            application = Activator.CreateInstance(typeWord);
            application.Visible = false;
            application.DisplayAlerts = false;
            var fullFilePath = Path.GetFullPath(filePath);
            document = application.Documents.Open(fullFilePath);
            tags = document.BuiltInDocumentProperties["Keywords"].Value;
        }
        finally
        {
            if (document != null)
            {
                document.Close();
                document = null;
            }
            if (application != null)
            {
                application.Quit();
                application = null;
            }
        }

        return tags;
    }
}

This code can crash from time to time and leave running instances of MS Word, which consume resources and lock the file. I have many handlers working at the same time, so I can't tell the orphaned instances apart from the ones still working properly in order to clean them up.

This is why I am looking for an alternative solution. Is there a way to read specific (custom) properties like Tags without using Office.Interop?


There are 3 best solutions below

1
On BEST ANSWER

You can read the .docx format directly, the old-fashioned way. Something like this:

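using System;
using System.IO;
using System.IO.Packaging;
using System.Xml.Linq;

// ms is a readable, seekable Stream that contains the .docx file.
using (var package = Package.Open(ms, FileMode.Open, FileAccess.Read))
{
    var corePart = package.GetPart(new Uri("/docProps/core.xml", UriKind.Relative));
    XDocument settings;
    using (TextReader tr = new StreamReader(corePart.GetStream()))
        settings = XDocument.Load(tr);

    XNamespace cp = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
    // Tags are stored in the cp:keywords element; the cast yields null if it is missing.
    var tags = (string)settings.Root.Element(cp + "keywords");
}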

No additional libraries or SDKs are needed. Only System.IO, only hardcore!
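As a variation on the snippet above, the same value is also exposed through `Package.PackageProperties`, which avoids parsing core.xml by hand. This is a minimal sketch, not a drop-in implementation: `PackageTagsReader` and `ReadTags` are hypothetical names introduced for illustration, and it assumes `System.IO.Packaging` is available (it ships in the WindowsBase assembly on .NET Framework and as a separate NuGet package on newer .NET).

```csharp
using System.IO;
using System.IO.Packaging;

// Hypothetical helper, named for illustration only.
public static class PackageTagsReader
{
    // Returns the Keywords (Tags) core property of an OPC package such as a .docx,
    // or null when the property is not set.
    public static string ReadTags(string filePath)
    {
        // Open read-only so the file is not modified.
        using (var package = Package.Open(filePath, FileMode.Open, FileAccess.Read))
        {
            return package.PackageProperties.Keywords;
        }
    }
}
```

Calling `PackageTagsReader.ReadTags(path)` starts no Word process, so there are no leftover instances to clean up. It only works for the Open XML formats (.docx, .xlsx, .pptx), not for the older binary .doc format.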

5
On

I suggest using the Open XML SDK for this; Open XML is the 'new' standard for Office documents. The tags can be read with the code below (note that you need the DocumentFormat.OpenXml.Packaging namespace for this):

string tags = "";
// Second argument false = open read-only
using (var doc = WordprocessingDocument.Open("filename", false))
    tags = doc.PackageProperties.Keywords;

Open XML does not need anything Office-related installed on the machine, so it is well suited to servers or, as in your case, to reading and editing documents on machines that don't have Office installed.

0
On

Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behavior and/or deadlock when Office is run in this environment.

If you are building a solution that runs in a server-side context, you should try to use components that have been made safe for unattended execution. Or, you should try to find alternatives that allow at least part of the code to run client-side. If you use an Office application from a server-side solution, the application will lack many of the necessary capabilities to run successfully. Additionally, you will be taking risks with the stability of your overall solution. Read more about that in the Considerations for server-side Automation of Office article.

As a workaround you may consider using the Open XML SDK; see Welcome to the Open XML SDK 2.5 for Office for more information. Or use any third-party component designed for server-side execution. For example, take a look at Aspose.