Aspose.Words: converting DOCX to HTML loses MERGEFIELDs, IF conditions, headers and footers, and table cell widths

I'm trying to write an online document editor with TinyMCE 5 as the editor and Aspose.Words v20.8 as the converter.

But when I convert the DOCX to HTML5 with Aspose.Words, it does not render as expected in TinyMCE. As far as I can tell so far, the HTML loses headers, footers, MERGEFIELDs, IF fields and TableStart/TableEnd regions. I need the HTML to keep all of this data, because I have to convert it back to DOCX afterwards.

The code that generates the HTML5 is:

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

// docxStream: a Stream containing the source DOCX (placeholder in the original post)
var doc = new Document(docxStream);

var options = new HtmlSaveOptions();
options.SaveFormat = SaveFormat.Html;
options.Encoding = System.Text.Encoding.UTF8;
options.UpdateFields = true;
// Writes the -aw-* roundtrip attributes that Aspose.Words reads back on import
options.ExportRoundtripInformation = true;
options.ExportImagesAsBase64 = true;
options.ExportFontsAsBase64 = true;
options.ExportPageSetup = true;
options.ExportDocumentProperties = true;
options.ExportHeadersFootersMode = ExportHeadersFootersMode.PerSection;
options.HtmlVersion = HtmlVersion.Html5;

doc.Save($"{fileName}.html", options);
```

The code that converts the HTML5 back to DOCX is as follows, where model.Html is the content of the TinyMCE textarea:

```csharp
using Aspose.Words;

// model.Html: the HTML posted back from the TinyMCE textarea
var doc = new Document();
var builder = new DocumentBuilder(doc);
builder.InsertHtml(model.Html);
doc.Save($"{fileName}.docx");
```

Can anybody help me get this working, ideally with some code examples? Or does anyone have a better idea for accomplishing this? The main idea is to be able to edit DOCX files online, without having to download them and upload them again, for example through a Windows service acting as a client.

Answer:

Aspose.Words does preserve headers and footers when saving to HTML if the ExportRoundtripInformation option is enabled. In that case, Aspose.Words writes the header and footer content with special CSS attributes that it recognizes when it loads the HTML again:

```html
<div style="-aw-headerfooter-type:header-primary; clear:both">
    <p style="margin-top:0pt; margin-bottom:0pt; line-height:normal">
        <span>header</span>
    </p>
</div>
```
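To check whether a given HTML string still carries this header/footer information (for example, after it has passed through an editor), you can look for the `-aw-headerfooter-type` style property. The following is a minimal sketch written as a C# 9 console program with top-level statements. It uses a plain regular expression rather than an HTML parser, so it assumes the property is written in the `-aw-headerfooter-type:value` form shown above. `FindHeaderFooterTypes` is a hypothetical helper name, not part of the Aspose.Words API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose.Words API): returns every
// -aw-headerfooter-type value found in inline styles, e.g. "header-primary".
static List<string> FindHeaderFooterTypes(string html)
{
    var result = new List<string>();
    // Assumes the property is written as "-aw-headerfooter-type:<value>",
    // optionally with whitespace around the colon, as in the sample above.
    var pattern = new Regex(@"-aw-headerfooter-type\s*:\s*([\w-]+)");
    foreach (Match m in pattern.Matches(html))
    {
        result.Add(m.Groups[1].Value);
    }
    return result;
}

string sample =
    "<div style=\"-aw-headerfooter-type:header-primary; clear:both\">" +
    "<p><span>header</span></p></div>";

// Lists the header/footer types present in the sample HTML
Console.WriteLine(string.Join(", ", FindHeaderFooterTypes(sample)));
```

If this list is empty for HTML that came back from the editor but not for the HTML Aspose.Words produced, the header/footer markers were lost somewhere along the way.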

Aspose.Words also preserves some fields (PAGE, NUMPAGES, NOTEREF, REF, AUTHOR and TITLE). For example, a PAGE field is exported like this:

```html
<span style="-aw-field-start:true"></span><span style="-aw-field-code:' PAGE   \\* MERGEFORMAT '"></span><span style="-aw-field-separator:true"></span><span>1</span><span style="-aw-field-end:true"></span>
```

Aspose.Words recognizes such content when reading HTML and loads it into the document model as a field. I have logged a request, WORDSNET-21037, to preserve other field types as well.
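Because each preserved field is delimited by `-aw-field-start`, `-aw-field-code`, `-aw-field-separator` and `-aw-field-end` spans, you can also list which field codes survived in a piece of HTML. The sketch below makes the same assumptions as a plain-regex check (C# 9 top-level statements, no real HTML parser) and expects the single-quoted `-aw-field-code:'...'` form from the sample; it does not handle field codes that themselves contain a single quote. `ExtractFieldCodes` is a hypothetical name, not an Aspose.Words API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose.Words API): returns the trimmed text of
// every -aw-field-code:'...' style value found in the HTML.
static List<string> ExtractFieldCodes(string html)
{
    var codes = new List<string>();
    // Captures everything between the single quotes; assumes the field code
    // itself contains no single quote.
    var pattern = new Regex(@"-aw-field-code\s*:\s*'([^']*)'");
    foreach (Match m in pattern.Matches(html))
    {
        codes.Add(m.Groups[1].Value.Trim());
    }
    return codes;
}

// The PAGE field markup from the answer, as a C# string literal
string sample =
    "<span style=\"-aw-field-start:true\"></span>" +
    "<span style=\"-aw-field-code:' PAGE   \\\\* MERGEFORMAT '\"></span>" +
    "<span style=\"-aw-field-separator:true\"></span><span>1</span>" +
    "<span style=\"-aw-field-end:true\"></span>";

foreach (var code in ExtractFieldCodes(sample))
{
    Console.WriteLine(code);
}
```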

I am not familiar with TinyMCE, but I suspect that it removes the custom attributes Aspose.Words uses to round-trip MS Word features, which is why the headers and footers are not preserved in your case.
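One way to test that suspicion is to compare the Aspose.Words roundtrip markers in the HTML sent to the editor with those in the HTML it posts back. The sketch below counts each `-aw-` CSS property name in both strings and reports the ones whose count dropped. It is a diagnostic aid only, written as a C# 9 top-level program that matches inline styles with a regular expression; `CountAwMarkers` and `FindLostMarkers` are hypothetical helper names, and the "edited" HTML in the usage part is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Counts each "-aw-..." CSS property name (e.g. "-aw-field-start") in the HTML.
static Dictionary<string, int> CountAwMarkers(string html)
{
    return Regex.Matches(html, @"-aw-[a-z-]+(?=\s*:)")
        .Cast<Match>()
        .GroupBy(m => m.Value)
        .ToDictionary(g => g.Key, g => g.Count());
}

// Hypothetical helper: lists markers that occur fewer times in the edited HTML
// than in the HTML originally exported by Aspose.Words.
static List<string> FindLostMarkers(string exportedHtml, string editedHtml)
{
    var before = CountAwMarkers(exportedHtml);
    var after = CountAwMarkers(editedHtml);
    var lost = new List<string>();
    // Sort by name so the report order is deterministic
    foreach (var pair in before.OrderBy(p => p.Key, StringComparer.Ordinal))
    {
        after.TryGetValue(pair.Key, out int remaining); // 0 when missing
        if (remaining < pair.Value)
        {
            lost.Add($"{pair.Key}: {pair.Value} -> {remaining}");
        }
    }
    return lost;
}

// Made-up example: the editor kept the div but dropped the -aw- style
string exported = "<div style=\"-aw-headerfooter-type:header-primary; clear:both\"><p><span>header</span></p></div>";
string edited = "<div style=\"clear:both\"><p><span>header</span></p></div>";

foreach (var line in FindLostMarkers(exported, edited))
{
    Console.WriteLine(line);
}
```

Counting rather than just testing for presence catches the case where only some of the field or header/footer markers are stripped.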

Disclosure: I work on the Aspose.Words team.