Error converting .docx with HTML to PDF using Graph API


I am trying to convert a Microsoft Word (.docx) file to PDF using the Graph API. The file is stored in SharePoint (Office 365). The code below works for ordinary documents:

var httpClient = await CreateAuthorizedHttpClient();
string path = $"{GraphEndpoint}sites/{SiteId}/drive/items/";
string requestUrl = $"{path}{fileId}/content?format={targetFormat}";
var response = await httpClient.GetAsync(requestUrl);
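
For context, here is a minimal sketch of how the call above could be wrapped so that a failed conversion surfaces the status code and response body instead of failing silently. The class name GraphPdfConverter, the method names BuildConvertUrl and ConvertAsync, and the outputPath parameter are hypothetical and only for illustration; the HttpClient is assumed to be already authorized, as in the snippet above.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Graph SDK.
public static class GraphPdfConverter
{
    // Builds the same request URL as the snippet above.
    // graphEndpoint may or may not end with a slash.
    public static string BuildConvertUrl(
        string graphEndpoint, string siteId, string fileId, string targetFormat)
    {
        string root = graphEndpoint.TrimEnd('/');
        string format = Uri.EscapeDataString(targetFormat);
        return $"{root}/sites/{siteId}/drive/items/{fileId}/content?format={format}";
    }

    // Requests the converted file and writes it to outputPath.
    // Throws with the status code and body when the service rejects the request.
    public static async Task ConvertAsync(
        HttpClient httpClient, string requestUrl, string outputPath)
    {
        using (HttpResponseMessage response = await httpClient.GetAsync(requestUrl))
        {
            if (!response.IsSuccessStatusCode)
            {
                string details = await response.Content.ReadAsStringAsync();
                throw new InvalidOperationException(
                    $"Conversion failed with {(int)response.StatusCode} " +
                    $"{response.ReasonPhrase}: {details}");
            }

            using (Stream pdf = await response.Content.ReadAsStreamAsync())
            using (FileStream file = File.Create(outputPath))
            {
                await pdf.CopyToAsync(file);
            }
        }
    }
}
```

With a wrapper like this, the error body returned alongside a failing status code is captured in the exception message, which may give more detail about why a particular document was rejected.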

However, conversion fails when the .docx file contains HTML that was added with the code below:

string altChunkId = "myId123";
//Create an alternative format import part on the MainDocumentPart
AlternativeFormatImportPart altformatImportPart = wordDoc.MainDocumentPart
    .AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, altChunkId);
using (MemoryStream htmlMemoryStream = new MemoryStream(Encoding.UTF8.GetBytes($"<html><head></head><body>{value}</body></html>")))
{
    //Add the HTML data into the alternative format import part
    altformatImportPart.FeedData(htmlMemoryStream);
    //create a new altChunk and link it to the id of the AlternativeFormatImportPart
    AltChunk altChunk = new AltChunk();
    altChunk.Id = altChunkId;
    //p.InsertAfterSelf(altChunk);
    documentBody.Append(altChunk);
    break;
}

When I try to convert this file using the Graph API, I get a 406 Not Acceptable error. I also see that the file is not editable in the browser and opens in compatibility mode. If I try to open the document in edit mode, I get this error:

Sorry this document can't be opened because it contains objects that word doesn't support

I tried cutting the HTML part out of the document, pasting it into another document, and converting that document to PDF, which worked. When I looked at the XML of that document, I saw that the Word app had converted the HTML into Word-compatible XML tags.

Question 1: How can I convert the HTML to Word-compatible tags so that I can convert the document to PDF?

Also, if I use the Download as PDF option, the file is converted to PDF without any issue.


This option uses the following API call:

https://word-view.officeapps.live.com/wv/WordViewer/request.pdf?WOPIsrc={SiteURL}%2F%5Fvti%5Fbin%2Fwopi%2Eashx%2Ffiles%2F{ID}&access_token=&access_token_ttl=&z=256&type=downloadpdf

Question 2: Is there a way I can use this API to convert a .docx file to PDF? I saw that the access token's audience value is "wopi/{TenantName}@{TenantID}". If I can get the correct access token, I think I will be able to use the above API.
